Towards Efficient Minimum Bayes Risk Decoding
Gerson Foks
Abstract:
Minimum Bayes risk (MBR) decoding for machine translation is receiving renewed attention, as it is a principled approach to neural machine translation that exhibits fewer pathologies than the widely adopted maximum a posteriori (MAP) decoding. Estimating the MBR objective, however, can be costly. In this thesis, we explore two approaches for estimating the MBR objective; in addition, we attempt to construct the MBR translation directly by backpropagating through the utility function. The main approach is to predict the Bayes risk with neural models: we train a model to regress to an accurate Monte Carlo (MC) estimate of the Bayes risk, so that at inference time the trained model estimates the Bayes risk directly, circumventing expensive MC estimation. These models outperform m-sample MC (m-MC) estimates in predicting the Bayes risk for low values of m. Furthermore, when used as a decision rule, they outperform m-MC estimates in both computation speed and translation quality. The second approach fits a mixture of Gaussians or Student's t distributions to the distribution of utilities; it achieves results similar to simply regressing to the MC estimate. The last approach constructs the MBR translation directly with the help of backpropagation: we backpropagate through the (neural) utility function to find the translation that achieves the lowest Bayes risk. This approach, however, resulted in nonsensical translations. We hope that these three approaches offer insight into how to move towards more efficient minimum Bayes risk decoding.
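The baseline against which the thesis compares, m-MC estimation of the Bayes risk, can be sketched as follows. This is a minimal illustration, not the thesis implementation: the `unigram_f1` utility is a hypothetical stand-in for the (neural) utility functions used in practice, such as BLEU or COMET, and `mbr_decode` and its arguments are illustrative names.

```python
# Minimal sketch of MBR decoding via an m-sample Monte Carlo (m-MC)
# estimate of the expected utility. Assumptions: candidates and model
# samples are plain strings; the utility here is a toy unigram F1.
from collections import Counter

def unigram_f1(hyp: str, ref: str) -> float:
    """Toy utility: unigram F1 overlap between two whitespace-tokenized strings."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def mbr_decode(candidates, samples, utility=unigram_f1):
    """Return the candidate with the highest m-MC estimate of expected
    utility (equivalently, the lowest estimated Bayes risk), where the
    expectation over the model distribution is approximated by averaging
    the utility against m samples drawn from the model."""
    def mc_estimate(hyp):
        return sum(utility(hyp, ref) for ref in samples) / len(samples)
    return max(candidates, key=mc_estimate)
```

The cost the thesis targets is visible here: each decision requires |candidates| x m utility evaluations, which becomes expensive when the utility is itself a neural model.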