20 March 2018, PhD defense, Joachim Daiber
Machine translation systems often incorporate modeling assumptions motivated by properties of the language pairs they were initially designed for. When such systems are applied to language families with considerably different properties, translation quality can deteriorate. Phrase-based machine translation systems, for instance, are ill-equipped to handle the relaxed word order constraints and productive word formation processes of morphologically rich languages.

In this thesis, we ask what role the properties of languages, as studied in the field of linguistic typology, play in how well machine translation systems perform. We focus in particular on word order and morphology, and show that typological differences in these areas can be bridged by making certain linguistic phenomena overt to the translation system. Understanding and exploiting typological differences between languages makes translation systems more typologically robust without significantly changing the assumptions of the underlying translation models.

In the area of word order, we examine how word order freedom influences preordering, a popular technique for modeling word order in phrase-based machine translation, and propose a method to improve its typological robustness. In the area of morphological complexity, we show that reducing the dissimilarity between the source and target language improves phrase-based machine translation for typologically diverse language pairs. Finally, we show that besides helping to close the performance gaps between typologically diverse languages, linguistic typology can also serve as a source of knowledge to guide reordering models and to enable universal reordering models applicable to multiple target languages.
The full thesis can be found here: https://pure.uva.nl/ws/files/22104792/Thesis.pdf