Learning the Latent Structure of Translation

Markos Mylonakis

Abstract: This dissertation discusses methods to learn the latent structural patterns that underlie translation data. It explores different approaches to modelling bilingual structure and presents novel frameworks and algorithms, such as Cross-Validated Expectation-Maximization (CV-EM), to learn phrase-based, hierarchical and syntax-driven Statistical Machine Translation (SMT) models from data.

In this thesis, we present methods to automatically learn phrase-based Statistical Machine Translation models that assume a latent bilingual structure as their central modelling variable. Acknowledging that each language is strongly characterised by its individual structural properties, we aim to learn a bilingual structure that augments and supersedes its monolingual counterparts, bridging the gap between them by explaining the transformations that take place when meaning is conveyed across languages. The learning frameworks and algorithms we present allow us to discover these structural patterns in bilingual data and to automatically learn models that take them into account to translate better. We apply our methodology to a sequence of statistical translation models of increasing complexity. This leads us to a well-founded learning framework for hierarchical, syntactically motivated models that explain the translation process by taking advantage of the linguistic structure of language.

Chapter 1 introduces the context and aims of this work. It presents the key aspects of modelling translation structure and discusses the impact of its latent nature, as well as the challenges involved in learning to identify it in bilingual data. In Chapter 2, we start by examining some of the modelling frameworks that have been influential in SMT research, such as word-based, phrase-based and hierarchical SMT.
We then discuss the EM algorithm and Cross-Validation, the two theoretical pillars of the novel learning algorithm we introduce in the chapter that follows. Chapter 3 examines the challenges of learning phrase-based translation models by considering the wider problem of learning Fragment Models: models which describe how to build new data instances by combining data fragments extracted from a training dataset. We then introduce the Cross-Validated Expectation-Maximization (CV-EM) algorithm, a novel learning algorithm for Fragment Models which optimises parameters according to a Cross-Validated Maximum Likelihood Estimation (CV-MLE) objective.

The next three chapters describe and empirically evaluate learning frameworks with CV-EM at their core, for three distinct, state-of-the-art SMT models. Chapter 4 contributes a well-founded method to learn the conditional translation probabilities of Phrase-Based SMT models employing contiguous phrase-pairs, centred on disambiguating the latent segmentation of sentence-pairs into phrase-pairs. This method is shown empirically to perform at least as well as the heuristic, ad hoc estimators that are typically used for these models. In Chapter 5, we consider the additional challenges involved in modelling translation with a synchronous grammar, and successfully learn a relatively simple hierarchical translation model whose performance is comparable to that of a highly competitive baseline. Chapter 6 moves considerably further, building around CV-EM a learning framework that supports learning complex hierarchical translation models which take advantage of external annotations of source and/or target sentences. We deploy this framework to contribute a method to learn linguistically motivated hierarchical translation models, by identifying the source-language linguistic patterns which are informative for translation.
We subsequently show how our approach delivers tangible translation improvements across four distinct language pairs.

The results of Chapter 6 complement those of Chapters 4 and 5, and together they provide considerable evidence for the key hypothesis of this thesis: models assuming a latent translation structure can be learnt under a clear learning objective, implemented through a well-understood optimisation framework and learning algorithm. The learnt models provide real-world translation performance that is competitive with heuristic training regimes, rendering the use of the latter unnecessary. Our methodology not only provides a reliable and effective substitute for these heuristic estimators but, most importantly, opens a path forward by making possible the estimation of powerful translation models that uncover the latent side of translation, and whose estimation under ad hoc algorithms would hardly have been possible.