Refining translation grammars through paraphrase clustering

Ekaterina Garmash

Abstract: Finding the right model for the structure of translation equivalence between languages is one of the major challenges and lines of research in statistical machine translation. In this thesis we formalize translation equivalence as synchronous grammars and explore one particular way of modifying a translation grammar: labeling its nonterminal symbols. The labeling we develop is based on the general notion of semantic equivalence. Since it is not known *a priori* what kind of semantic distinctions are relevant to translation equivalence, we define an unsupervised procedure that learns a label set by clustering close paraphrases that characterize the strings generated from a given nonterminal symbol. We implement this procedure and test a current baseline grammar (the Hiero system) labeled with the generated label set. Having tried a number of labeling algorithms, introduced additional modifications to the grammar, and made other changes to the standard translation pipeline, we find that the labeled grammar performs worse than the unlabeled one. We discuss possible reasons for this result and propose a number of modifications to the labeling procedure that could improve performance.