Relational-Realizational Parsing
Reut Tsarfaty

Abstract: Statistical parsing models aim to assign accurate syntactic analyses to natural language sentences based on the patterns and frequencies observed in human-annotated training data. State-of-the-art statistical parsers to date demonstrate excellent performance in parsing English, but when the same models are applied to languages other than English, they hardly ever obtain comparable results. The grammar of English is quite unusual in that it is fairly configurational. This means that the order of words inside sentences in English is relatively rigid and that the morphology of words is rather impoverished.

The main challenge associated with parsing languages that are less configurational than English, such as German, Arabic, Hebrew or Warlpiri, is the need to model and to statistically learn complex correspondence patterns between functions, i.e., sets of abstract grammatical relations, and their morphological and syntactic forms of realization.

This thesis proposes a new model, called the Relational-Realizational (RR) model, that can effectively cope with parsing languages that allow for flexible word-order patterns and rich morphological marking. The RR model is applied to parsing the Semitic language Modern Hebrew, obtaining significant improvements over previously reported results.

Whereas grammatical relations are largely universal, their realization is known to vary across languages. Different means of realization encompass the interaction of (at least) two typological dimensions, one associated with word order (Greenberg 1963), and another associated with word-level morphology (Sapir 1921, Greenberg 1954). In order to adequately model the complex form-function correspondence patterns that emerge from such interactions, we first consider morphological models that map grammatical properties of words to the surface formatives that realize them.
In this work I adopt the principles of word-and-paradigm morphology (Anderson 1992, Stump 2001) and extend them to modeling correspondence patterns in the syntax. In the proposed RR model, constituents are organized into syntactic paradigms (Pike 1962, 1963). Each cell in a paradigm is associated with a Relational Network (Postal and Perlmutter 1977) and a set of properties that jointly define the function of the constituent. The form of a constituent emerges from (i) the internal grouping, (ii) the linear ordering, and (iii) the morphological marking of its subconstituents. The RR decomposition of the rules that spell out the form of constituents reflects different typological parameters, separating the functional, configurational and morphological dimensions. The dominated constituents may be associated with their own relational networks, and the process continues recursively until fully-specified morphosyntactic representations map to words.

This three-phase spell-out process gives rise to a recursive generative process that can be used as a probabilistic model whose parameters can be estimated from data. The resulting statistical model is empirically evaluated by parsing sentences in the Semitic language Modern Hebrew on the basis of a small annotated treebank (Sima'an et al. 2001). Through a series of experiments we report significant improvements over the state-of-the-art Head-Driven (HD) alternative on various measures, without paying any computational cost. The typological characterization of the RR statistical distributions further suggests that the model may be useful for developing corpus-based quantitative methods for typological classification of natural language data.

This thesis is organized as follows:

Chapter 1: Linguistic Typology. This chapter introduces basic concepts in linguistic typology, and associates grammatical relations with the morphological and syntactic dimensions of realization.
It further introduces the notion of nonconfigurationality in relation to the interplay between the two.

Chapter 2: Parsing Technology. This chapter reviews generative and discriminative approaches that were applied to parsing English, and describes the application of existing generative models to Chinese, German and Arabic. The results suggest that less configurational languages are harder to parse.

Chapter 3: The Data. This chapter describes the blend of configurational and nonconfigurational phenomena we find in the grammar of the Semitic language Modern Hebrew, and illustrates different instances in which morphological information enhances the interpretation of configurational structures.

Chapter 4: The Model. This chapter describes the linguistic, formal, and computational properties of the Relational-Realizational model. It starts out with morphological modeling and extends the underlying principles to the syntactic domain. It formally defines the RR model as a generative rewrite-rule system and describes a probabilistic generative model based on it.

Chapter 5: The Application. This chapter applies the RR model developed in Chapter 4 to the Hebrew morphosyntactic phenomena described in Chapter 3. The application illustrates the theoretical reach of the model, and it serves as the theoretical basis for implementing different treebank grammars.

Chapter 6: Experiments. This chapter reports the results of parsing experiments for Modern Hebrew in the form of a head-to-head comparison of the RR model with the state-of-the-art HD approach.

Chapter 7: Extensions. This chapter discusses potential extensions of the model towards handling related tasks, including semantic modeling and morphological disambiguation. Finally, it suggests studying the potential application of the model to quantifying the information-theoretic content of the morphological and syntactic dimensions of realization for different languages.