Learning Syntactic Structure

Yoav Seginer

Abstract: Syntactic structure plays a central role in most theories of language, but it cannot be directly observed. The question this thesis investigates is whether there is a relation between syntactic structure and immediately observable properties of language, such as the statistics of the words and sentences that we hear and read. Finding such a relation has important consequences for the problem of language acquisition by children, as well as implications for the theory of syntax itself. It can also be applied in engineering language-processing systems. One way to tackle this question is to design an algorithm which collects statistics from sentences in a language and uses these statistics to determine the syntactic structure of these and other sentences in the language. Such an algorithm learns the syntactic structure of a language from unannotated examples: plain sentences, as they can be observed in a language, without any extra information. The algorithm encodes a relation between the statistics collected from the observable language and the syntactic structure. If the algorithm successfully captures at least part of the syntactic structure, we can say that it approximates the relation between the observable language and its syntactic structure. Such an algorithm is called an unsupervised parser. This thesis describes a specific proposal for an unsupervised parsing algorithm. By testing the proposed parser on corpora of different languages, it is shown that the algorithm successfully discovers part of the syntactic structure of these languages. The relation described by the unsupervised parser depends not only on the choice of statistics being collected but also on the choice of representation of the syntactic structure. The right choice of representation is therefore important in keeping the parser simple.
The first part of the thesis describes a completely new representation of syntactic structure, called common cover links, and a parsing method suitable for this representation. Common cover links allow the parser to exploit two important properties of natural language which I presuppose: humans process language incrementally, and the syntactic structures of language are skewed (every subtree of a parse tree has a short branch). Using common cover links, it is possible to define an incremental parser which constructs the syntactic structure of a sentence as its words are read one by one. The common cover link representation ensures that the parser produces only skewed syntactic trees. This considerably reduces the number of possibilities the parser has to consider and makes it simple and fast.

The second part of the thesis describes the statistics that the parser collects and how they are used during parsing. An important property of the algorithm is that statistics are collected from each sentence only after that sentence has been parsed. In this way, the parser is gradually improved: new statistics are added to those previously collected, and together they are used to parse the next sentence and collect additional statistics. The statistics are collected for each word separately. For each word, they are represented by labels based on the frequencies of adjacent words and of those words' labels. From the labels, simple properties are induced which determine how words should be linked to each other during parsing. As a result of the Zipfian distribution of words in natural languages, the most frequent words dominate the properties of all words. In this way, the most frequent labels take over the role of the traditional parts of speech. The properties of a word are induced by summing over the properties of other words.
This makes the learning process very simple and fast. The parser is tested on three corpora, of English, German, and Chinese. In each of these three languages the parser successfully discovers part of the syntactic structure of the language. The parser is much faster than other unsupervised parsers while achieving state-of-the-art accuracy.
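The skewness assumption (every subtree of a parse tree has a short branch) can be made concrete with a toy check. The tuple encoding of trees and the branch-length bound used here are illustrative assumptions, not the thesis's formal definitions:

```python
# Toy illustration of the skewness property the parser relies on:
# a parse tree is "skewed" if every subtree has at least one short
# branch. Leaves are strings; internal nodes are tuples of children.

def shortest_branch(tree):
    """Length of the shortest root-to-leaf path."""
    if isinstance(tree, str):           # a leaf word
        return 0
    return 1 + min(shortest_branch(child) for child in tree)

def is_skewed(tree, bound=1):
    """Every subtree must have a branch no longer than `bound`
    (an assumed bound, chosen here for illustration)."""
    if isinstance(tree, str):
        return True
    return (shortest_branch(tree) <= bound and
            all(is_skewed(child) for child in tree))

# A right-branching tree: every subtree has an immediate leaf child.
right_branching = ("the", ("cat", ("sat", ("on", ("the", "mat")))))
print(is_skewed(right_branching))   # True

# A balanced tree over four leaves: the root's shortest branch
# has length 2, so it fails the bound.
balanced = (("a", "b"), ("c", "d"))
print(is_skewed(balanced))          # False
```

Because only trees of this restricted shape can be produced, the space of analyses the incremental parser must consider at each word is sharply reduced, which is the source of the speed the abstract refers to.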
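The parse-then-update regime and the per-word adjacency statistics described above can be sketched as follows. The counts kept here, and the crude "label" (a word's most frequent right neighbours), are heavy simplifications for illustration only; they are not the thesis's actual label or linking computation:

```python
from collections import Counter, defaultdict

class AdjacencyStats:
    """Simplified per-word statistics: counts of immediately
    adjacent words. Illustrative only, not the thesis's model."""

    def __init__(self):
        self.left = defaultdict(Counter)   # left[w][v]: v seen left of w
        self.right = defaultdict(Counter)  # right[w][v]: v seen right of w

    def update(self, sentence):
        """Collect statistics from a sentence (done only after the
        sentence has been parsed with the current statistics)."""
        for a, b in zip(sentence, sentence[1:]):
            self.right[a][b] += 1
            self.left[b][a] += 1

    def label(self, word, k=3):
        """A crude stand-in for a label: the word's k most frequent
        right neighbours. By Zipf's law, a few very frequent words
        dominate these labels across the whole vocabulary."""
        return tuple(v for v, _ in self.right[word].most_common(k))

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat saw the dog".split(),
]

stats = AdjacencyStats()
for sentence in corpus:
    # In the thesis's regime the sentence would first be parsed with
    # the statistics gathered so far; only the update step is shown.
    stats.update(sentence)

print(stats.label("the"))
```

Each sentence improves the statistics available for the next one, so the parser gets gradually better as it reads the corpus, with no annotated data involved at any point.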