2011

Nearly complete list of past talks (up to summer 2010)




TUESDAY, June 29, at 16.00 in A1.10
Stefan Frank (ILLC)
Insensitivity of the human sentence-processing system to hierarchical structure
(with Rens Bod)

Although it is generally accepted that hierarchical phrase structures are instrumental in describing human language, their cognitive status is still debated. We investigated the role of hierarchical structure in sentence processing by implementing a range of probabilistic language models, some of which depend on hierarchical structure whereas others use sequential structure only. All models estimated the occurrence probabilities of syntactic categories in sentences for which reading-time data were available. Relating the models' probability estimates to the data showed that the hierarchical-structure models do not account for reading times over and above the sequential-structure models. This suggests that a sentence's hierarchical structure, unlike many other sources of information, does not affect the generation of expectations about upcoming words.


Wednesday, July 7, at 16.00 in A1.10
Federico Sangati (ILLC)
A probabilistic generative model for an intermediate constituency-dependency representation

We present a probabilistic model extension to the Tesnière Dependency Structure (TDS) framework formulated in (Sangati and Mazza, 2009). This representation incorporates aspects from both constituency and dependency theory. In addition, it makes use of junction structures to handle coordination constructions. We test our model on parsing the English Penn WSJ treebank using a re-ranking framework. This technique allows us to efficiently test our model without needing a specialized parser, and to use the standard evaluation metric on the original Phrase Structure version of the treebank. We obtain encouraging results: we achieve a small improvement over state-of-the-art results when reranking a small number of candidate structures, on all the evaluation metrics except for chunking.

Henk Zeevat (ILLC)
Wednesday, June 16, 2010
Syntactic Paradigms
(with Alessandro lo Popolo)

Paradigms are the bread and butter of traditional linguistics and it is surprising that they do not have a more central place in modern linguistics. Arguably they are central for semantic change, for the grammaticalisation of originally semantic features and for the elusive semantics of tense morphemes, prepositions and modal particles. Yet, they are hard to identify within the modern proposals for formal grammars.

Morphological paradigms as conceived by Anderson and Blevins are naturally analysed within optimality theory as the set of expressive constraints that apply to a certain lexical category. They state which abstract features must be expressed by the form of the word. This sets up the abstract paradigm: the set of cells for which the paradigm provides specific forms. The specific forms can also be described in OT: the combination of a cell and a word can be taken as the input in a morphological competition. 

This interpretation of paradigms automatically generalises from morphology to syntax. Complex categories like NP, S and VERB obligatorily express certain features: definiteness and wh in the case of NP; finiteness, tense, aspect, modality, evidentiality and additivity in the case of S; and a similar range for the verb. Which features are expressed, and their precise nature, is language dependent. The obligatory expression is most simply described by expressive constraints restricted to the category.

What is different, however, is the way in which the cells are defined. A complex category can satisfy an expressive constraint in three ways: by creating a formal property that expresses the feature (word order), by building a constituent that expresses the feature and by having a constituent that expresses the feature.

As an illustration, the talk will try to identify negation paradigms in Russian, Italian, French, English and Dutch and provide a uniform treatment of negation for these languages. We believe the notion of syntactic paradigms can be useful for semantics, syntax, computation and for understanding the historical processes underlying the formation of languages.



Willem Zuidema (ILLC)
Wednesday, June 9, 2010
Empirical evidence for recursive hierarchical structure in child language

There is general agreement that adult language shows recursive, hierarchical phrase-structure (rhps), but many questions remain about how abundant recursion really is in language and about how children arrive at it. One hypothesis is that rhps is an innate, formal universal of natural language, but an equally popular idea is that it is the emergent result from children building generalisation on generalisation, and need not even be universal. Researchers trying to empirically establish the moment "recursion" enters a child's language, or more generally count the occurrences of recursion, face a difficult methodological challenge: the usual introspective evidence from the theoretical linguist is unavailable for such questions, and behavioral data does not unambiguously reveal whether a recursive operation or simple "caching" was used to generate or analyze utterances that look like adult recursions. In this talk, I will argue that neither a priori arguments, nor stopgap empirical data will suffice to resolve this debate; rather, a careful analysis using techniques from statistical machine learning and formal grammars is called for.

Even if a gold-standard phrase-structure annotation is given, it is not obvious how to count occurrences of recursion. Of course, from such data we can easily extract a dominance matrix, which records for every pair of syntactic categories X and Y how often X dominates Y. But I will show that simply counting how often one finds a node dominated by a node of the same category can both over- and underestimate the frequency of "real" recursion. Finally, I will present an alternative measure of "recursiveness" based on the deviations from a linear ordering of syntactic categories w.r.t. dominance, and show how data from CHILDES show a steady increase with age of the child on this metric.
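
As a rough illustration of the dominance matrix and of the naive same-category count that the abstract argues is unreliable, here is a minimal sketch. The tree encoding (a (label, children) pair with plain strings as words), the function names and the toy tree are all assumptions made for this example; the alternative "recursiveness" measure from the talk is not reproduced here.

```python
from collections import Counter

def dominance_counts(tree, counts=None, ancestors=()):
    """Count, for every category pair (X, Y), how often an X node dominates a Y node.

    A tree is a (label, children) pair; a word is a plain string (assumed encoding).
    """
    if counts is None:
        counts = Counter()
    if isinstance(tree, str):  # words are not syntactic categories
        return counts
    label, children = tree
    for ancestor in ancestors:  # every ancestor dominates this node
        counts[(ancestor, label)] += 1
    for child in children:
        dominance_counts(child, counts, ancestors + (label,))
    return counts

def naive_self_embedding(counts):
    """The naive count: nodes dominated by a node of the same category."""
    return sum(n for (x, y), n in counts.items() if x == y)

# Hypothetical toy annotation of "the dog that the cat saw barked"
toy = ("S", [("NP", [("NP", ["the", "dog"]),
                      ("SBAR", ["that", ("S", [("NP", ["the", "cat"]),
                                               ("VP", ["saw"])])])]),
             ("VP", ["barked"])])

counts = dominance_counts(toy)
print(counts[("S", "S")], counts[("NP", "NP")], naive_self_embedding(counts))
```

The diagonal of the matrix (X dominating X) is exactly the simple count discussed above; the off-diagonal entries are what an ordering-based measure would have to look at.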


Gideon Borensztajn (ILLC)
Tuesday, May 25, 2010
Pointers in the brain: What the systematicity of language tells about cortical connectivity and connectionism


Afra Alishahi (University of Saarland, Saarbrücken)
Wednesday, May 19, 2010
A Bayesian account of the acquisition of abstract argument structure constructions

Developing computational algorithms that capture the complex structure of natural languages is an open problem. In particular, learning the abstract properties of language only from usage data without built-in knowledge of language structure remains a challenge. We have developed a Bayesian model of the acquisition and use of verb argument structure from child-directed data. In our model, the general constructions of language (such as transitive and intransitive) are viewed as a probability distribution over the syntactic and semantic features, e.g., the semantic properties of the verb and its arguments, and their relative word order in an utterance. Constructions are learned through clustering similar verb usages. Language use, on the other hand, is modeled as a Bayesian prediction problem, where the missing features in a usage are predicted based on the available parts and the acquired constructions (e.g., in sentence production, the best syntactic pattern for an utterance is predicted from the available semantic information). The model can successfully learn the common constructions of language, and its behaviour shows similarities to actual child data, both in sentence production and comprehension. Moreover, the acquired knowledge of language in this model is robust yet flexible, and many general patterns of behaviour that are observed in children can be simulated and explained by this approach.


Raquel Fernández Rovira (ILLC)
Wednesday, April 21, 2010
Incrementality and Relative Gradable Properties

There is a growing amount of psycholinguistic evidence showing that humans process and produce language incrementally, i.e. bit by bit before utterance or constituent boundaries are reached, and making use of different sources of information---speech, syntactic structure, semantic interpretation, pragmatic knowledge---that interact in parallel. In recent years, (computational) linguists have started to develop theories and systems that are more consistent with these psycholinguistic findings---a trend very much noticeable in recent work on spoken dialogue systems. In this talk, reporting on work in progress, I will discuss a number of questions that embracing incrementality raises for the interpretation of relative gradable adjectives in referential descriptions, as well as for the semantics/pragmatics interface at large.


Gerold Schneider (University of Zurich)
Wednesday, April 7, 2010
Parsing with Dependency Grammar: combining hand-written rules and corpus statistics

In this talk I will present the fast, robust, large-scale dependency parser Pro3Gres, which combines hand-written rules and corpus statistics.
I will focus on the following topics:
  1. Dependency Grammar (DG): Differences between Lucien Tesniere's DG and currently used popular DGs, including the one of Pro3Gres. X-bar and DG.
  2. Hand-written rules and constraints in combination with corpus data. Search space reductions, statistical models, domain adaptations.
  3. DG as an f-structure only version of Lexical-Functional Grammar (LFG).
  4. Overview of applications of the parser: information retrieval in the biomedical domain, descriptive corpus linguistics.

Workshop Theory, Typology & Technology: Parsing in the face of diversity
Tuesday, March 23, 2010


Hartmut Fitz (University of Groningen)
Wednesday, March 10, 2010
Statistical learning of complex questions


The problem of auxiliary fronting in complex polar questions occupies a prominent position within the nature versus nurture debate in language acquisition. Usage-based theories of language need to explain how the syntax of these questions can be acquired from experience. First, I survey several data-driven models which have recently attempted to address this issue. Then I present a linear statistical learner, called the adjacency-prominence learner, which uses sequential and semantic information to produce utterances from a bag of words. I show that this model is capable of generating grammatical complex questions (i) without explicitly representing hierarchical phrase structure and (ii) without exposure to the target utterances in its training environment. Implications for nativist theories of language acquisition are discussed.


Karl Magnus Petersson (Max Planck Institute for Psycholinguistics)
Wednesday, February 10, 2010
The Neurobiology of Syntax: Recursion and Dynamical Systems

The language faculty is a neurobiological system that provides humans with the capacity to understand, for all practical purposes, an unlimited set of sentences. At least since von Humboldt (1836), theoretical linguists have interpreted "practically unlimited" as meaning infinite and it is argued that this "linguistic infinity" necessitates a recursive syntactic rule set – a knowledge structure that, for example, allows humans to embed and understand phrases within phrases without limit. Hornstein (2009, A Theory of Syntax: Minimal Operations & Universal Grammar) argues that the language faculty must be simpler than previously thought. This is crucial, because ultimately, whatever syntactic operations linguists propose, these must be implementable in neural processing infrastructure. Thus, neurobiology puts hard constraints on the properties of the language faculty. We argue that the finiteness of neural systems, in terms of memory capacity and processing precision, is such a constraint. What are the implications of this for neurobiological models of syntax? First, we argue that it is not meaningful to separate "syntactic computation" from "processing memory" - or competence from performance in linguistic terms - and we note that the concept of a recursive rule set does not need to be motivated by an ideal "linguistic infinity". Instead, the relevant fact is the human capacity to process bounded patterns of non-adjacent dependencies in language – there is a definite upper bound on "distance" set by neurobiology. Second, we are free to choose any syntactic framework we prefer as long as this serves its purpose - for example, we may choose to capture non-adjacent dependency processing in bounded recursive formalisms.
However, because neural processing systems belong to the class of adaptive stochastic dynamical systems with fading memory properties, exemplified by noisy spiking network processors, it seems more natural to try to understand syntax processing in terms of this type of system. We illustrate this theoretical discussion with empirical results from behavioral, TMS, and fMRI investigations of Broca’s region in the context of implicit acquisition of simple artificial unification grammars as well as some insights from computational modeling.


Vera Demberg (University of Edinburgh)
Wednesday, February 3, 2010
A Broad-Coverage Model of Prediction in Human Sentence Processing

The aim of my recent research is to design and implement a cognitively plausible theory of sentence processing which contains a mechanism for modelling prediction and verification as processes in language understanding. Modelling prediction is an interesting and relevant problem because recent experimental evidence suggests that humans predict upcoming structure or lexemes during sentence processing. However, none of the current sentence processing theories model prediction explicitly.
In my talk, I will explain the mechanisms in my sentence processing theory, as well as the requirements for a parser that observes the strict incrementality requirement stated in the processing theory. The linguistic formalism I use to describe the incremental derivations is a modified version of tree-adjoining grammar, called PLTAG for "PsychoLinguistically motivated TAG". In the last part of my talk, I will show that the sentence processing theory using the parser successfully models psycholinguistic sentence processing phenomena.


Frans Adriaans (Utrecht University)
Wednesday, January 27, 2010
StaGe: A Model for the Induction of Phonotactic Constraints from Continuous Speech

Infants are faced with the challenge of building up a lexicon of discrete, word-sized units from continuous speech input. In this talk, I will present a computational model for the induction of phonotactic constraints from continuous speech. Such constraints provide language learners with a valuable cue for the detection of word boundaries in the speech stream (McQueen, 1998). StaGe (Adriaans & Kager, in press) implements two learning mechanisms that have been shown to be available to infants: statistical learning (e.g., Saffran, Aslin, & Newport, 1996) and generalization (e.g., Saffran & Thiessen, 2003). While current speech segmentation models typically rely on statistical information (e.g. transitional probabilities), we show that feature-based generalization over statistically learned biphone constraints improves the speech segmentation performance of the learner. These results indicate a potential role for phonotactic generalizations in human speech segmentation.
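
For readers unfamiliar with the purely statistical baseline mentioned above, the following is a minimal sketch of segmentation by transitional probabilities. It is not StaGe: there is no feature-based generalization, and the function names, the threshold and the toy "phone" strings are assumptions made for illustration.

```python
from collections import Counter

def transitional_probabilities(utterances):
    """Estimate TP(x -> y) = count(xy) / count(x followed by anything)."""
    pair_counts = Counter()
    first_counts = Counter()
    for utterance in utterances:
        for x, y in zip(utterance, utterance[1:]):
            pair_counts[(x, y)] += 1
            first_counts[x] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(utterance, tp, threshold):
    """Insert a word boundary wherever the transitional probability is low.

    Assumes a non-empty utterance; unseen biphones get probability 0.
    """
    words, current = [], utterance[0]
    for x, y in zip(utterance, utterance[1:]):
        if tp.get((x, y), 0.0) < threshold:
            words.append(current)
            current = y
        else:
            current += y
    words.append(current)
    return words

# Toy corpus in which each character stands for one phone (hypothetical data)
tp = transitional_probabilities(["abab", "abc"])
print(segment("abab", tp, threshold=0.75))
```

In this toy corpus "a" is always followed by "b", so no boundary is placed inside "ab", while the less predictable transition from "b" to "a" is treated as a boundary.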

References
Adriaans, F., & Kager, R. (in press). Adding generalization to statistical learning: The induction of phonotactics from continuous speech. Journal of Memory and Language, doi:10.1016/j.jml.2009.11.007
McQueen, J. M. (1998). Segmentation of continuous speech using phonotactics. Journal of Memory and Language, 39, 21–46.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928. 
Saffran, J. R., & Thiessen, E. D. (2003). Pattern induction by infant language learners. Developmental Psychology, 39, 484–494.


Ingrid Nieuwenhuis (Radboud University Nijmegen)
Wednesday, January 20, 2010
Sleep enhances the implicit extraction of grammar rules

Grammar learning requires the extraction of complex rules implicitly from experience. We exposed participants to letter sequences generated from an artificial (Reber) grammar and tested their ability to classify new sequences as grammatical or not. We show that sleeping between exposure and testing specifically enhances the acquisition of the grammatical structure, as opposed to effects caused by familiarity with local sub-sequences. These results suggest that an active rule extraction process takes place during sleep.


Stefan Frank (ILLC)
Wednesday, December 2, 2009
Investigating the roles of expectation and uncertainty in human sentence processing

It is well established by now that the time needed to read a word in sentence context is positively correlated with its surprisal, as can be estimated by probabilistic language models. A word's surprisal has an intuitive interpretation as the extent to which the word was unexpected. However, other probabilistic measures of processing effort have also been suggested. For example, Roark et al. (2009) showed that uncertainty about the upcoming word (formalized as the entropy of the next-word probability distribution) accounts for word-reading time, over and above the word's surprisal. In contrast, Hale (2003, 2006) argued that it is the reduction in entropy, resulting from processing a word, which is indicative of processing effort. Although he provided several examples of how entropy reduction can explain particular psycholinguistic phenomena, the relation between entropy reduction and reading time has not previously been tested on a scale that allows for a statistical test of the entropy-reduction hypothesis. I'll present some new data showing that entropy reduction is indeed predictive of word-reading times, over and above surprisal, and that (contrary to Roark et al.) entropy itself has no additional explanatory value. However, the psychological interpretation of entropy reduction remains unclear.
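
The three quantities compared in this abstract can be written down in a few lines. The sketch below is only illustrative: the toy probability distributions are invented, the function names are assumptions, and entropy is computed over next-word distributions for simplicity, whereas the entropy in Hale's proposal concerns uncertainty about the analysis of the sentence as a whole.

```python
import math

def surprisal(dist, word):
    """Surprisal of a word in bits: -log2 P(word | context)."""
    return -math.log2(dist[word])

def entropy(dist):
    """Entropy in bits of a probability distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def entropy_reduction(h_before, h_after):
    """Decrease in entropy caused by processing a word; increases count as zero."""
    return max(0.0, h_before - h_after)

# Hypothetical distributions before and after reading one word
before = {"dog": 0.5, "cat": 0.25, "bird": 0.25}
after = {"barked": 0.5, "slept": 0.5}

print(surprisal(before, "cat"))                              # 2.0 bits
print(entropy(before))                                       # 1.5 bits
print(entropy_reduction(entropy(before), entropy(after)))    # 0.5 bits
```

In a reading-time study, values like these would be computed per word from a trained language model and entered as predictors in a regression; that step is not shown here.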


Federico Sangati (ILLC)
Wednesday, November 25, 2009
An English Dependency Treebank à la Tesnière

During the last decade, the Computational Linguistics community has shown an increased interest in Dependency Treebanks. Several groups have developed new annotated corpora using dependency representation, while other people have proposed several automatic conversion algorithms to transform available Phrase Structure (PS) treebanks into Dependency Structure (DS) notation. Such projects typically refer to Tesnière as the father of dependency syntax, but little attempt has been made to explain how the chosen representation relates to the original work. A careful comparison reveals substantial differences: modern DS annotations discard some relevant features characterizing Tesnière’s model. This paper presents our attempt to go back to the roots of dependency theory, and shows how it is possible to transform a PS English treebank to a DS notation that is closer to the one proposed by Tesnière, which we will refer to as TDS. We will show how this representation can incorporate all main advantages of modern DS, while avoiding well-known problems concerning the choice of heads, and better representing common linguistic phenomena such as coordination.
http://staff.science.uva.nl/~fsangati/TDS/TLT8_Sangati_Mazza.pdf


Tejaswini Deoskar (ILLC)
Wednesday, September 30, 2009
Smoothing fine-grained PCFG lexicons
We present an approach for smoothing treebank-PCFG lexicons with lexical information obtained from a large unannotated corpus, by interpolation of treebank lexical parameters with estimates obtained from unannotated data via the inside-outside algorithm. The PCFG has complex lexical categories, making relative-frequency estimates from a treebank very sparse. This kind of smoothing for complex lexical categories results in improved parsing performance, with a particular advantage in identifying obligatory arguments subcategorized by verbs unseen in the treebank.
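
The interpolation step can be pictured with a small sketch. This assumes plain linear interpolation with a single fixed weight; the actual weighting scheme used in the work is not stated in the abstract, and the function name, the weight and the toy probabilities are invented for illustration.

```python
def interpolate(p_treebank, p_unannotated, lam):
    """Linear interpolation: lam * treebank estimate + (1 - lam) * unannotated estimate.

    Both arguments map words to P(word | lexical category); missing words count as 0.
    """
    words = set(p_treebank) | set(p_unannotated)
    return {w: lam * p_treebank.get(w, 0.0) + (1 - lam) * p_unannotated.get(w, 0.0)
            for w in words}

# Hypothetical estimates for one complex lexical category (e.g. a ditransitive verb tag)
p_treebank = {"give": 0.6, "send": 0.4}                        # relative frequencies
p_unannotated = {"give": 0.5, "send": 0.3, "donate": 0.2}      # e.g. inside-outside estimates

smoothed = interpolate(p_treebank, p_unannotated, lam=0.7)
print(smoothed)
```

Note how "donate", unseen with this category in the toy treebank, receives non-zero probability after smoothing, which is the effect on unseen verbs that the abstract highlights.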


Federico Sangati (ILLC)
Wednesday, September 30, 2009
A generative re-ranking model for dependency parsing
We propose a framework for dependency parsing based on a combination of discriminative and generative models. We use a discriminative model to obtain a k-best list of candidate parses, and subsequently rerank those candidates using a generative model. We show how this approach allows us to evaluate a variety of generative models, without needing different parser implementations. Moreover, we present empirical results that show a small improvement over state-of-the-art dependency parsing of English sentences.
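
The reranking setup can be sketched as follows. The k-best list, the arc probabilities and the scoring function are all hypothetical stand-ins: the generative models evaluated in the work are not specified in the abstract, and a real system would obtain the candidates from a discriminative parser.

```python
import math

# Hypothetical arc probabilities P(dependent | head) of a toy generative model
ARC_PROB = {("saw", "John"): 0.4, ("saw", "Mary"): 0.4, ("John", "Mary"): 0.01}

def generative_logprob(parse):
    """Log-probability of a parse, given as a tuple of (head, dependent) arcs."""
    return sum(math.log(ARC_PROB.get(arc, 1e-6)) for arc in parse)

def rerank(kbest, score):
    """Return the candidate that the second-stage model scores highest."""
    return max(kbest, key=score)

# Hypothetical k-best list, ordered by the (unshown) discriminative parser
kbest = [
    (("John", "Mary"), ("saw", "John")),   # discriminative 1-best
    (("saw", "John"), ("saw", "Mary")),    # discriminative 2-best
]

best = rerank(kbest, generative_logprob)
print(best)
```

Because only the scoring function changes between experiments, different generative models can be compared without writing a new parser for each, which is the point made in the abstract.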


Maarten Versteegh (Radboud University Nijmegen)
Wednesday, September 23, 2009
Using Data-Oriented Parsing to model syntactic change

A number of properties of the data-oriented parsing model have also been identified as being important in grammaticalization theories of language change. The exemplar-based nature, the use of probabilities and the incorporation of constructions are shared by both approaches. Building on data-oriented parsing, we present a computational model of syntactic change that can account for the major mechanisms of language change. We show the model's plausibility by simulating several historical situations of change.


Gerard Kempen (Max Planck Institute for Psycholinguistics, Nijmegen, and Leiden University)
Wednesday, September 2, 2009
The Unification Space implemented as a localist neural net: Predictions and error-tolerance in a constraint-based parser
Joint work with Theo Vosse

We introduce a novel computer implementation of the Unification-Space parser (Vosse & Kempen 2000) in the form of a localist neural network whose dynamics is based on interactive activation and inhibition. The wiring of the network is determined by Performance Grammar (Kempen & Harbusch 2003), a lexicalist formalism with feature unification as binding operation. While the network is processing input word strings incrementally, the evolving shape of parse trees is represented in the form of changing patterns of activation in nodes that code for syntactic properties of words and phrases, and for the grammatical functions they fulfill. The system is capable, at least in a qualitative and rudimentary sense, of simulating several important dynamic aspects of human syntactic parsing, including garden-path phenomena and reanalysis, effects of complexity (various types of clause embeddings), fault-tolerance in case of unification failures and unknown words, and predictive (expectation-based) parsing. English is the target language of the parser described, and a demonstration version of the software is available from the authors via the internet.


Tim O'Donnell (Harvard University)
Monday, August 10, 2009
Computation and Reuse in Language

Productivity in language is made possible by a division of labor between computation and storage: stored lexical items are combined via computation into more complex structures. A central question for theories of language is what constitutes this inventory of stored items: Where do the stored items come from? Under what conditions does storage happen? How are storage and computation integrated? I will present a Bayesian framework designed to study these questions, along with some preliminary empirical evaluation.


Reut Tsarfaty (ILLC)
Monday, July 27, 2009
Parsing a (relatively) free word-order language


Peter beim Graben (University of Reading)
Monday, June 29, 2009
Dynamic Cognitive Modeling of Syntactic Language Processing
Joint work with Roland Potthast

I will present Dynamic Cognitive Modeling [1] as a three tier top-down approach comprising the levels of (1) cognitive processes; (2) their state space representations; and (3) dynamical systems implementations that are guided by neuroscientific principles. These levels are passed through in a top-down fashion: (1) cognitive processes are described as algorithms sequentially operating on complex symbolic data structures that are decomposed using so-called filler/role bindings [1,2]; (2) data structures are mapped onto points in abstract vector spaces using tensor product representations [1,2]; (3) cognitive operations are implemented as dynamics of neural networks or neural/dynamic fields. The last step involves the solution of inverse problems, namely training the system's parameters to reproduce prescribed trajectories of cognitive operations in representation space. I present a regularization technique for the common Hebb rule, called Tikhonov-Hebbian learning, in order to tackle the ill-posedness of the inverse problem [1]. The method is illustrated by means of an instructive example from syntactic language processing [3]. I construct a functional representation [4] of a context-free left-corner parser over a three-dimensional feature space processing the well-formed sentence

(1) Die Gans wurde im Ofen gebraten
“The goose was grilled in the oven”

and the phrase structure violation

(2) Die Gans wurde im gebraten
“The goose was grilled in”.

After training a neural field through Tikhonov-Hebbian learning, the differences of neural activation exhibit some remarkable resemblance with the event-related brain potentials reported in [3].

References
[1] beim Graben, P. & Potthast, R. (2009). Inverse problems in dynamic cognitive modeling. Chaos: An Interdisciplinary Journal of Nonlinear Science, 19, 015103.
[2] Smolensky, P. & Legendre, G. (2006). The Harmonic Mind. From Neural Computation to Optimality-Theoretic Grammar, MIT Press.
[3] Hahne, A. & Friederici, A. D. (1999). Electrophysiological evidence for two steps in syntactic analysis: Early automatic and late controlled processes. Journal of Cognitive Neuroscience, 11, 194 – 205.
[4] beim Graben, P., Pinotsis, D., Saddy, D. & Potthast, R. (2008). Language processing with dynamic fields. Cognitive Neurodynamics, 2(2), 79 – 88.


Henk Zeevat (ILLC)
Wednesday, June 24, 2009
Bayesian Interpretation


Amit Mukerjee (Indian Institute of Technology, Kanpur)
Thursday, June 18, 2009
The constructivist enterprise: towards computational language acquisition
Slides (pdf)

The constructivist approach to language is in contrast with the generativist model, which believed in the autonomy of syntax. Most models of computational language derive from generativist ideas, treating its units as formal, empty symbols - semantics is defined (or derived) as a secondary process. In the constructivist model, symbols are thought of as a tight coupling between a phonological pole and a diffuse set of associations that constitute its semantic pole. Grammar is viewed as the binding of such units into larger bipolar structures.
Further, while humans master the complexities of mental processes and language gradually, computational structures tended to focus on the finished adult language. This work is premised on the belief that in order to build constructivist models of language, we must start from an infant-like state, and trace the growth of the semantic-syntactic grammar simultaneously, eventually leading to adult-like performance. Thus, in a sense, this approach attempts to implement a version of the strongly physicalist Aufbau program a la Carnap.
We present some very preliminary work towards this goal, where we expose the system to schematic and real videos of events. Using a bottom-up model of attention, we parse these visual streams into a very natural set of features, and use completely unsupervised approaches to categorize the agents, actions, and relations in the video. The oracles that categorize a scene into one structure or the other may be thought of as "image schemas", adopting a term from developmental psychology. Since they provide a mapping from perception, the set of image schemas may be viewed as a form of content-based index. Unlike the predicates of binary logic, image schemas provide graded membership, which permits a degree of defeasibility.
Subsequently, when the system is exposed to linguistic commentaries of the same scenes, we show that it can immediately associate many of these image-schemas with linguistic units in the text. Furthermore, structures such as the number of subcategories of verbs (valency) are shown to be derivable from the semantics directly. Though this is not demonstrated here, once the frame structures associated with the clausal head are known, it may be assumed that syntactical patterns implementing this at the phonological level (syntax) may be learned relatively easily.
This is of course a very preliminary attempt, and much work remains, especially in the scaling up from this infant-like state to larger and more abstract situations. However, the promise of extremely adaptive interaction, defeasibility, inverse indexing with its possibility of generation, and the unsupervised nature of the process is likely to give the constructivist enterprise a significant push in years to come.


Miles Osborne (University of Edinburgh)
Wednesday, June 10, 2009
Stream-based randomised language modelling for machine translation
Joint work with Abbey Levenberg

We now live in a world where we have more data than we can possibly use. For example, consider all of the newswire that is published on the Web each day, seven days a week, fifty-two weeks a year. For machine translation this abundance of data means we can produce more fluent translations. However, traditional language models built using gigantic amounts of data can easily exceed the capacities of our machines. Quite simply, we have more data than we can easily use.
Recent advances in randomised representations (Bloom Filters and variants) allow us to represent a lot of information in small space. These techniques are probabilistic and make errors at a given rate. Applied to language modelling, this means we can use more of the available data than ever before. I shall explain how randomised techniques can be used to build what are probably the largest language models outside of companies such as Google. This allows us to answer questions such as does using a trillion words of text produce better translation results?
One problem with current randomised LMs is that being batch-based they are unsuitable for modelling an unbounded stream of language whilst maintaining a constant error rate. I shall also present a novel randomised language model which is online and allows for modelling of an unbounded text stream. Translation experiments over a text stream show that our online randomised model matches the performance of batch-based randomised LMs without incurring the computational overhead associated with full retraining. This opens up the possibility of randomised language models which continuously adapt to the massive volumes of texts published on the Web each day.
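
To make the idea of a randomised representation concrete, here is a minimal Bloom filter storing n-grams. It is a sketch under stated assumptions: the class, its parameters and the hashing scheme are illustrative choices, it records membership only (no counts or probabilities), and it is batch-style rather than the online model described in the talk.

```python
import hashlib

class BloomFilter:
    """A fixed-size bit array queried through several hash functions.

    Lookups never miss an item that was added, but may wrongly report
    an item that was never added (a false positive).
    """

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one cryptographic hash.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Store the trigrams of a toy sentence (hypothetical data)
tokens = "we have more data than we can easily use".split()
trigrams = [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]

lm_filter = BloomFilter(num_bits=1024, num_hashes=3)
for gram in trigrams:
    lm_filter.add(gram)

print("more data than" in lm_filter)
```

The memory used depends only on num_bits, not on the length of the stored n-grams, which is what makes the space savings possible; the price is the false-positive rate mentioned in the abstract.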

Minisymposium on language evolution
Monday, April 20, 2009
Kenny Smith (Northumbria University, Newcastle) and Dan Dediu (MPI for Psycholinguistics, Nijmegen)

Kenny Smith
Language change and language evolution in the laboratory

Language is culturally transmitted: the language we speak is, at least in part, determined by the language we hear others produce. This means that language is an evolutionary system in its own right. I will present an experimental paradigm for studying the cultural evolution of language, and describe a series of experiments involving the iterated learning of artificial languages by human participants. In the first part of the talk I'll focus on a simple iterated learning experiment (developed in conjunction with Elizabeth Wonnacott, University of Oxford) which can be used to explore regularization and the elimination of unpredictable variation. In the second part of the talk I'll present further iterated learning experiments (carried out jointly with Simon Kirby and Hannah Cornish, University of Edinburgh) which show that basic structural properties of language can emerge through similar processes. Previous computational and mathematical models suggest that iterated learning provides an explanation for the structure of human language: our experimental work shows that the predictions of these models, and models of cultural evolution more generally, can be tested in the laboratory.

Dan Dediu
Genetic biases and language change: how well do simple models of language evolution generalize?

A proper account of language change and evolution involves at a fundamental level the understanding of the complex relationship between genetic biases, individual learning and cultural transmission. At one extreme there are some massively nativist accounts in the form of a Universal Grammar (UG) which would explain at the same time both the universals of language and its range of variation. At the other extreme, there are accounts which see language as an adaptive system evolving within the constraints of a fairly generic cognitive system (and brain). In this talk I will try to cover two apparently distinct themes, but which are nevertheless intimately connected. First, I will present a brief overview of the apparent relationship between the distribution of tone languages and the derived haplogroups of ASPM and Microcephalin, which raises the question of a genetic biasing of language transmission in such a way that the spatial patterning of this bias influences the spatial patterning of linguistic tone. This suggestion, in turn, raises a host of questions concerning the nature of this bias and its mechanisms. To this end, I tried to use the promising new Bayesian Iterated Learning Model (BILM) paradigm but with surprising results. I will present simulation data showing that the (now classic) results obtained in homogeneous chains of single Bayesian agents are not robust to changes in the social parameters of the model. More precisely, they break down even for heterogeneous chains of two agents as well as for more complex populations. This suggests that such simple models, even if mathematically and computationally very elegant and powerful, might not in fact tell us much about language change and evolution, but require a careful study of their properties and a healthy dose of skepticism in interpretation.


Kevin Small (University of Illinois at Urbana-Champaign)
Monday, March 30, 2009
Interactive Learning Protocols for Natural Language Applications

Statistical machine learning has become an integral technology for solving many informatics applications. In particular, corpus-based statistical techniques have emerged as the dominant paradigm for core natural language processing (NLP) tasks including parsing, machine translation, and information extraction. However, while supervised machine learning is well understood, its successful application to practical scenarios incurs significant costs associated with annotating large data sets and feature engineering. In this talk, I will describe methods for reducing annotation costs and improving system performance through interactive learning protocols. The first part of the talk describes my research on active learning strategies for the structured output and pipeline model settings, two widely-used models for complex application scenarios where obtaining labeled data is particularly expensive. Secondly, I will introduce the interactive feature space construction protocol, which uses a more sophisticated interaction to incrementally add application-targeted domain knowledge into the feature space to improve performance and reduce the need for labeled data. I will also present empirical results for the semantic role labeling and named entity/relation extraction NLP tasks, demonstrating state of the art performance with significantly reduced annotation requirements.


Markos Mylonakis (ILLC)
Tuesday, March 24, 2009
An All-Phrase-Pairs Approach for Statistical Machine Translation with Smoothing as a Learning Objective

Phrase-Based Statistical Machine Translation (PBSMT) is one of the best performing and most widely deployed, in academia and business alike, Statistical Machine Translation (SMT) frameworks. The basic assumption under which it operates is that phrases can be translated independently of each other. Consequently, conditional phrase translation probabilities constitute the principal components of PBSMT systems. Training of these probabilities relies on utilising an already word-aligned parallel corpus. Crucially, given that information on how a translated sentence pair is decomposed into phrase-pairs is of course not part of the training material, estimating phrase translation probabilities is non-trivial. Currently, top-performing systems utilise a heuristic method to attain these.
In this talk, we will review how the challenge of estimating PBSMT parameters relates to Data Oriented Parsing (DOP) model estimation. We will propose a generative model that explains phrase-based translation using a prior over latent segmentations of sentence pairs into phrase-pairs. Furthermore, we will opt for an all-fragment approach, including all phrase-pairs in our phrase table. Finally, utilising the conceptual links between PBSMT and DOP as well as previous work on the latter, we will introduce a novel Smoothing Expectation-Maximisation estimator, which employs smoothing as a learning objective. We shall close by presenting empirical evidence that, where others have concluded that latent segmentations lead to overfitting and deteriorating results, we are able to attain performance equivalent to that of the heuristic estimates on reasonably sized training data.


Dave Cochran (University of St. Andrews & ILLC)
Wednesday, March 11, 2009
Darwinised Data-Oriented Parsing: Statistical NLP with added Sex and Death

Data-Oriented Parsing (DOP) is a state-of-the-art approach to both supervised and unsupervised parsing (Bod 1992, 1998, 2006a, 2006b, 2007a, 2007b, Zollmann and Sima’an 2005), which has mostly been developed within a technologically-oriented computer science context. Recent work has highlighted some interesting cognitive properties of the Data-Oriented approach (Borensztajn, Zuidema & Bod 2008, Bod 2008). However, these studies have mostly focused on the static properties of the DOP probability model. Here, we present the first attempt at a dynamic, incremental Data-Oriented model which can address the time course of language learning, rather than just the outcome: Darwinised DOP.


Remko Scha (ILLC)
Wednesday, March 4, 2009
Grammars without categories

Most syntactic theories assume that a language has a repertoire of syntactic categories, and a lexicon that assigns one or more categories to the elementary units of the language; on this basis, the grammar of the language then defines the syntactic categories of larger constituents and the grammaticality of strings. The notion of a "syntactic category" thus plays a crucial role in the architecture of such theories. But when we want to describe language acquisition, this notion is deeply problematic. The child starts its language learning process without syntactic categories. Its initial syntactic knowledge is exemplar-based; syntactic categories emerge gradually, and evolve gradually. A model which accounts for this cannot assume syntactic categories as primitives; it must define them in terms of something more basic. In this talk I want to explore how this can be done.
What is a syntactic category? Intuitively, it is a set of words or constituents which are treated in the same way by the operations of the grammar. By taking this intuition literally, we may account for the emergence of syntactic categories. First of all, we observe that a child with a corpus of strings or unlabelled trees does not need syntactic categories to analyze new input: it can substitute words (or longer strings) in exemplars from its corpus. By storing successful substitutions, the child gradually builds a network of intersubstitutable words. If for a certain set of words the substitutability relation has certain formal properties (symmetry, transitivity), the network functions as a syntactic category – although it lacks an explicit category label.
We do not need neural nets to model this process. "Non-categorial language" can be nicely described by means of (probabilistic) rewrite systems which have much in common with ordinary (probabilistic) formal grammars, but which differ from these in two important respects: (1) terminal symbols are allowed to be rewritten, and (2) non-terminal symbols are abolished. (The formalism shares this property with Lindenmayer-systems, and, in fact, with the original Semi-Thue systems). For the earliest stages of child language a very restricted version of this formalism suffices, where rules only rewrite individual words. To account for later stages, the approach can be generalized to tree-rewriting, and thus hook up with whatever we may find cognitively plausible about Data-Oriented Parsing.
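To make the word-level version of this formalism concrete, here is a minimal sketch of a rewrite system with no non-terminal symbols, in which terminal words are themselves rewritten. The rule table and the function names are hypothetical illustrations and are not taken from the talk; probabilities are left out for brevity.

```python
# Hypothetical substitution rules of the kind a learner might store:
# each word maps to the words that have successfully replaced it in an exemplar.
# There are no category labels; words are rewritten directly into other words.
RULES = {
    "dog": ["cat", "ball"],
    "cat": ["dog"],
    "want": ["see"],
}

def reachable(src, dst, rules):
    """True if the word src can be rewritten into dst in zero or more steps."""
    seen, frontier = {src}, [src]
    while frontier:
        word = frontier.pop()
        if word == dst:
            return True
        for nxt in rules.get(word, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

def derivable(exemplar, target, rules):
    """True if target follows from exemplar by word-for-word rewriting.

    Because every rule rewrites a single word, the check decomposes
    into one reachability test per position.
    """
    if len(exemplar) != len(target):
        return False
    return all(reachable(s, t, rules) for s, t in zip(exemplar, target))

print(derivable(["want", "dog"], ["see", "ball"], RULES))
print(derivable(["want", "ball"], ["want", "dog"], RULES))
```

In this toy rule table "dog" and "cat" rewrite into each other, so they behave like members of one unlabeled category, whereas "ball" can replace "dog" but not the other way round.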


Dan Roth (University of Illinois at Urbana-Champaign)
Wednesday, February 11, 2009
Constrained Conditional Models: Learning and Inference in Natural Language Understanding

Making decisions in natural language understanding tasks often involves assigning values to sets of interdependent variables where an expressive dependency structure among these can influence, or even dictate, what assignments are possible. Structured learning problems provide one such example, but we are interested in a broader setting where multiple models are involved and it may not be ideal, or possible, to learn them jointly. I will present work on Constrained Conditional Models (CCMs), a framework that augments probabilistic models with declarative constraints as a way to support decisions in an expressive output space while maintaining modularity and tractability of training. Examples will be drawn from natural language understanding tasks such as semantic role labeling (determining who did what to whom when and where), information extraction, transliteration and textual entailment (determining whether one utterance is a likely consequence of another).


Trevor Cohn (University of Edinburgh)
Wednesday, January 28, 2009
Inducing compact but accurate Tree Substitution Grammars

Tree substitution grammars (TSGs) are a compelling alternative to context-free grammars for modelling syntax. However, many current techniques for estimating weighted TSGs (under the moniker of Data Oriented Parsing) have several problems, such as inconsistency and over-fitting. We present a theoretically principled model which solves these problems using a Bayesian non-parametric formulation. Our model learns compact and simple grammars, uncovering latent linguistic structures (e.g., verb subcategorisation), and in doing so far out-performs a standard PCFG.


Yuval Krymolowski (University of Haifa)
Wednesday, January 21, 2009
Automatic Annotation of Morpho-Syntactic Dependencies in a Modern Hebrew Treebank

Morpho-syntactic dependencies between sentence constituents are an inseparable part of syntactic analysis. In Semitic languages, where the order of certain constituents is relatively free, morpho-syntactic agreement features are sometimes the main clue for computational parsing models (Tsarfaty and Sima'an 2007, 2008). We present a rule-based method for automatically adding dependency annotations to the Modern Hebrew Treebank (MHT, Sima'an et al. 2001). We concentrate on mother-daughter dependencies, in which the morphological features of one or more daughter nodes affect the morphological features or syntactic analysis of the mother node. The annotation scheme is used for two purposes: i) annotating the mother-daughter dependencies between nodes in the treebank, and ii) using the generated dependencies for annotating morpho-syntactic features of compound constituents. Manual evaluation shows that the development of the dependency scheme and its automatic implementation were accurate and also proved helpful in improving the quality of the manual annotation. We expect our methodology for automatic dependency annotation of the MHT to be generally applicable to other "Penn-compatible" syntactic resources like the Arabic Treebank.
Joint work with Adi Milea and Yoad Winter

Jakub Szymanik (ILLC)
Wednesday, January 14, 2009
Complexity of Quantifiers

I will survey the computational complexity of quantifier constructions in natural language. The talk will be divided into two parts.
In the first part I will describe an automata-theoretic model of processing simple (monadic) quantifiers in natural language (van Benthem 1986). In particular, I will present recent empirical data indicating the plausibility of this model. In the experiments we compared the time needed for understanding different types of quantifier sentences. We show that the distinction between quantifiers recognized by finite automata, e.g., "at least 5", and push-down automata, e.g., "most", is psychologically relevant. Moreover, our experiment indicates that the comprehension of proportional quantifiers, like "more than half", involves working memory resources. Additionally, we report differences in comprehension of Aristotelian and cardinal quantifiers as well as downward and upward monotone quantifiers.
In the second part of my talk I will have a look at complex (polyadic) quantifiers in natural language. I will start with basic operations creating complex constructions from simple quantifiers: boolean combinations, iteration, cumulation, and resumption. We will show that the class of feasible (PTIME computable) quantifiers is closed under those operations. Next we discuss NP-completeness of branching quantifiers. Finally, we will investigate various readings of quantified reciprocal sentences in English (see Dalrymple et al. 1998). We show a dichotomy between those readings: the strong reciprocal reading can create intractable (NP-complete) constructions, while the weak and the intermediate reciprocal readings cannot.
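The finite-automaton versus push-down distinction from the first part of the talk can be illustrated with a small sketch. It assumes the usual encoding in which a sentence "Q A are B" is checked against a sequence with one symbol per element of A: 1 if the element is in B, 0 otherwise. The function names and the encoding are illustrative assumptions, not the experimental materials.

```python
def at_least_n(n):
    """Recognizer for 'at least n': a finite automaton with n + 1 states."""
    def accepts(bits):
        state = 0  # number of 1s seen so far, capped at n
        for b in bits:
            if b == 1 and state < n:
                state += 1
        return state == n
    return accepts

def most(bits):
    """Recognizer for 'most': needs unbounded memory, modelled here as a stack.

    Opposite symbols cancel each other, so the stack only ever holds the
    current surplus of 1s or of 0s.
    """
    stack = []
    for b in bits:
        if stack and stack[-1] != b:
            stack.pop()
        else:
            stack.append(b)
    return bool(stack) and stack[-1] == 1

print(at_least_n(5)([1, 1, 0, 1, 1, 1]))
print(most([1, 0, 1]))
```

The first recognizer never needs more than a fixed number of states regardless of input length, while the stack in the second can grow with the input, which is the difference the reading-time experiments are said to probe.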

Gideon Borensztajn (ILLC)
Tuesday, December 2, 2008
The Hierarchical Prediction Network, or is the end of symbolic parsing in sight? ;-)

I'll introduce the Hierarchical Prediction Network (HPN). HPN is a connectionist network that has been implemented as a syntactical parser and (semi-)unsupervised grammar inducer. Above all HPN is a model of cortical computation, and it offers a biologically inspired approach to the categorization process that is at the core of cortical information processing. It incorporates key ideas from cognition, such as integration between bottom-up and top-down processing through prediction, hypothesis formation and testing, and hierarchically structured representations, with progressively increasing invariance and temporal compression.
In this presentation I'll demonstrate the operation of HPN as a syntactic parser, and I'll show that HPN is able to parse any sentence generated by a context-free grammar.


Vanessa Ferdinand (UvA)
Wednesday, November 12, 2008
How learning biases and cultural transmission structure information: iterated learning in human subjects and Bayesian agents


Federico Sangati (ILLC)
Wednesday, October 22, 2008

Unsupervised Methods for Head Assignments
I will present several algorithms for assigning heads in phrase structure trees, based on different linguistic intuitions on the role of heads in natural language syntax. The starting point of the approach is the observation that a head-annotated treebank defines a unique lexicalized tree substitution grammar. This allows us to go back and forth between the two representations, and define objective functions for the unsupervised learning of head assignments in terms of features of the implicit lexicalized tree grammars. We evaluate algorithms based on the match with gold standard head-annotations, and the comparative parsing accuracy of the lexicalized grammars they give rise to.


Floris Roelofsen (ILLC)
Thursday, September 18, 2008

Anaphora Resolved. A unified theory of pronouns, NP anaphora, and VP ellipsis.
This talk presents one of the main ideas proposed in Floris Roelofsen's dissertation, which will be defended on the 9th of October. Anyone interested is invited to attend the defense. More info: http://student.science.uva.nl/~froelofs/defense/


Menno van Zaanen (Tilburg University)
Wednesday, September 10, 2008

Generic, Symbolic Sequence Classification

In the field of machine learning, many problems are treated as classification tasks. A classifier takes an event and assigns a (pre-defined) class to it. When events are sequences, the classification may be based on aspects of the structure of these events. For instance, the fact that certain symbols co-occur in a sequence may be an indication that these sequences belong to a certain class.
In this talk, I will describe some research I have been doing in this field recently. This work is still in progress. I will show some results on the task of question classification (assign the type of answer to a question), and composer classification (given a musical piece, assign its composer).

Reut Tsarfaty (ILLC)
Wednesday, August 13, 2008

Relational-Realizational Parsing

State-of-the-art statistical parsing models applied to free word-order languages tend to underperform compared to, e.g., parsing English. Constituency-based models often fail to capture generalizations that cannot be stated in structural terms, and dependency-based models employ a ‘single-head’ assumption that often breaks in the face of multiple exponence. In this paper we suggest that the position of a constituent is a form manifestation of its grammatical function, one among various possible means of realization. We develop the Relational-Realizational approach to parsing in which we untangle the projection of grammatical functions and their means of realization to allow for phrase-structure variability and morphological-syntactic interaction. We empirically demonstrate the application of our approach to parsing Modern Hebrew, obtaining 7% error reduction from previously reported results.

Reut Tsarfaty and Khalil Sima'an (in press). Relational-Realizational Parsing. In: Proceedings of The 22nd International Conference on Computational Linguistics (CoLing).

Stefan Frank (ILLC)
Wednesday, June 4, 2008

Resolving ambiguous pronouns: a psycholinguistic model

In sentences of the form Bob lied to Joe because he could not handle the truth, the pronoun he is ambiguous: It can refer to either Bob or Joe. Resolving this ambiguity requires the application of world knowledge about the causal relation between "lying" and "being able to handle the truth". However, the pronoun may also be (partially) disambiguated because of a bias for one reading over the other. For example, the fact that Bob is mentioned first may lead to the interpretation in which Bob could not handle the truth.

I will present a computational model (Frank et al., 2007) that simulates how human readers resolve such ambiguities. The model is based on three assumptions. First, any biasing effect takes place before world knowledge comes into play. Second, ambiguity resolution is a side-effect of establishing causal coherence in text (cf. Hobbs, 1979; Kehler, 2002). Third, coherence is increased by manipulating mental representations of the situations described by the text. These representations are more like "mental models" (Johnson-Laird, 1983) than like the symbol structures known from formal semantics.

In the model, possible interpretations of an ambiguity are represented by centers of gravity in a high-dimensional space. The unresolved ambiguity forms a vector in the same space. This vector is attracted by the centers of gravity, while also being affected by context information and world knowledge. When the vector reaches one of the centers of gravity, the ambiguity is resolved to the corresponding interpretation. The model accounts for a considerable amount of reading-time and error-rate data and explains, among other things, the effects of context informativeness and anaphor type.

Frank, S.L., Koppen, M., Noordman, L.G.M., & Vonk, W. (2007). Coherence-driven resolution of referential ambiguity: a computational model. Memory & Cognition, 35, 1307-1322.

Michiel van Lambalgen (ILLC)
Wednesday, May 21, 2008

Computational semantics, brain and behaviour

My interest in computational semantics is driven by the need to find processing models for language comprehension and production that are on the one hand abstract enough to connect to formal semantics, on the other hand concrete to the extent that they allow predictions for brain imaging experiments and for studies in developmental language disorders. The framework that I developed in joint work with Fritz Hamm (The proper treatment of events, Blackwell 2004) satisfies these desiderata.
I will present two applications: EEG measurements of the processing of the 'past progressive', and deviant uses of tense in children with ADHD.

Tejaswini Deoskar (Cornell University)
Friday, May 23, 2008

Unsupervised re-estimation of probabilistic lexicons for treebank PCFGs

Statistical models of syntactic structure used for natural language parsing have different ways of representing the properties of individual lexical items which determine their associated local and non-local syntactic structure. In all cases, probabilities associated with a majority of open-class lexical items are not represented accurately in models trained solely on labeled (treebank) data, due to the scarcity of labeled data. In this talk, I present procedures which re-estimate the lexical parameters of a treebank PCFG from unlabeled data using the Inside-Outside algorithm, and pool the re-estimated lexical information with lexical information from the treebank PCFG. The procedures produce substantial improvements on the task of determining sub-categorization preferences of novel and low-frequency verbs, relative to a smoothed Penn Treebank PCFG. I also present a methodology to build an enhanced Penn Treebank PCFG containing lexically-oriented features, which is used as the prior model for the Inside-Outside procedure.

Cristina Barés Gómez (Institute of Islamic and Near East Studies, Spanish National Research Council, Zaragoza; and Philosophy, Logic, and Philosophy of Sciences Department, University of Sevilla, Spain)
Wednesday, May 14, 2008

Meaning in the automatic interpretation process of ancient Northwest Semitic texts

I intend to offer an overview of my PhD thesis project; no definitive results will therefore be presented. This work is at present carried out in the framework of a wider project that tries to implement the interpretation process of Ugaritic, an ancient Semitic language. My research is mainly intended as an approach to the semantic level, while also taking into account all the levels of the language studied by the philologist.

Michael Klein (Radboud University Nijmegen)

Wednesday, April 2, 2008
Computational Modelling of Meaning Processing in the Brain

This talk is about the biologically inspired modelling of meaning processing in the brain. I will start off with two strands of modelling: one that deals with the acquisition and representation of basic concepts by cortical learning algorithms; the other uses value functions over situations and internal models to accomplish goal-directed utterance selection. Fusing these two approaches into one theoretical framework, I will show how conceptual representations in the cortex are likely to be used (and also changed) during information transmission with very simple utterances. The framework also introduces neural mechanisms dealing with information structure (topic/comment) and with basic (sort of Fregean) quantification, both on a very elementary level.

Joakim Nivre (Växjö University and Uppsala University, Sweden)

Wednesday, February 27, 2008
Inductive Dependency Parsing of Natural Language Text

This talk summarizes my research on data-driven dependency parsing over the last five years. To put this work into context, I first discuss what it means to parse a sentence in a text (as opposed to the more well understood notion of parsing a sentence with a formal grammar) and propose criteria for the evaluation of text parsers. I then go on to describe my own dependency-based approach to text parsing, which I characterize as "transition-based" (to distinguish it from the other main tradition in data-driven dependency parsing, which I call "graph-based"). In this approach, inference is performed as a greedy best-first search over a non-deterministic transition system, while learning is reduced to the simple classification problem of mapping each parser state to the correct transition out of that state. I also discuss methods for handling non-projective dependencies (i.e., discontinuous constructions), in particular the widely used pseudo-projective parsing technique, which allows non-projective dependencies to be recovered using a strictly projective parsing algorithm. I conclude with a quick survey of empirical results, focusing on a contrastive error analysis of transition-based vs. graph-based parsing based on data from the CoNLL 2006 shared task.
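The transition-based setup described above can be sketched in a few lines. The sketch uses an arc-standard-style transition system, which is an assumption made for brevity and not necessarily the system used in the talk. The `choose` callback is a hypothetical stand-in for the trained classifier that maps each parser state to a transition, and dependency labels are omitted.

```python
def parse(words, choose):
    """Greedy transition-based dependency parsing (arc-standard style).

    choose(stack, buffer) stands in for the trained classifier and returns
    "SHIFT", "LEFT" or "RIGHT". The result maps each dependent's index to
    its head's index; the root word is the one left without a head.
    """
    stack, buffer, heads = [], list(range(len(words))), {}
    while buffer or len(stack) > 1:
        action = choose(stack, buffer)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT" and len(stack) >= 2:
            dep = stack.pop(-2)  # second-from-top becomes a dependent of the top
            heads[dep] = stack[-1]
        elif action == "RIGHT" and len(stack) >= 2:
            dep = stack.pop()    # top becomes a dependent of the item below it
            heads[dep] = stack[-1]
        else:
            raise ValueError(f"transition {action!r} not permitted in this state")
    return heads

def scripted(actions):
    """Toy 'classifier' that replays a fixed transition sequence."""
    it = iter(actions)
    return lambda stack, buffer: next(it)

words = ["she", "reads", "books"]
oracle = scripted(["SHIFT", "SHIFT", "LEFT", "SHIFT", "RIGHT"])
print(parse(words, oracle))
```

Each sentence is parsed in a single left-to-right pass with one classifier decision per transition, which is what makes the greedy approach fast; the price is that an early wrong decision cannot be undone.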

Jacqueline van Kampen (UiL OTS, Utrecht University)

Wednesday, February 20, 2008
(Modeling) the steps of early syntax acquisition

The abstract can be downloaded here

Philipp Koehn (School of Informatics, University of Edinburgh)

Wednesday, December 12, 2007
Linguistic Problems for Statistical Machine Translation
 
Machine translation is more relevant than ever, especially in a European Union with 23 official languages. What will happen to languages such as Dutch? Will it survive as a language of commerce, or will it be abandoned in favour of English? By lowering translation costs, we would expect to sustain the viability of a commercial zone that uses so many different languages. Statistical machine translation holds the promise of instant machine translation. Given open source tools such as the Moses decoder, just add a parallel corpus and you have a machine translation system. This talk will present some problems where the standard phrase-based approach fails, and where attention to the specifics of the languages involved is required. I will present methods that deal with different word order, morphology and agglutinative compounding.

Stefan Frank (ILLC)

Wednesday, January 30, 2008
Predicting reading times through language models and world models

According to 'surprisal theory' (Hale, 2001; Levy, in press), the time needed to read a word in a sentence is proportional to the negative logarithm of its probability given the preceding word string. Under this theory, any model that assigns probabilities to sentence-initial substrings also generates predictions of word-by-word reading times. The cognitive plausibility of the model can then be assessed by comparing its predictions to experimentally obtained reading-time data. I'll discuss how such an approach may tell us something about the language constructions used in human sentence processing. Alternatively, word-string probabilities can follow from a world model rather than a language model. This requires a probability distribution over possible worlds and a mapping from sentence-initial substrings onto the corresponding set of possible worlds. There exists a sentence-comprehension model (Frank, Haselager, & Van Rooij, submitted) that provides exactly those two things, albeit only with respect to a 'microworld' and a 'microlanguage'. I'll show some examples of reading-time predictions generated by this model, and explain how they seem to correspond to recent experimental findings.
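As a rough illustration of how any probabilistic language model yields word-by-word predictions under surprisal theory, the sketch below computes surprisal values from a toy bigram model. The bigram approximation (conditioning on the previous word only, not the whole preceding string), the add-one smoothing, the toy corpus and the function names are all simplifying assumptions and are not part of the talk.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts, using a sentence-start marker."""
    bigrams, contexts = Counter(), Counter()
    for sent in sentences:
        prev = "<s>"
        for word in sent:
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
            prev = word
    return bigrams, contexts

def surprisals(sentence, bigrams, contexts, vocab_size):
    """Per-word surprisal -log2 P(word | previous word), add-one smoothed."""
    out, prev = [], "<s>"
    for word in sentence:
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        out.append(-math.log2(p))
        prev = word
    return out

corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
bigrams, contexts = train_bigram(corpus)
vocab_size = len({w for s in corpus for w in s})
for word, s in zip(["the", "dog"], surprisals(["the", "dog"], bigrams, contexts, vocab_size)):
    print(word, round(s, 3))
```

Under the theory, the predicted reading time for each word is proportional to its surprisal, so a model can be evaluated by correlating these values with measured reading times.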

Rens Bod (ILLC)

Wednesday, January 23, 2008

Markos Mylonakis (ILLC)

Wednesday, October 31, 2007
Unsupervised Estimation for Noisy-Channel Models

Shannon's Noisy-Channel model, which describes how a corrupted message might be reconstructed, has been the cornerstone for much work in statistical language and speech processing. The model factors into two components: a language model to characterize the original message and a channel model to describe the channel's corruptive process. The standard approach for estimating the parameters of the channel model is unsupervised Maximum-Likelihood of the observation data, usually approximated using the Expectation-Maximization (EM) algorithm. In this paper we show that it is better to maximize the joint likelihood of the data at both ends of the noisy-channel. We derive a corresponding bi-directional EM algorithm and show that it gives better performance than standard EM on two tasks: (1) translation using a probabilistic lexicon and (2) adaptation of a part-of-speech tagger between related languages.
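For reference, the two-component factorization and the standard estimation objective mentioned above can be written out as follows. The symbols are assumptions introduced here: x is the original message, y the observed (corrupted) message, θ the channel parameters and D the set of observations. Only the standard formulation is shown; the bi-directional objective proposed in the talk is not reconstructed.

```latex
% Decoding: recover the most probable original message x from observation y,
% combining the language model P(x) with the channel model P_\theta(y | x).
\hat{x} \;=\; \arg\max_{x} P(x \mid y) \;=\; \arg\max_{x} \, P(x)\, P_\theta(y \mid x)

% Standard unsupervised maximum-likelihood estimation of the channel model:
% maximize the likelihood of the observations alone, marginalizing over x.
\hat{\theta} \;=\; \arg\max_{\theta} \sum_{y \in D} \log \sum_{x} P(x)\, P_\theta(y \mid x)
```

The inner sum over x is what makes direct maximization hard and motivates the use of EM.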

Henk Zeevat (ILLC, joint work with Uwe Reyle)

Wednesday, October 24, 2007
Semantic Grammar

Frege invented categorial grammar in correspondence with Husserl, but unlike later developments (Ajdukiewicz, Geach, Lambek) the aim was not to characterise the set of well-formed strings but rather to define semantic saturation: words need to combine with each other to give saturated meanings in type t or e.

Building on work by Kamp on presupposition and by Reyle on bottom up drs-induction, we developed a version of Frege's grammar in which concepts are taken as primitive and where concepts need information from their context in order to be saturated. In our conception, saturation is defined by resolvedness: all the contextual binding (including binding from other concepts given by words in the same utterance) needs to have been accomplished.

The resulting grammar and the parsing routine naturally suffer from overgeneration. But the grammar can learn, and in fact it seems very suitable as a learning grammar and quite suitable for combination with stochastic considerations. I will end by discussing the following theses:
1. Learning decreases overgeneration.
1'. In \omega, there is no overgeneration.
2. Learning increases the plausibility of the readings found.
2'. In \omega, it will find the intended interpretation.
3. The grammar is prelinguistic and prehuman.
3'. It offers a natural framework for studying and modelling L1 acquisition.
3''. It offers a natural framework for studying language change and evolution.


Reut Tsarfaty (ILLC)

Wednesday, June 13, 2007
Three-dimensional parametrization for parsing morphologically rich languages

Current parameters of accurate unlexicalized parsers based on Probabilistic Context-Free Grammars (PCFGs) form a two-dimensional grid in which rewrite events are conditioned on both horizontal (head-outward) and vertical (parental) histories. In Semitic languages, where arguments may move around rather freely and phrase-structures are often shallow, there are additional morphological factors that govern the generation process. Here we propose that agreement features percolated up the parse-tree form a third dimension of parametrization that is orthogonal to the previous two. This dimension differs from mere state-splits as it applies to a whole set of categories rather than to individual ones and encodes linguistically motivated co-occurrences between them. This paper presents extensive experiments with extensions of unlexicalized PCFGs for parsing Modern Hebrew in which tuning the parameters in three dimensions gradually leads to improved performance. Our best result introduces a new, stronger, lower bound on the performance of treebank grammars for parsing Modern Hebrew, and is on a par with current results for parsing Modern Standard Arabic obtained by a fully lexicalized parser trained on a much larger treebank.

Reinhard Blutner (ILLC)

Wednesday, May 30, 2007
Quantum probabilities, entanglement, and computational semantics

"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk" (John von Neumann)

Classical truth-functional semantics and almost all of its modifications have a serious problem in treating prototypes and their combination. Though some modelling variants can fit many of the puzzling empirical observations, their explanatory value is seldom noteworthy. I will argue that the explanatory inadequacy is due to the Boolean characteristic of the underlying semantics, which only allows mixing possible worlds but excludes the idea of superposition crucial for geometrical models of meaning. In the first part of the talk I will motivate the "virtual conceptual necessity" of quantum probabilities for the proper analysis of concepts and their combination. The second part introduces quantum probabilities, and the third part starts to discuss a still open list of possible applications in formal and computational semantics.

Louis ten Bosch & Lou Boves (Radboud University Nijmegen)

Wednesday, May 23, 2007
Acquisition of Recognition and Communication Skills (ACORNS)

The topic of this presentation is ACORNS, an FP6 FET project on the acquisition of communication and recognition skills. The goal of ACORNS is to develop a memory-prediction model that will demonstrate the capability to acquire language and communication skills on the basis of rich sensory input. The input will comprise not only speech in the context of references to objects and concepts in the physical environment, but also feedback from other agents in the environment on the actions that the learning agent takes in response to the inputs, guided by an innate need to communicate. We aim at building a computational model of an agent that learns to communicate. To that end we will use methods for representing acoustic signals, for detecting and storing meaningful and robust patterns in these representations, guided by purposeful interaction. The presentation will go into detail with respect to the mathematical/computational and cognitive aspects of the model.

Pieter Adriaans (Informatics Institute, UvA)

Wednesday, May 9, 2007
The Power and Perils of MDL

In this lecture I will present some recent work I did with Paul Vitanyi and Ceriel Jacobs on the application of the MDL (Minimum Description Length) principle to grammar induction. We have studied MDL in terms of two-part code optimization and randomness deficiency. These notions will be explained in the lecture. In this framework we showed that: Using these ideas we have implemented an MDL variant of the EDSM algorithm. The results show that although MDL works well as a global optimization criterion, it falls short of the performance of algorithms that evaluate local features of the problem space. MDL can be described as a global strategy for featureless learning.

Paul Boersma (Phonetic Sciences, University of Amsterdam; joint work with Silke Hamann)

Wednesday, March 21, 2007
The evolution of auditory contrast

This paper reconciles the standpoint that language users do not aim at improving their sound systems with the observation that languages seem to improve their sound systems. Computer simulations of sibilant inventories show that Optimality-Theoretic learners who optimize their perception grammars automatically introduce a so-called prototype effect, i.e. the phenomenon that the learner's preferred auditory realization of a certain phonological category is more peripheral than the average auditory realization of this category in her language environment. In production, however, this prototype effect is counteracted by an articulatory effect that limits the auditory form to something that is not too difficult to pronounce. If the prototype effect and the articulatory effect are of a different size, the learner must end up with an auditorily different sound system from that of her language environment. The computer simulations show that, independently of the initial auditory sound system, a stable equilibrium is reached within a small number of generations. In this stable state, the dispersion of the sibilants of the language strikes an optimal balance between articulatory ease and auditory contrast. This result has been derived from a model without goal-oriented elements.

Reut Tsarfaty (ILLC)

Monday, December 11, 2006
Kamvar, Klein & Manning's (2002) paper on probabilistic interpretations of classical clustering algorithms.

Jelle Zuidema (ILLC)

Wednesday, November 29, 2006
Estimators for Data-Oriented Parsing

We need theoretical as well as empirical criteria for evaluating learning algorithms for natural language. In existing work, the concepts of bias and (in)consistency from statistical estimation theory have been applied to evaluate methods for stochastic tree substitution grammars (STSGs), such as those used in Data-Oriented Parsing (DOP). We extend this work by showing that all published estimators for DOP are inconsistent in the usual sense. However, we further show that -- contrary to received wisdom -- neither lack of bias nor consistency on the full class of STSGs is a necessary condition for the adequacy of learning methods. Instead, we propose an evaluation scheme that uses linguistic criteria based on the concept of a "consistency class". We show that most existing methods for STSGs also fail this new test, and argue that our scheme is applicable more widely to machine learning techniques for natural language.

Hartmut Fitz (ILLC)

Wednesday, October 18, 2006
PDP model of complex sentence production

I present a neural-symbolic learning model of sentence production which is trained on a structurally complex language built from simple clause constructions basic to human experience. I investigate the model's learning behavior, its ability to map familiar constituents to novel roles, and its ability to generalize constructions into novel sentence positions.

Joachim de Beule (AI-Lab, VU Brussels)

Wednesday, September 27, 2006
Fluid Construction Grammar

David Ahn (ILPS)

Wednesday, September 13, 2006
Stages of event extraction

Event detection and recognition is a complex task consisting of multiple sub-tasks of varying difficulty. In this talk, I present a simple, modular approach to event extraction that allows us to experiment with a variety of machine learning methods for these sub-tasks, as well as to evaluate the impact that each sub-task has on the performance of the overall task.

Reut Tsarfaty

Monday, July 3, 2006
Hebrew Statistical Parsing

Current parsing models are not immediately applicable for languages that exhibit strong interaction between morphology and syntax, e.g., Modern Hebrew (MH), Arabic and other Semitic languages. This talk presents a first attempt at modeling morphological-syntactic interaction in a generative probabilistic framework to allow for MH parsing. We argue that morphological information is essential for parsing MH, and show how morphological cues improve the syntactic disambiguation capabilities of an integrated model. Following a detailed discussion of the linguistic data that motivates our integrated approach, I present the formal setting and baseline results of our integrated model(s) and then discuss more sophisticated versions by means of which we hope to boost parsing accuracy and improve morphological disambiguation.

Peter Grunwald (CWI)

Thursday, June 29, 2006
Introduction to Modern Minimum Description Length Methods

The Minimum Description Length (MDL) Principle is an information-theoretic method for statistical inference. It is particularly suited to deal with models of arbitrary complexity. In recent years, researchers have made significant theoretical advances concerning MDL. In this talk we aim to present these results and their applications to a wider audience. In its modern guise, MDL is based on the concept of a "universal model". We explain this concept at length. We show that previous versions of MDL (based on so-called two-part codes), Bayesian model selection and predictive validation (a form of cross-validation) can all be interpreted as approximations to model selection based on "universal models". Modern MDL prescribes the use of a certain "optimal" universal model, the so-called "normalized maximum likelihood model". This is related to (yet different from) Bayesian model selection with non-informative priors. It leads to a penalization of "complex" models that can be given an intuitive geometric interpretation. Roughly speaking, the complexity of a parametric model is directly related to the number of distinguishable probability distributions that it contains.
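
As a rough illustration of the complexity penalty mentioned above (not taken from the talk itself), the sketch below computes the normalized maximum likelihood complexity of the simplest parametric model, the Bernoulli model, for binary sequences of length n. The function name is a hypothetical choice for this example; the quantity is the base-2 logarithm of the sum, over all sequences of length n, of the maximized likelihood of each sequence.

```python
from math import comb, log2


def bernoulli_nml_complexity(n: int) -> float:
    """NML (parametric) complexity in bits of the Bernoulli model, n >= 1.

    Sequences with the same number of ones k share the same maximized
    likelihood, so the sum over all 2**n sequences is grouped by k.
    """
    total = 0.0
    for k in range(n + 1):
        p = k / n  # maximum-likelihood estimate for a sequence with k ones
        # Python evaluates 0.0 ** 0 as 1.0, the convention needed at k = 0 and k = n.
        total += comb(n, k) * (p ** k) * ((1 - p) ** (n - k))
    return log2(total)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(bernoulli_nml_complexity(n), 3))
```

The value grows slowly with n, reflecting that longer sequences allow more Bernoulli distributions to be told apart.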

Antal van den Bosch (Tilburg University)

Wednesday, March 22, 2006
Implicit computational linguistics

Much of computational linguistics, present and past, has borrowed its existence from explicit linguistic abstractions such as parts of speech, constituents, and dependencies. The currently dominant data-driven paradigm, for example, is largely based on inducing models from lexicons and corpora in which these abstractions have been annotated. The fact that virtually all of these abstractions have been questioned and are still fiercely debated in linguistics appears to not have disturbed many in CL. In this talk I aim to show that there is reason to worry and to wonder. First, I review a case of "harmful" explicit modularization in computational models of word pronunciation. Second, I present results from experiments in shallow parsing that show the relative weakness of parts-of-speech as intermediary symbols. Third, I attempt to exemplify the type of implicit computational linguistics research that in my view is an authentic attempt at adopting the greatest protesters against frozen explicit abstractions in linguistics such as Firth, Harris, and Croft, using computational means.

Ton van der Wouden (Leiden University)

Wednesday, March 15, 2006
Dutch as a Construction Language

We report on ongoing research into the building blocks of spoken language. Strictly decompositional theories about the architecture of the language faculty, with a computational system (grammar) combining elements from a list of words (lexicon), can hardly account for the fact that actual language use is full of recurrent word combinations of various degrees of idiomaticity and abstractness. Alternative theories, such as Construction Grammar and Construction-based HPSG, are claimed to do better in this respect. In our talk, we will discuss our methods to isolate interesting recurrent word combinations from the Spoken Dutch Corpus (CGN). We will present first results, and we will reflect on grammatical frameworks to describe them. Time permitting, we will also touch upon issues of implementation.

Rens Bod (ILLC and Computer Science, St Andrews)

Wednesday, March 8, 2006
Unsupervised Data-Oriented Parsing

How can a corpus-based parsing model assign parse trees to sentences if there are no trees in the corpus to begin with? During the last few years there has been considerable progress in unsupervised induction of trees. The most successful unsupervised models come close to the performance of a binarized supervised PCFG on WSJ sentences <= 10 words. This talk shows that we can get even closer to the performance of supervised parsing by applying an "all-subtrees" approach to unsupervised learning. Our approach initially assigns all possible binary trees to a set of given sentences and next uses all subtrees from a subset of these binary trees to compute the most probable parse trees. We show how this model can be implemented by a PCFG-reduction technique and report competitive results on English, German and Chinese data. We argue that previous approaches to tree induction are limited in that they do not take into account structural context and/or non-contiguous substrings.
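
The first step described above, assigning all possible binary trees to a sentence, can be sketched as follows. This is only an illustration under simplifying assumptions: trees are unlabeled nested tuples, the function name is hypothetical, and neither the subtree extraction nor the PCFG reduction from the talk is shown.

```python
from functools import lru_cache


def all_binary_trees(words):
    """Return every unlabeled binary bracketing of a word sequence.

    A tree is either a single word or a pair (left_subtree, right_subtree).
    """
    words = tuple(words)

    @lru_cache(maxsize=None)
    def build(i, j):
        # All binary trees spanning words[i:j].
        if j - i == 1:
            return (words[i],)
        trees = []
        for split in range(i + 1, j):
            for left in build(i, split):
                for right in build(split, j):
                    trees.append((left, right))
        return tuple(trees)

    return list(build(0, len(words)))


if __name__ == "__main__":
    for tree in all_binary_trees("the dog barks loudly".split()):
        print(tree)
```

The number of trees follows the Catalan numbers and grows quickly with sentence length, which suggests why a compact representation is needed in practice.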

Remko Scha (ILLC)

Wednesday, February 8, 2006
Data-Oriented Semantics

In this talk I will assume that the audience is at least superficially familiar with the approach to exemplar-based language processing which is known as "Data-Oriented Parsing" (DOP). So far, the models which instantiate the DOP approach tend to deal exclusively with the syntactic aspects of language processing. The purpose of the talk is to look at the prospects of generalizing this work toward the development of data-oriented models of semantics. To make progress in this direction, two different research agendas may be pursued now:
  1. Exemplar-based models of concept-formation -- not only for lexical concepts but also for the operations of "compositional semantics". Work by Renate Bartsch may be a useful starting point here.
  2. Separating the syntactic from the semantic component in our models of sentence probabilities. To the extent that we can use corpora which are annotated semantically as well as syntactically, we may construct Bayesian models which assign distinct probabilities to meanings and to meaning-syntax mappings. At a technical level, there are useful analogies with models for Data-Oriented Translation.
Both lines of thought are directly relevant for the problem of language acquisition, which sooner or later must be faced by all models of linguistic cognition: How are complex syntactic and semantic structures gradually bootstrapped by a system which is merely exposed to concrete real-world situations with accompanying noises?

Jelle Zuidema (ILLC)

Wednesday, November 23, 2005
Data-Oriented Language Learning - from weights to frequencies and back again

Stochastic Tree Substitution Grammars (STSGs), such as used in Data-Oriented Parsing, have great linguistic advantages, essentially merging "construction grammar" with "probabilistic linguistics". However, from a computational linguistics perspective, they pose a number of computational challenges that have not yet been satisfactorily solved. Two fundamental and related problems are "the problem of estimation" -- estimating the weights of an STSG from observed subtree frequencies in a tree bank -- and "the problem of expectation" -- calculating the expected subtree frequencies when generating trees using an STSG with known weights. A linguistic desideratum for estimation is that it converges to the maximally general STSG out of the possibly many correct ones. I will briefly discuss why none of the existing estimation methods fulfills this desideratum. I will then present my recent work on the problem of expectation and discuss how its solution directly suggests an alternative approach to the first problem.

Yoav Seginer (ILLC)

Wednesday, November 9, 2005
Induction of a Dependency Parser

In this talk I describe an unsupervised learning algorithm for the induction of an incremental dependency parser from raw text. The parser and learning algorithm work in tandem to bootstrap the parser: as an utterance is read from left to right, the parser incrementally assigns it a dependency structure based on parameters learned from previous examples. Simultaneously, the learning algorithm uses the resulting parse to improve its estimation of additional parameters. The parser and learning algorithm were designed for and tested on the adult utterances in the CHILDES corpus. The input is therefore spoken language which is usually syntactically simple, with a limited vocabulary and an extensive use of pronouns. At the same time, the corpus contains a balanced mixture of declaratives, imperatives and questions together with many fragments and interjections. Many of the utterances are incomplete, ungrammatical or marginally grammatical. The algorithm must therefore be resistant to noise and able to assign a dependency structure even to utterances which are not entirely grammatical. I will conclude by discussing the evaluation of such a parser. Comparison with a gold standard is known to be problematic in the context of unsupervised learning and may be entirely impossible when no relevant annotated corpus is available (as in the present case). I will argue that one criterion for the success of an unsupervised learning algorithm is the stability of its output when trained on different corpora of the same language. This should then be controlled for triviality by its failure to generate the same results when trained on various corpora not taken from that language (but with the same set of words).

Henk Zeevat (ILLC)

Wednesday, October 19, 2005
A New Implementation of Optimality Theory?

Typical of existing implementations is a limitation to a maximum number of errors or their non-existence (syntax, pragmatics). The talk discusses work in progress on an implementation scheme in which errors are not computed and in which economy principles follow from the general scheme. The set of candidates for an input is represented by an underspecified representation containing the input, and the constraints by routines that try to add information to the underspecified structure in a strictly monotonic way (compare default unification). Errors are just the cases where the constraint cannot add information because conflicting information is already present or the information is ruled out by other factors. By applying the constraints in the order of their strength the optimal candidate is constructed. The talk will present a program for a standard OT application: syllabification (using not unification but CLP) and two schemes for implementing syntax, including a reconstruction of Bresnan's optimal syntax by recasting the constraints as specification routines.

Detlef Prescher

Wednesday, September 28, 2005
Head-Driven PCFGs with Latent-Head Statistics

Although state-of-the-art parsers for natural language are lexicalized, it was recently shown that an accurate unlexicalized parser for the Penn tree-bank can be simply read off a manually refined tree-bank. While lexicalized parsers often suffer from sparse data, manual mark-up is costly and largely based on individual linguistic intuition. Thus, across domains, languages, and tree-bank annotations, a fundamental question arises: Is it possible to automatically induce an accurate parser from a tree-bank without resorting to full lexicalization? In this paper, we show how to induce head-driven probabilistic parsers with latent heads from a tree-bank. Our automatically trained parser has a performance of 85.7% (LP/LR F1), which is already better than that of early lexicalized ones.