2011
- Wednesday 9/2, 13h30, A110:
Mehrnoosh Sadrzadeh,
A Compositional Distributional Model of Meaning
Based on: (Coecke, Sadrzadeh, Clark). A recent follow-up, presented at
IWCS in Oxford in January, which describes one way of implementing the
theory with some toy examples, is Concrete Compositional Sentence
Spaces, by Grefenstette, Sadrzadeh, Clark, Coecke, Pulman.
We propose a mathematical framework for a unification of the distributional theory of meaning in terms of vector space models, and a compositional theory for grammatical types, for which we rely on the algebra of Pregroups, introduced by Lambek. This mathematical framework enables us to compute the meaning of a well-typed sentence from the meanings of its constituents. Concretely, the type reductions of Pregroups are `lifted' to morphisms in a category, a procedure that transforms meanings of constituents into a meaning of the (well-typed) whole. Importantly, meanings of whole sentences live in a single space, independent of the grammatical structure of the sentence. Hence the inner-product can be used to compare meanings of arbitrary sentences, as it is for comparing the meanings of words in the distributional model. The mathematical structure we employ admits a purely diagrammatic calculus which exposes how the information flows between the words in a sentence in order to make up the meaning of the whole sentence. A variation of our `categorical model' which involves constraining the scalars of the vector spaces to the semiring of Booleans results in a Montague-style Boolean-valued semantics.
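As a minimal sketch of how a sentence meaning might be computed and compared, assume a toy noun space and sentence space of dimension 2 and a hypothetical verb tensor (the words, bases and values below are illustrative assumptions, not taken from the talk). A transitive sentence is mapped into the shared sentence space by contracting the verb tensor with its subject and object vectors, and two sentences are then compared with a normalised inner product:

```python
from math import sqrt

def sentence_vector(subj, verb, obj):
    """Contract verb[i][j][k] (noun x sentence x noun) with the subject
    vector on index i and the object vector on index k."""
    return [
        sum(subj[i] * verb[i][j][k] * obj[k]
            for i in range(len(subj))
            for k in range(len(obj)))
        for j in range(len(verb[0]))
    ]

def cosine(u, v):
    """Normalised inner product; 0.0 if either vector is the zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy data: noun basis (animal, food), sentence basis (true, false).
dogs, cats, meat = [1, 0], [1, 0], [0, 1]
eat = [[[0, 1], [0, 0]],   # eat[animal][true][food] = 1
       [[0, 0], [0, 0]]]   # all other entries are 0

s1 = sentence_vector(dogs, eat, meat)   # "dogs eat meat"
s2 = sentence_vector(cats, eat, meat)   # "cats eat meat"
similarity = cosine(s1, s2)
```

With 0/1 entries as above, the same contraction behaves like a Boolean-valued truth test, which is one way to read the Montague-style variation mentioned in the abstract.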
Nearly complete list of past talks (up to summer 2010)
TUESDAY, June 29, at 16.00 in A1.10
Stefan Frank (ILLC)
Insensitivity of the human sentence-processing system to hierarchical
structure
(with Rens Bod)
Although it is generally accepted that hierarchical phrase structures
are instrumental in describing human language, their cognitive status
is still debated. We investigated the role of hierarchical structure
in sentence processing by implementing a range of probabilistic
language models, some of which depend on hierarchical structure
whereas others use sequential structure only. All models estimated the
occurrence probabilities of syntactic categories in sentences for
which reading-time data was available. Relating the models'
probability estimates to the data showed that the
hierarchical-structure models do not account for reading times over
and above the sequential-structure models. This suggests that a
sentence's hierarchical structure, unlike many other sources of
information, does not affect the generation of expectations about
upcoming words.
Wednesday, July 7, at 16.00 in A1.10
Federico Sangati (ILLC)
A probabilistic generative model for an intermediate constituency-dependency representation
We present a probabilistic model extension to the Tesnière
Dependency Structure (TDS) framework formulated in (Sangati and
Mazza, 2009). This representation incorporates aspects from both
constituency and dependency theory. In addition, it makes use of
junction structures to handle coordination constructions. We test
our model on parsing the English Penn WSJ treebank using a
re-ranking framework. This technique allows us to efficiently test
our model without needing a specialized parser, and to use the
standard evaluation metric on the original Phrase Structure
version of the treebank. We obtain encouraging results: we achieve
a small improvement over state-of-the-art results when reranking a
small number of candidate structures, on all the evaluation
metrics except for chunking.
Henk Zeevat (ILLC)
Wednesday, June 16, 2010
Syntactic Paradigms
(with Alessandro lo Popolo)
Paradigms are the bread and butter of traditional linguistics and it is
surprising that they do not have a more central place in modern
linguistics. Arguably they are central for semantic change, for the
grammaticalisation of originally semantic features and for the elusive
semantics of tense morphemes, prepositions and modal particles. Yet,
they are hard to identify within the modern proposals for formal
grammars.
Morphological paradigms as conceived by Anderson and Blevins are
naturally analysed within optimality theory as the set of expressive
constraints that apply to a certain lexical category. They state which
abstract features must be expressed by the form of the word. This sets
up the abstract paradigm: the set of cells for which the paradigm
provides specific forms. The specific forms can also be described in
OT: the combination of a cell and a word can be taken as the input in a
morphological competition.
This interpretation of paradigms automatically generalises from
morphology to syntax. Complex categories like NP, S and VERB
obligatorily express certain features like definiteness, wh in the case
of NP, finiteness, tense, aspect, modality, evidentiality, additivity
in the case of S, and a similar range for verb. Which features are
expressed, and their precise nature, is language dependent.
The obligatory expression is most simply described by expressive
constraints restricted to the category.
What is different however is the way in which the cells are defined. A
complex category can satisfy an expressive constraint in three ways: by
creating a formal property that expresses the feature (word order), by
building a constituent that expresses the feature and by having a constituent that expresses the feature.
As an illustration, the talk will try to identify negation paradigms in
Russian, Italian, French, English and Dutch and provide a uniform
treatment of negation for these languages. We believe the notion of
syntactic paradigms can be useful for semantics, syntax, computation
and for understanding the historical processes underlying the formation
of languages.
Willem Zuidema (ILLC)
Wednesday, June 9, 2010
Empirical evidence for recursive hierarchical structure in child language
There is general agreement that adult language shows recursive,
hierarchical phrase-structure (rhps), but many questions remain about
how abundant recursion really is in language and about how children
arrive at it. One hypothesis is that rhps is an innate, formal
universal of natural language, but an equally popular idea is that
it is the emergent result from children building generalisation on
generalisation, and need not even be universal. Researchers trying to
empirically establish the moment "recursion" enters a child's
language, or more generally count the occurrences of recursion, face a
difficult methodological challenge: the usual introspective evidence
from the theoretical linguist is unavailable for such questions, and
behavioral data does not unambiguously reveal whether a recursive
operation or simple "caching" was used to generate or analyze
utterances that look like adult recursions. In this talk, I will argue
that neither a priori arguments, nor stopgap empirical data will
suffice to resolve this debate; rather, a careful analysis using
techniques from statistical machine learning and formal grammars
is called for.
Even if a gold-standard phrase-structure annotation is given, it
is not obvious how to count occurrences of recursion. Of course, from
such data we can easily extract a dominance matrix, which
records for every pair of syntactic categories X and Y, how often X
dominates Y. But I will show that simply counting how often one finds
a node dominated by a node of the same category, can both over- and
underestimate the frequency of "real" recursion. Finally, I will
present an alternative measure of "recursiveness" based on the
deviations from a linear ordering of syntactic categories
w.r.t. dominance, and show how data from CHILDES show a steady
increase with age of the child on this metric.
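The dominance matrix described above can be sketched as follows. This is a minimal illustration, assuming trees are given as nested tuples of the form (label, child, ...) with words as plain strings; that encoding and the example tree are assumptions made here, not the talk's data format. The diagonal entries are the naive same-category counts that, according to the abstract, can both over- and underestimate "real" recursion.

```python
from collections import Counter

def labels_below(tree):
    """Category labels of all nonterminal nodes strictly below the root."""
    out = []
    for child in tree[1:]:
        if isinstance(child, tuple):
            out.append(child[0])
            out.extend(labels_below(child))
    return out

def dominance_matrix(trees):
    """Count, for every pair (X, Y), how often a node labelled X
    dominates a node labelled Y."""
    counts = Counter()

    def visit(node):
        for y in labels_below(node):
            counts[(node[0], y)] += 1
        for child in node[1:]:
            if isinstance(child, tuple):
                visit(child)

    for tree in trees:
        visit(tree)
    return counts

# Hypothetical annotated utterance: "she said he left".
tree = ("S", ("NP", "she"),
             ("VP", "said", ("S", ("NP", "he"), ("VP", "left"))))
matrix = dominance_matrix([tree])
naive_recursion = sum(c for (x, y), c in matrix.items() if x == y)
```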
Gideon Borensztajn (ILLC)
Tuesday, May 25, 2010
Pointers in the brain: What the systematicity of language tells about cortical connectivity and connectionism
Afra Alishahi
(University of Saarland, Saarbrücken)
Wednesday, May 19, 2010
A Bayesian account of the acquisition of abstract argument structure constructions
Developing computational algorithms that capture the complex
structure of natural languages is an open problem. In particular,
learning the abstract properties of language only from usage data
without built-in knowledge of language structure remains a challenge.
We have developed a Bayesian model of the acquisition and use of verb
argument structure from child-directed data. In our model, the general
constructions of language (such as transitive and intransitive) are
viewed as a probability distribution over the syntactic and semantic
features, e.g., the semantic properties of the verb and its arguments,
and their relative word order in an utterance. Constructions are
learned through clustering similar verb usages. Language use, on the
other hand, is modeled as a Bayesian prediction problem, where the
missing features in a usage are predicted based on the available parts
and the acquired constructions (e.g., in sentence production, the best
syntactic pattern for an utterance is predicted from the available
semantic information). The model can successfully learn the common
constructions of language, and its behaviour shows similarities to
actual child data, both in sentence production and comprehension.
Moreover, the acquired knowledge of language in this model is robust
yet flexible, and many general patterns of behaviour that are observed
in children can be simulated and explained by this approach.
Raquel Fernández Rovira (ILLC)
Wednesday, April 21, 2010
Incrementality and Relative Gradable Properties
There is a growing amount of psycholinguistic evidence showing that
humans process and produce language incrementally, i.e. bit by bit
before utterance or constituent boundaries are reached, and making use
of different sources of information---speech, syntactic structure,
semantic interpretation, pragmatic knowledge---that interact in
parallel. In recent years, (computational) linguists have started to
develop theories and systems that are more consistent with these
psycholinguistic findings---a trend very much noticeable in recent work
on spoken dialogue systems. In this talk, reporting on work in
progress, I will discuss a number of questions that embracing
incrementality raises for the interpretation of relative gradable
adjectives in referential descriptions, as well as for the
semantics/pragmatics interface at large.
Gerold Schneider (University of Zurich)
Wednesday, April 7, 2010
Parsing with Dependency Grammar:
combining hand-written rules and corpus statistics
In this talk I will present the fast, robust, large-scale dependency
parser Pro3Gres, which combines hand-written rules and corpus
statistics.
I will focus on the following topics:
- Dependency Grammar (DG): Differences between Lucien Tesnière's DG
and currently used popular DGs, including the one of Pro3Gres. X-bar and
DG.
- Hand-written rules and constraints in combination with corpus
data. Search space reductions, statistical models, domain adaptations.
- DG as an f-structure only version of Lexical-Functional
Grammar (LFG).
- Overview of applications of the parser: information retrieval
in the biomedical domain, descriptive corpus linguistics.
Workshop Theory, Typology & Technology: Parsing in the face of diversity
Tuesday, March 23, 2010
Abstract and program
Confirmed Speakers:
- James P. Blevins (University of Cambridge, UK)
- Mark Johnson (Macquarie University, Sydney, Australia)
- Joakim Nivre (Uppsala University, Sweden)
- Owen Rambow (Columbia University, USA)
- Gregory Stump (University of Kentucky, USA)
- Yoad Winter (Utrecht University, The Netherlands)
- Reut Tsarfaty (University of Amsterdam, The Netherlands)
Hartmut Fitz (University of Groningen)
Wednesday, March 10, 2010
Statistical learning of complex questions
The problem of auxiliary fronting in complex polar questions occupies a
prominent position within the nature versus nurture debate in language
acquisition. Usage-based theories of language need to explain how the
syntax of these questions can be acquired from experience. First, I
survey several data-driven models which have recently attempted to
address this issue. Then I present a linear statistical learner, called
the adjacency-prominence learner, which uses sequential and semantic
information to produce utterances from a bag of words. I show that this
model is capable of generating grammatical complex questions (i) without
explicitly representing hierarchical phrase structure and (ii) without
exposure to the target utterances in its training environment.
Implications for nativist theories of language acquisition are
discussed.
Karl Magnus Petersson (Max Planck Institute for Psycholinguistics)
Wednesday, February 10, 2010
The Neurobiology of Syntax: Recursion and Dynamical Systems
The language faculty is a neurobiological system that provides humans
with the capacity to understand, for all practical purposes, an
unlimited set of sentences. At least since von Humboldt (1836),
theoretical linguists have interpreted "practically unlimited" as
meaning infinite and it is argued that this "linguistic infinity"
necessitates a recursive syntactic rule set – a knowledge structure
that, for example, allows humans to embed and understand phrases within
phrases without limit. Hornstein (2009, A Theory of Syntax: Minimal
Operations & Universal Grammar) argues that the language faculty must be
simpler than previously thought. This is crucial, because ultimately,
whatever syntactic operations linguists propose, these must be
implementable in neural processing infrastructure. Thus, neurobiology
puts hard constraints on the properties of the language faculty. We
argue that the finiteness of neural systems, in terms of memory capacity
and processing precision, is such a constraint. What are the
implications of this for neurobiological models of syntax? First, we
argue that it is not meaningful to separate "syntactic computation" from
"processing memory" - or competence from performance in linguistic
terms - and we note that the concept of a recursive rule set does not
need to be motivated by an ideal "linguistic infinity". Instead, the
relevant fact is the human capacity to process bounded patterns of
non-adjacent dependencies in language – there is a definite upper bound
on "distance" set by neurobiology. Second, we are free to choose any
syntactic framework we prefer as long as this serves its purpose - for
example, we may choose to capture non-adjacent dependency processing in
bounded recursive formalisms. However, because neural processing systems
belong to the class of adaptive stochastic dynamical systems with fading
memory properties, exemplified by noisy spiking network processors, it
seems more natural to try to understand syntax processing in terms of
this type of system. We illustrate this theoretical discussion with
empirical results from behavioral, TMS, and fMRI investigations of
Broca’s region in the context of implicit acquisition of simple
artificial unification grammars as well as some insights from
computational modeling.
Vera Demberg (University of Edinburgh)
Wednesday, February 3, 2010
A Broad-Coverage Model of Prediction in Human Sentence Processing
The aim of my recent research is to design and implement a cognitively
plausible theory of sentence processing which contains a mechanism for
modelling prediction and verification as processes in language
understanding. Modelling prediction is an interesting and relevant
problem because recent experimental evidence suggests that humans
predict upcoming structure or lexemes during sentence processing.
However, none of the current sentence processing theories model
prediction explicitly.
In my talk, I will explain the mechanisms in my sentence processing
theory, as well as the requirements for a parser that observes the
strict incrementality requirement stated in the processing theory. The
linguistic formalism I use to describe the incremental derivations is a
modified version of tree-adjoining grammar, called PLTAG for
"PsychoLinguistically motivated TAG". In the last part of my talk, I
will show that the sentence processing theory using the parser
successfully models psycholinguistic sentence processing phenomena.
Frans Adriaans (Utrecht University)
Wednesday, January 27, 2010
StaGe: A Model for the Induction of Phonotactic Constraints from Continuous Speech
Infants are faced with the challenge of building up a lexicon
of discrete, word-sized units from continuous speech input. In this
talk, I will present a computational model for the induction of
phonotactic constraints from continuous speech. Such constraints
provide language learners with a valuable cue for the detection of word
boundaries in the speech stream (McQueen, 1998). StaGe (Adriaans &
Kager, in press) implements two learning mechanisms that have been
shown to be available to infants: statistical learning (e.g., Saffran,
Aslin, & Newport, 1996) and generalization (e.g., Saffran &
Thiessen, 2003). While current speech segmentation models typically
rely on statistical information (e.g. transitional probabilities), we
show that feature-based generalization over statistically learned
biphone constraints improves the speech segmentation performance of the
learner. These results indicate a potential role for phonotactic
generalizations in human speech segmentation.
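For orientation, the purely statistical baseline mentioned above (transitional probabilities) can be sketched as below. This is not StaGe itself: the feature-based generalization step is omitted, and the threshold rule and the toy input are assumptions made for illustration.

```python
from collections import Counter

def transitional_probabilities(segments):
    """TP(x -> y) = count(xy) / count(x occurring as first member of a pair)."""
    pair_counts = Counter(zip(segments, segments[1:]))
    first_counts = Counter(segments[:-1])
    return {(x, y): c / first_counts[x] for (x, y), c in pair_counts.items()}

def boundaries(segments, threshold):
    """Indices i where a word boundary is posited between segments[i] and
    segments[i + 1] because the transitional probability is below threshold."""
    tp = transitional_probabilities(segments)
    return [i for i, pair in enumerate(zip(segments, segments[1:]))
            if tp[pair] < threshold]

# Hypothetical phoneme string; each character stands for one segment.
stream = "abac"
tp = transitional_probabilities(stream)
cuts = boundaries(stream, threshold=0.75)
```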
References
Adriaans, F., & Kager, R. (in press). Adding generalization to statistical learning: The induction of phonotactics from continuous speech. Journal of Memory and Language, doi:10.1016/j.jml.2009.11.007
McQueen, J. M. (1998). Segmentation of continuous speech using phonotactics. Journal of Memory and Language, 39, 21–46.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928.
Saffran, J. R., & Thiessen, E. D. (2003). Pattern induction by infant language learners. Developmental Psychology, 39, 484–494.
Ingrid Nieuwenhuis (Radboud University Nijmegen)
Wednesday, January 20, 2010
Sleep enhances the implicit extraction of grammar rules
Grammar learning requires the extraction of complex rules
implicitly from experience. We exposed participants to letter sequences
generated from an artificial (Reber) grammar and tested their ability
to classify new sequences as grammatical or not. We show that sleeping
between exposure and testing specifically enhances the acquisition of
the grammatical structure, as opposed to effects caused by familiarity
with local sub-sequences. These results suggest that an active rule
extraction process takes place during sleep.
Stefan Frank (ILLC)
Wednesday, December 2, 2009
Investigating the roles of expectation and uncertainty in human sentence processing
It is well established by now that the time needed to read a word in
sentence context is positively correlated with its surprisal, as can be
estimated by probabilistic language models. A word's surprisal has an
intuitive interpretation as the extent to which the word was
unexpected. However, other probabilistic measures of processing effort
have also been suggested. For example, Roark et al. (2009) showed that
uncertainty about the upcoming word (formalized as the entropy of the
next-word probability distribution) accounts for word-reading time,
over and above the words' surprisal. In contrast, Hale (2003, 2006)
argued that it is the reduction
in entropy, resulting from processing a word, which is indicative of
processing effort. Although he provided several examples of how entropy
reduction can explain particular psycholinguistic phenomena, the
relation between entropy reduction and reading time has not previously
been tested on a scale that allows for a statistical test of the
entropy-reduction hypothesis. I'll present some new data
showing that entropy reduction is indeed predictive of
word-reading times, over and above surprisal, and that (contrary to
Roark et al.) entropy itself has no additional explanatory value.
However, the psychological interpretation of entropy reduction remains
unclear.
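The three quantities contrasted above can be written down in a few lines. This is a simplified sketch: the probabilities are hypothetical, the entropy function works on whatever distribution it is handed (the abstract's next-word entropy is one choice), and clipping negative reductions to zero is an assumption about how entropy reduction is defined, not something stated in the abstract.

```python
from math import log2

def surprisal(prob):
    """Surprisal in bits of a word with conditional probability prob."""
    return -log2(prob)

def entropy(dist):
    """Entropy in bits of a probability distribution given as a dict."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def entropy_reduction(h_before, h_after):
    """Drop in entropy caused by processing a word, clipped at zero
    (the clipping is an assumption of this sketch)."""
    return max(0.0, h_before - h_after)

# Hypothetical next-word distributions before and after reading a word.
before = {"dog": 0.25, "cat": 0.25, "car": 0.25, "cup": 0.25}
after = {"barked": 0.5, "slept": 0.5}

word_surprisal = surprisal(before["dog"])
reduction = entropy_reduction(entropy(before), entropy(after))
```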
Federico Sangati (ILLC)
Wednesday, November 25, 2009
An English Dependency Treebank à la Tesnière
During the last decade, the Computational Linguistics community has
shown an increased interest in Dependency Treebanks. Several groups
have developed new annotated corpora using dependency representation,
while other people have proposed several automatic conversion
algorithms to transform available Phrase Structure (PS) treebanks into
Dependency Structure (DS) notation. Such projects typically refer to
Tesnière as the father of dependency syntax, but little attempt
has been made to explain how the chosen representation relates to the
original work. A careful comparison reveals substantial differences:
modern DS annotations discard some relevant features characterizing
Tesnière’s model. This paper presents our attempt to
go back to the roots of dependency theory, and show how it is
possible to transform a PS English treebank to a DS notation that is
closer to the one proposed by Tesnière, which we will refer to
as TDS. We will show how this representation can incorporate all main
advantages of modern DS, while avoiding well known problems concerning
the choice of heads, and better representing common linguistic
phenomena such as coordination.
http://staff.science.uva.nl/~fsangati/TDS/TLT8_Sangati_Mazza.pdf
Tejaswini Deoskar (ILLC)
Wednesday, September 30, 2009
Smoothing fine-grained PCFG lexicons
We present an approach for smoothing treebank-PCFG lexicons with lexical
information obtained from a large unannotated corpus, by
interpolation of treebank lexical parameters with estimates obtained
from unannotated data via the inside-outside algorithm. The PCFG has
complex lexical categories, making relative-frequency estimates from a
treebank very sparse. This kind of smoothing for complex lexical
categories results in improved parsing performance, with a particular
advantage in identifying obligatory arguments subcategorized by verbs
unseen in the treebank.
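The interpolation step can be illustrated with a small sketch. The fixed weight `lam`, the category names and the probabilities are all hypothetical; the abstract does not say how the interpolation weight is chosen, and the estimates from unannotated data would in practice come from the inside-outside algorithm rather than being typed in by hand.

```python
def interpolate(treebank_probs, unannotated_probs, lam):
    """Linear interpolation of two estimates of P(category | word):
    lam * treebank estimate + (1 - lam) * unannotated-data estimate."""
    categories = set(treebank_probs) | set(unannotated_probs)
    return {
        c: lam * treebank_probs.get(c, 0.0)
           + (1 - lam) * unannotated_probs.get(c, 0.0)
        for c in categories
    }

# Hypothetical lexical distributions for one verb.
treebank = {"V_NP": 0.5, "V_NP_PP": 0.5}   # sparse relative frequencies
unannotated = {"V_NP_PP": 1.0}             # estimate from unannotated data

smoothed = interpolate(treebank, unannotated, lam=0.5)
```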
Federico Sangati (ILLC)
Wednesday, September 30, 2009
A generative re-ranking model for dependency parsing
We propose a framework for dependency parsing based on a combination of
discriminative and generative models. We use a discriminative model to
obtain a k-best list of candidate parses, and subsequently rerank those
candidates using a generative model. We show how this approach allows
us to evaluate a variety of generative models, without needing
different parser implementations. Moreover, we present empirical
results that show a small improvement over state-of-the-art dependency
parsing of English sentences.
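The reranking setup can be sketched generically as follows. The candidate names and log-probabilities are placeholders: in the setting described above, the k-best list would come from a discriminative parser and the score from whichever generative model is being evaluated.

```python
def rerank(k_best, generative_score):
    """Reorder a k-best list of candidate parses by a generative model's
    score, highest first. Ties keep the original (discriminative) order."""
    return sorted(k_best, key=generative_score, reverse=True)

# Hypothetical 3-best list and hypothetical generative log-probabilities.
k_best = ["parse_a", "parse_b", "parse_c"]
log_probs = {"parse_a": -12.0, "parse_b": -9.5, "parse_c": -11.0}

reranked = rerank(k_best, log_probs.get)
best = reranked[0]
```

Because only the scoring function changes, different generative models can be compared on the same candidate lists without writing a new parser for each.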
Maarten Versteegh (Radboud University Nijmegen)
Wednesday, September 23, 2009
Using Data-Oriented Parsing to model syntactic change
A number of properties of the data-oriented parsing model have also
been identified as being important in grammaticalization theories of
language change. The exemplar-based nature, the use of probabilities
and the incorporation of constructions are shared by both approaches.
Building on data-oriented parsing, we present a computational model of
syntactic change that can account for the major mechanisms of language
change. We show the model's plausibility by simulating several
historical situations of change.
Gerard Kempen (Max Planck Institute for Psycholinguistics, Nijmegen, and Leiden University)
Wednesday, September 2, 2009
The Unification Space implemented as a localist neural net: Predictions and error-tolerance in a constraint-based parser
Joint work with Theo Vosse
We introduce a novel computer implementation of the Unification-Space parser (Vosse & Kempen
2000) in the form of a localist neural network whose dynamics is based on interactive activation and
inhibition. The wiring of the network is determined by Performance Grammar (Kempen &
Harbusch 2003), a lexicalist formalism with feature unification as binding operation. While the
network is processing input word strings incrementally, the evolving shape of parse trees is
represented in the form of changing patterns of activation in nodes that code for syntactic properties
of words and phrases, and for the grammatical functions they fulfill. The system is capable, at least
in a qualitative and rudimentary sense, of simulating several important dynamic aspects of human
syntactic parsing, including garden-path phenomena and reanalysis, effects of complexity (various
types of clause embeddings), fault-tolerance in case of unification failures and unknown words, and
predictive (expectation-based) parsing. English is the target language of the parser described, and a
demonstration version of the software is available from the authors via the internet.
Tim O'Donnell (Harvard University)
Monday, August 10, 2009
Computation and Reuse in Language
Productivity in language is made possible by a division of labor
between computation and storage: stored lexical items are
combined via computation into more complex structures. A central
question for theories of language is what constitutes this inventory of
stored items: Where do the stored items come from? Under what
conditions does storage happen? How are storage and computation
integrated? I will present a Bayesian framework designed to study these
questions, along with some preliminary empirical evaluation.
Reut Tsarfaty (ILLC)
Monday, July 27, 2009
Parsing a (relatively) free word-order language
Peter beim Graben (University of Reading)
Monday, June 29, 2009
Dynamic Cognitive Modeling of Syntactic Language Processing
Joint work with Roland Potthast
I will present Dynamic Cognitive Modeling [1] as a three tier top-down
approach comprising the levels of (1) cognitive processes; (2) their
state space representations; and (3) dynamical systems implementations
that are guided by neuroscientific principles. These levels are passed
through in a top-down fashion: (1) cognitive processes are described as
algorithms sequentially operating on complex symbolic data structures
that are decomposed using so-called filler/role bindings [1,2]; (2)
data structures are mapped onto points in abstract vector spaces using
tensor product representations [1,2]; (3) cognitive operations are
implemented as dynamics of neural networks or neural/dynamic fields.
The last step involves the solution of inverse problems, namely
training the system's parameters to reproduce prescribed trajectories
of cognitive operations in representation space. I present a
regularization technique for the common Hebb rule, called
Tikhonov-Hebbian learning, in order to tackle the ill-posedness of the
inverse problem [1]. The method is illustrated by means of an
instructive example from syntactic language processing [3]. I construct
a functional representation [4] of a context-free left-corner parser
over a three-dimensional feature space processing the well-formed
sentence
(1) Die Gans wurde im Ofen gebraten
“The goose was grilled in the oven”
and the phrase structure violation
(2) Die Gans wurde im gebraten
“The goose was grilled in”.
After training a neural field through Tikhonov-Hebbian learning, the
differences of neural activation exhibit some remarkable resemblance
with the event-related brain potentials reported in [3].
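Step (2) above, mapping filler/role bindings onto points in a vector space, can be illustrated with a small tensor product sketch. The filler and role vectors are hypothetical, and the roles are chosen orthonormal so that unbinding is exact; this is not the three-dimensional feature space used in the talk.

```python
def outer(filler, role):
    """Outer product of a filler vector and a role vector."""
    return [[f * r for r in role] for f in filler]

def bind(pairs):
    """Tensor product representation: sum of filler (x) role outer products."""
    total = outer(*pairs[0])
    for filler, role in pairs[1:]:
        term = outer(filler, role)
        total = [[a + b for a, b in zip(row_t, row_n)]
                 for row_t, row_n in zip(total, term)]
    return total

def unbind(tensor, role):
    """Recover the filler bound to role by contracting with the role vector
    (exact when the role vectors are orthonormal)."""
    return [sum(t * r for t, r in zip(row, role)) for row in tensor]

# Hypothetical fillers and orthonormal roles.
goose, oven = [1, 0, 0], [0, 1, 0]
subject_role, location_role = [1, 0], [0, 1]

state = bind([(goose, subject_role), (oven, location_role)])
recovered = unbind(state, subject_role)
```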
References
[1] beim Graben, P. & Potthast, R. (2009). Inverse problems in
dynamic cognitive modeling. Chaos: An Interdisciplinary Journal of
Nonlinear Science, 19, 015103.
[2] Smolensky, P. & Legendre, G. (2006). The Harmonic Mind. From
Neural Computation to Optimality-Theoretic Grammar, MIT Press.
[3] Hahne, A. & Friederici, A. D. (1999). Electrophysiological
evidence for two steps in syntactic analysis: Early automatic and late
controlled processes. Journal of Cognitive Neuroscience, 11, 194–205.
[4] beim Graben, P., Pinotsis, D., Saddy, D. & Potthast, R. (2008).
Language processing with dynamic fields. Cognitive Neurodynamics, 2(2),
79–88.
Henk Zeevat (ILLC)
Wednesday, June 24, 2009
Bayesian Interpretation
Amit Mukerjee (Indian Institute of Technology, Kanpur)
Thursday, June 18, 2009
The constructivist enterprise: towards computational language acquisition
The constructivist approach to language is in contrast with the generativist
model, which assumes the autonomy of syntax. Most models of
computational language derive from generativist ideas, treating its units as
formal, empty symbols - semantics is defined (or derived) as a secondary
process. In the constructivist model, symbols are thought of as a tight
coupling between a phonological pole and a diffuse set of associations that
constitute its semantic pole. Grammar is viewed as the binding of such units
into larger bipolar structures.
Further, while humans master the complexities of mental processes and
language gradually, computational structures tended to focus on the finished
adult language. This work is premised on the belief that in order to build
constructivist models of language, we must start from an infant-like state,
and trace the growth of the semantic-syntactic grammar simultaneously,
eventually leading to adult-like performance. Thus, in a sense, this
approach attempts to implement a version of the strongly physicalist Aufbau
program a la Carnap.
We present some very preliminary work towards this goal, where we expose the
system to schematic and real videos of events. Using a bottom-up model of
attention, we parse these visual streams into a very natural set of features,
and use completely unsupervised approaches to categorize the agents, actions,
and relations in the video. The oracles that categorize a scene into one
structure or the other may be thought of as "image schemas", adopting a term
from developmental psychology. Since they provide a mapping from perception,
the set of image schemas may be viewed as a form of content-based index.
Unlike the predicates of binary logic, image schemas provide graded
membership, which permits a degree of defeasibility.
Subsequently, when the system is exposed to linguistic commentaries of the
same scenes, we show that it can immediately associate many of these
image-schemas with linguistic units in the text. Furthermore, structures
such as the number of subcategories of verbs (valency) are shown to be
derivable from the semantics directly. Though this is not demonstrated here,
once the frame structures associated with the clausal head are known, it may
be assumed that syntactical patterns implementing this at the phonological
level (syntax) may be learned relatively easily.
This is of course a very preliminary attempt, and much work remains,
especially in the scaling up from this infant-like state to larger and more
abstract situations. However, the promise of extremely adaptive interaction,
defeasibility, inverse indexing with its possibility of generation, and the
unsupervised nature of the process is likely to give the constructivist
enterprise a significant push in years to come.
Miles Osborne (University of Edinburgh)
Wednesday, June 10, 2009
Stream-based randomised language modelling for machine translation
Joint work with Abbey Levenberg
We now live in a world where we have more data than we can possibly
use. For example, consider all of the newswire that is published on
the Web each day, seven days a week, fifty-two weeks a year. For
machine translation this abundance of data means we can produce more fluent
translations. However, traditional language models built using
gigantic amounts of data can easily exceed the capacities of our
machines. Quite simply, we have more data than we can easily use.
Recent advances in randomised representations (Bloom Filters and
variants) allow us to represent a lot of information in small
space. These techniques are probabilistic and make errors at a given
rate. Applied to language modelling, this means we can use more of the
available data than ever before. I shall explain how randomised techniques can
be used to build what are probably the largest language models
outside of companies such as Google. This allows us to answer
questions such as: does using a trillion words of text produce better
translation results?
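As a rough illustration of the kind of randomised representation mentioned above, here is a minimal Bloom filter storing bigrams. It is only a sketch under stated assumptions: the class name `BloomFilter`, the helper `ngrams`, the salted SHA-256 hashing and the sizes are all chosen for illustration and are not taken from the talk. Membership queries never miss a stored item, but may return a false positive at a rate that depends on the number of bits and hash functions.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: a bit array plus k salted hash functions (illustrative)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting a single cryptographic hash.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True for every stored item; occasionally True for unseen items.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


bf = BloomFilter(num_bits=1024, num_hashes=3)
for gram in ngrams("we have more data than we can use".split(), 2):
    bf.add(gram)

print("more data" in bf)  # a stored bigram is always reported as present
```

The space used is fixed by `num_bits` regardless of how long the stored n-grams are, which is what makes such structures attractive for very large n-gram sets.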
One problem with current randomised LMs is that being batch-based they
are unsuitable for modelling an unbounded stream of language whilst
maintaining a constant error rate. I shall also present a novel
randomised language model which is online and allows for modelling of
an unbounded text stream. Translation experiments over a text stream
show that our online randomised model matches the performance
of batch-based randomised LMs without incurring the computational overhead
associated with full retraining. This opens up the possibility of
randomised language models which continuously adapt to the massive volumes of
texts published on the Web each day.
Minisymposium on language evolution
Monday, April 20, 2009
Kenny Smith (Northumbria University, Newcastle) and Dan Dediu (MPI for Psycholinguistics, Nijmegen)
Kenny Smith
Language change and language evolution in the laboratory
Language is culturally transmitted: the language we speak is, at least
in part, determined by the language we hear others produce. This means
that language is an evolutionary system in its own right. I will
present an experimental paradigm for studying the cultural evolution of
language, and describe a series of experiments involving the iterated
learning of artificial languages by human participants. In the first
part of the talk I'll focus on a simple iterated learning experiment
(developed in conjunction with Elizabeth Wonnacott, University of
Oxford) which can be used to explore regularization and the elimination
of unpredictable variation. In the second part of the talk I'll present
further iterated learning experiments (carried out jointly with Simon
Kirby and Hannah Cornish, University of Edinburgh) which show that basic
structural properties of language can emerge through similar processes.
Previous computational and mathematical models suggest that iterated
learning provides an explanation for the structure of human language:
our experimental work shows that the predictions of these models, and
models of cultural evolution more generally, can be tested in the
laboratory.
Dan Dediu
Genetic biases and language change: how well do simple models of language evolution generalize?
A proper account of language change and evolution involves at a
fundamental level the understanding of the complex relationship between
genetic biases, individual learning and cultural transmission. At one
extreme there are some massively nativist accounts in the form of a
Universal Grammar (UG) which would explain at the same time both the
universals of language and its range of variation. At the other
extreme, there are accounts which see language as an adaptive system
evolving within the constraints of a fairly generic cognitive system
(and brain). In this talk I will try to cover two apparently distinct
themes, but which are nevertheless intimately connected. First, I will
present a brief overview of the apparent relationship between the
distribution of tone languages and the derived haplogroups of ASPM and
Microcephalin, which raises the question of a genetic biasing of
language transmission in such a way that the spatial patterning of this
bias influences the spatial patterning of linguistic tone. This
suggestion, in turn, raises a host of questions concerning the nature of
this bias and its mechanisms. To this end, I tried to use the promising
new Bayesian Iterated Learning Model (BILM) paradigm but with
surprising results. I will present simulation data showing that the
(now classic) results obtained in homogeneous chains of single Bayesian
agents are not robust to changes in the social parameters of the model.
More precisely, they break down even for heterogeneous chains of two
agents as well as for more complex populations. This suggests that such
simple models, even if mathematically and computationally very elegant
and powerful, might not in fact tell us much about language change and
evolution, but require a careful study of their properties and a
healthy dose of skepticism in interpretation.
Kevin Small (University of Illinois at Urbana-Champaign)
Monday, March 30, 2009
Interactive Learning Protocols for Natural Language Applications
Statistical machine learning has become an integral technology for
solving many informatics applications. In particular, corpus-based
statistical techniques have emerged as the dominant paradigm for core
natural language processing (NLP) tasks including parsing, machine
translation, and information extraction. However, while supervised
machine learning is well understood, its successful application to
practical scenarios incurs significant costs associated with annotating
large data sets and feature engineering.
In this talk, I will describe methods for reducing annotation costs and
improving system performance through interactive learning protocols.
The first part of the talk describes my research on active learning
strategies for the structured output and pipeline model settings, two
widely-used models for complex application scenarios where obtaining
labeled data is particularly expensive. Secondly, I will introduce the
interactive feature space construction protocol, which uses a more
sophisticated interaction to incrementally add application-targeted
domain knowledge into the feature space to improve performance and
reduce the need for labeled data. I will also present empirical results
for the semantic role labeling and named entity/relation extraction NLP
tasks, demonstrating state of the art performance with significantly
reduced annotation requirements.
Markos Mylonakis (ILLC)
Tuesday, March 24, 2009
An All-Phrase-Pairs Approach for Statistical Machine Translation with
Smoothing as a Learning Objective
Phrase-Based Statistical Machine Translation (PBSMT) is one of the
best performing and most widely deployed Statistical Machine
Translation (SMT) frameworks, in academia and business alike. The basic
assumption under which it operates is that phrases can be translated
independently of each other. Consequently, conditional phrase
translation probabilities constitute the principal components of PBSMT
systems. Training of these probabilities relies on utilising an
already word-aligned parallel corpus. Crucially, given that
information on how a translated sentence pair is decomposed into
phrase-pairs is of course not part of the training material,
estimating phrase translation probabilities is non-trivial. Currently,
top-performing systems utilise a heuristic method to attain these.
In this talk, we will review how the challenge of estimating PBSMT
parameters relates to Data Oriented Parsing (DOP) model estimation. We
will propose a generative model that explains phrase-based translation
using a prior over latent segmentations of sentence pairs into
phrase-pairs. Furthermore, we will opt for an all-fragment approach,
including all phrase-pairs in our phrase table. Finally, utilising the
conceptual links between PBSMT and DOP as well as previous work on the
latter, we will introduce a novel Smoothing Expectation-Maximisation
estimator, which employs smoothing as a learning objective. We shall
close by presenting empirical evidence that, where others have
concluded that latent segmentations lead to overfitting and
deteriorating results, we are able to attain performance equivalent to
that of the heuristic estimates on reasonably sized training data.
Dave Cochran (University of St. Andrews & ILLC)
Wednesday, March 11, 2009
Darwinised Data-Oriented Parsing: Statistical NLP with added Sex and Death
Data-Oriented Parsing (DOP) is a state-of-the-art approach to both
supervised and unsupervised parsing (Bod 1992, 1998, 2006a, 2006b,
2007a, 2007b, Zollman and Sima’an 2005), which has mostly been
developed within a technologically-oriented computer science context.
Recent work has highlighted some interesting cognitive properties of
the Data-Oriented approach (Borensztajn, Zuidema & Bod 2008, Bod
2008). However, these studies have mostly focused on the static
properties of the DOP probability model. Here, we present the first
attempt at a dynamic, incremental Data-Oriented model which can address
the time course of language learning, rather than just the outcome;
Darwinised DOP.
Remko Scha (ILLC)
Wednesday, March 4, 2009
Grammars without categories
Most syntactic theories assume that a language has a repertoire of
syntactic categories, and a lexicon that assigns one or more categories
to the elementary units of the language; on this basis, the grammar of
the language then defines the syntactic categories of larger
constituents and the grammaticality of strings. The notion of a
"syntactic category" thus plays a crucial role in the architecture of
such theories. But when we want to describe language acquisition, this
notion is deeply problematic. The child starts its language learning
process without syntactic categories. Its initial syntactic knowledge
is exemplar-based; syntactic categories emerge gradually, and evolve
gradually. A model which accounts for this cannot assume syntactic
categories as primitives; it must define them in terms of something
more basic. In this talk I want to explore how this can be done.
What is a syntactic category? Intuitively, it is a set of words or
constituents which are treated in the same way by the operations of the
grammar. By taking this intuition literally, we may account for the
emergence of syntactic categories. First of all, we observe that a
child with a corpus of strings or unlabelled trees does not need
syntactic categories to analyze new input: it can substitute words (or
longer strings) in exemplars from its corpus. By storing successful
substitutions, the child gradually builds a network of
intersubstitutable words. If for a certain set of words the
substitutability relation has certain formal properties (symmetry,
transitivity), the network functions as a syntactic category –
although it lacks an explicit category label.
We do not need neural nets to model this process. "Non-categorial
language" can be nicely described by means of (probabilistic) rewrite
systems which have much in common with ordinary (probabilistic) formal
grammars, but which differ from these in two important respects: (1)
terminal symbols are allowed to be rewritten, and (2) non-terminal
symbols are abolished. (The formalism shares this property with
Lindenmayer-systems, and, in fact, with the original Semi-Thue
systems). For the earliest stages of child language a very restricted
version of this formalism suffices, where rules only rewrite individual
words. To account for later stages, the approach can be generalized to
tree-rewriting, and thus hook up with whatever we may find cognitively
plausible about Data-Oriented Parsing.
Dan Roth (University of Illinois at Urbana-Champaign)
Wednesday, February 11, 2009
Constrained Conditional Models: Learning and Inference in Natural Language Understanding
Making decisions in natural language understanding tasks often involves
assigning values to sets of interdependent variables where an
expressive dependency structure among these can influence, or even
dictate, what assignments are possible. Structured learning problems
provide one such example, but we are interested in a broader setting
where multiple models are involved and it may not be ideal, or
possible, to learn them jointly.
I will present work on Constrained Conditional Models (CCMs), a
framework that augments probabilistic models with declarative
constraints as a way to support decisions in an expressive output space
while maintaining modularity and tractability of training.
Examples will be drawn from natural language understanding tasks such
as semantic role learning (determining who did what to whom when and
where), information extraction, transliteration and textual entailment
(determining whether one utterance is a likely consequence of another).
Trevor Cohn (University of Edinburgh)
Wednesday, January 28, 2009
Inducing compact but accurate Tree Substitution Grammars
Tree substitution grammars (TSGs) are a compelling alternative to
context-free grammars for modelling syntax. However, many current techniques
for estimating weighted TSGs (under the moniker of Data Oriented Parsing)
have several problems, such as inconsistency and over-fitting. We present
a theoretically principled model which solves these problems using a
Bayesian non-parametric formulation. Our model learns compact and simple
grammars, uncovering latent linguistic structures (e.g., verb
subcategorisation), and in doing so far out-performs a standard PCFG.
Yuval Krymolowski (University of Haifa)
Wednesday, January 21, 2009
Automatic Annotation of Morpho-Syntactic Dependencies in a Modern Hebrew Treebank
Morpho-syntactic dependencies between sentence constituents are an
inseparable part of syntactic analysis. In Semitic languages, where the
order of certain constituents is relatively free, morpho-syntactic
agreement features are sometimes the main clue for computational
parsing models (Tsarfaty and Sima'an 2007, 2008). We present a
rule-based method for automatically adding dependency annotations to
the Modern Hebrew Treebank (MHT, Sima'an et al. 2001). We concentrate
on mother-daughter dependencies, in which the morphological features of
one or more daughter nodes affect the morphological features or
syntactic analysis of the mother node. The annotation scheme is used
for two purposes: i) annotating the mother-daughter dependencies
between nodes in the treebank, and ii) using the generated dependencies
for annotating morpho-syntactic features of compound constituents.
Manual evaluation shows that the development of the dependency scheme
and its automatic implementation were accurate and also proved helpful
in improving the quality of the manual annotation. We expect our
methodology for automatic dependency annotation of the MHT to be
generally applicable to other "Penn-compatible" syntactic resources
like the Arabic Treebank.
Joint work with Adi Milea and Yoad Winter
Jakub Szymanik (ILLC)
Wednesday, January 14, 2009
Complexity of Quantifiers
I will survey the computational complexity of quantifier constructions in natural language. The talk will be divided into two parts.
In the first part I will describe an automata-theoretic model of
processing simple (monadic) quantifiers in natural language (van
Benthem 1986). In particular, I will present recent empirical data
indicating the plausibility of this model. In the experiments we compared
the time needed for understanding different types of quantifier sentences.
We show that the distinction between quantifiers recognized by
finite automata, e.g., "at least 5", and push-down automata, e.g.,
"most", is psychologically relevant. Moreover, our experiment indicates
that the comprehension of proportional quantifiers, like "more than
half", involves working memory resources. Additionally, we report
differences in comprehension of Aristotelian and cardinal quantifiers
as well as downward and upward monotone quantifiers.
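The contrast between the two classes of quantifiers can be sketched in a few lines. The encoding below is an assumption made for illustration: a sentence "Q A are B" is turned into a string over {0, 1}, with one symbol per element of A that is 1 when the element is also in B. The function names `at_least_n` and `most` are hypothetical. "At least 5" needs only a fixed number of states, whereas "most" needs an unbounded counter, which stands in here for the push-down stack.

```python
def at_least_n(word, n=5):
    """Finite automaton with states 0..n; state n is an accepting sink."""
    state = 0
    for symbol in word:
        if symbol == 1 and state < n:
            state += 1
    return state == n


def most(word):
    """Counter-based recogniser: accept when the 1s outnumber the 0s.

    The counter is unbounded, so no fixed finite set of states suffices.
    """
    balance = 0
    for symbol in word:
        balance += 1 if symbol == 1 else -1
    return balance > 0


print(at_least_n([1, 1, 0, 1, 1, 1]))  # five 1s: accepted
print(most([1, 0, 0]))                 # one 1 against two 0s: rejected
```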
In the second part of my talk I will have a look at complex (polyadic)
quantifiers in natural language. I will start with basic operations
creating complex constructions from simple quantifiers: boolean
combinations, iteration, cumulation, and resumption. We will show that
the class of feasible (PTIME computable) quantifiers is closed under
those operations. Next we discuss NP-completeness of branching
quantifiers. Finally, we will investigate various readings of
quantified reciprocal sentences in English (see Dalrymple et al. 1998).
We show a dichotomy between those readings: the strong reciprocal
reading can create intractable (NP-complete) constructions, while the
weak and the intermediate reciprocal readings cannot.
Gideon Borensztajn (ILLC)
Tuesday, December 2, 2008
The Hierarchical Prediction Network, or is the end of symbolic parsing in sight? ;-)
I'll introduce the Hierarchical Prediction Network (HPN). HPN is a
connectionist network that has been implemented as a syntactical parser
and (semi-)unsupervised grammar inducer. Above all HPN is a model of
cortical computation, and it offers a biologically inspired approach to
the categorization process that is at the core of cortical information
processing. It incorporates key ideas from cognition, such as
integration between bottom-up and top-down processing through
prediction, hypothesis formation and testing, and hierarchically
structured representations, with progressively increasing invariance
and temporal compression.
In this presentation I'll demonstrate the operation of HPN as a
syntactic parser, and I'll show that HPN is able to parse any sentence
generated by a context-free grammar.
Vanessa Ferdinand (UvA)
Wednesday, November 12, 2008
How learning biases and cultural transmission structure information: iterated learning in human subjects and Bayesian agents
Federico Sangati (ILLC)
Wednesday, October 22, 2008
Unsupervised Methods for Head Assignments
I will present several algorithms for assigning heads in phrase structure trees, based on different linguistic intuitions
on the role of heads in natural language syntax. The starting point of the approach is the observation that a head-annotated
treebank defines a unique lexicalized tree substitution grammar. This allows us to go back and forth between the two
representations, and define objective functions for the unsupervised learning of head assignments in terms of features of
the implicit lexicalized tree grammars. We evaluate algorithms based on the match with gold standard head-annotations, and
the comparative parsing accuracy of the lexicalized grammars they give rise to.
Floris Roelofsen (ILLC)
Thursday, September 18, 2008
Anaphora Resolved. A unified theory of pronouns, NP anaphora, and VP ellipsis.
This talk presents one of the main ideas proposed in Floris Roelofsen's dissertation,
which will be defended on the 9th of October. Anyone interested is invited to
attend the defense.
More info: http://student.science.uva.nl/~froelofs/defense/
Menno van Zaanen (Tilburg University)
Wednesday, September 10, 2008
Generic, Symbolic Sequence Classification
In the field of machine learning, many problems are treated as
classification tasks. A classifier takes an event and assigns a
(pre-defined) class to it. When events are sequences, the
classification may be based on aspects of the structure of these
events. For instance, the fact that certain symbols co-occur in a
sequence may be an indication that these sequences belong to a certain
class.
In this talk, I will describe some research I have been doing in this
field recently. This work is still in progress. I will show some
results on the task of question classification (assign the type of
answer to a question), and composer classification (given a musical
piece, assign its composer).
Reut Tsarfaty (ILLC)
Wednesday, August 13, 2008
Relational-Realizational Parsing
State-of-the-art statistical parsing models applied to free word-order
languages tend to underperform compared to, e.g., parsing English.
Constituency-based models often fail to capture generalizations that
cannot be stated in structural terms, and dependency-based models employ
a ‘single-head’ assumption that often breaks in the face of multiple
exponence. In this paper we suggest that the position of a constituent
is a form manifestation of its grammatical function, one among various
possible means of realization. We develop the Relational-Realizational
approach to parsing in which we untangle the projection of grammatical
functions and their means of realization to allow for phrase-structure
variability and morphological-syntactic interaction. We empirically
demonstrate the application of our approach to parsing Modern Hebrew,
obtaining a 7% error reduction from previously reported results.
Reut Tsarfaty and Khalil Sima'an (in press). Relational-Realizational Parsing. In: Proceedings of The 22nd International Conference on Computational Linguistics (CoLing).
Stefan Frank (ILLC)
Wednesday, June 4, 2008
Resolving ambiguous pronouns: a psycholinguistic model
In sentences of the form "Bob lied to Joe because he could not handle the
truth", the pronoun "he" is ambiguous: it can refer to either Bob or Joe.
Resolving this ambiguity requires the application of world knowledge
about the causal relation between "lying" and "being able to handle the
truth". However, the pronoun may also be (partially) disambiguated
because of a bias for one reading over the other. For example, the fact
that Bob is mentioned first may lead to the interpretation in which Bob
could not handle the truth.
I will present a computational model (Frank et al., 2007) that
simulates how human readers resolve such ambiguities. The model is
based on three assumptions. First, any biasing effect takes place
before world knowledge comes into play. Second, ambiguity resolution is
a side-effect of establishing causal coherence in text (cf. Hobbs,
1979; Kehler, 2002). Third, coherence is increased by manipulating
mental representations of the situations described by the text. These
representations are more like "mental models" (Johnson-Laird, 1983)
than like the symbol structures known from formal semantics.
In the model, possible interpretations of an ambiguity are represented
by centers of gravity in a high-dimensional space. The unresolved
ambiguity forms a vector in the same space. This vector is attracted by
the centers of gravity, while also being affected by context
information and world knowledge. When the vector reaches one of the
centers of gravity, the ambiguity is resolved to the corresponding
interpretation. The model accounts for a considerable amount of
reading-time and error-rate data and explains, among other things, the
effects of context informativeness and anaphor type.
Frank, S.L., Koppen, M., Noordman, L.G.M., & Vonk, W. (2007). Coherence-driven resolution of referential ambiguity: a computational model. Memory & Cognition, 35, 1307-1322.
Michiel van Lambalgen (ILLC)
Wednesday, May 21, 2008
Computational semantics, brain and behaviour
My interest in computational semantics is driven by the need to find
processing models for language comprehension and production that are on
the one hand abstract enough to connect to formal semantics, on the
other hand concrete to the extent that they allow predictions for brain
imaging experiments and for studies in developmental language
disorders. The framework that I developed in joint work with Fritz Hamm
(The proper treatment of events, Blackwell 2004) satisfies these desiderata.
I will present two applications: EEG measurements of the processing of
the 'past progressive', and deviant uses of tense in children with
ADHD.
Tejaswini Deoskar (Cornell University)
Friday, May 23, 2008
Unsupervised re-estimation of probabilistic lexicons for treebank PCFGs
Statistical models of syntactic structure used for natural language
parsing have different ways of representing the properties of individual
lexical items which determine their associated local and non-local
syntactic structure. In all cases, probabilities associated with a
majority of open-class lexical items are not represented accurately in
models trained solely on labeled (treebank) data, due to the scarcity of
labeled data. In this talk, I present procedures which re-estimate the
lexical parameters of a treebank PCFG from unlabeled data using the
Inside-Outside algorithm, and pool the re-estimated lexical information
with lexical information from the treebank PCFG. The procedures produce
substantial improvements on the task of determining sub-categorization
preferences of novel and low-frequency verbs, relative to a smoothed
Penn Treebank PCFG. In addition, I present a methodology to build
an enhanced Penn Treebank PCFG containing lexically-oriented features,
which is used as the prior model for the inside-outside procedure.
Cristina Barés Gómez
(Institute of Islamic and Near East Studies, Spanish National Research
Council, Zaragoza; and Philosophy, Logic, and Philosophy of Sciences
Department, University of Sevilla, Spain)
Wednesday, May 14, 2008
Meaning in the automatic interpretation process of ancient Northwest Semitic texts
I intend to offer an overview of my PhD thesis project; no definitive
results will therefore be presented. This work is at present carried out
in the framework of a wider project that tries to implement the
interpretation process of Ugaritic, an ancient Semitic language. My
research is mainly intended as an approach to the semantic level, while
also taking into account all the levels of the language studied by the
philologist.
Michael Klein (Radboud University Nijmegen)
Wednesday, April 2, 2008
Computational Modelling of Meaning Processing in the Brain
This talk is about the biologically inspired modelling of meaning
processing in the brain. I will start off with two strands of
modelling: one that deals with the acquisition and representation of
basic concepts by cortical learning algorithms; the other uses value
functions over situations and internal models to accomplish
goal-directed utterance selection. Fusing these two approaches into one
theoretical framework, I will show how conceptual representations in the
cortex are likely to be used (and also changed) during information
transmission with very simple utterances. The framework also introduces
neural mechanisms dealing with information structure (topic/comment) and
with basic (sort of Fregean) quantification, both on a very elementary
level.
Joakim Nivre (Växjö University and Uppsala University, Sweden)
Wednesday, February 27, 2008
Inductive Dependency Parsing of Natural Language Text
This talk summarizes my research on data-driven dependency parsing over
the last five years. To put this work into context, I first discuss
what it means to parse a sentence in a text (as opposed to the better
understood notion of parsing a sentence with a formal grammar) and
propose criteria for the evaluation of text parsers. I then go on to
describe my own dependency-based approach to text parsing, which I
characterize as "transition-based" (to distinguish it from the other
main tradition in data-driven dependency parsing, which I call
"graph-based"). In this approach, inference is performed as a greedy
best-first search over a non-deterministic transition system, while
learning is reduced to the simple classification problem of mapping
each parser state to the correct transition out of that state. I also
discuss methods for handling non-projective dependencies (i.e.,
discontinuous constructions), in particular the widely used
pseudo-projective parsing technique, which allows non-projective
dependencies to be recovered using a strictly projective parsing
algorithm. I conclude with a quick survey of empirical results,
focusing on a contrastive error analysis of transition-based vs.
graph-based parsing based on data from the CoNLL 2006 shared task.
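The transition-based scheme described above can be sketched as follows. This is an illustrative sketch, not the speaker's system: it uses the textbook arc-standard transition set (SHIFT, LEFT, RIGHT) as a simple stand-in, and the hypothetical `make_oracle` replaces the trained classifier with a lookup into gold-standard heads, which is how training examples (state, correct transition) are typically derived. It assumes a projective tree, with token 0 as an artificial root.

```python
def parse(words, choose):
    """Greedy arc-standard parsing; `choose` maps a parser state to a transition."""
    stack, buffer, arcs = [0], list(range(1, len(words) + 1)), {}
    while buffer or len(stack) > 1:
        action = choose(stack, buffer, arcs)
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT":     # second-from-top becomes a dependent of top
            dep = stack.pop(-2)
            arcs[dep] = stack[-1]
        elif action == "RIGHT":    # top becomes a dependent of second-from-top
            dep = stack.pop()
            arcs[dep] = stack[-1]
        else:
            raise ValueError(f"unknown transition: {action}")
    return arcs                    # maps each dependent to its head


def make_oracle(gold):
    """Stand-in for a classifier: pick the transition licensed by gold heads."""
    def choose(stack, buffer, arcs):
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            if below != 0 and gold[below] == top:
                return "LEFT"
            # Attach top only once all of its own dependents are attached.
            if gold[top] == below and all(
                    d in arcs for d, h in gold.items() if h == top):
                return "RIGHT"
        return "SHIFT"
    return choose


words = ["She", "reads", "books"]
gold = {1: 2, 2: 0, 3: 2}          # token index -> head index (0 is the root)
print(parse(words, make_oracle(gold)))
```

In a learned parser, `choose` would be a classifier over features of the stack and buffer; each decision is made once and never revised, which is what makes the search greedy.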
Jacqueline van Kampen (UiL OTS, Utrecht University)
Wednesday, February 20, 2008
(Modeling) the steps of early syntax acquisition
The abstract can be downloaded here
Philipp Koehn (School of Informatics, University of Edinburgh)
Wednesday, December 12, 2007
Linguistic Problems for Statistical Machine Translation
Machine translation is more relevant than ever, especially in a European
Union with 23 official languages. What will happen to languages such as
Dutch? Will it survive as a language of commerce, or will it be abandoned
in favour of English? By lowering translation costs, we would expect to
sustain the viability of a commercial zone that uses so many different
languages.
Statistical machine translation holds the promise of instant machine
translation. Given open source tools such as the Moses decoder, just add
a parallel corpus and you have a machine translation system.
This talk will present some problems where the standard
phrase-based approach fails, and where attention to the
specifics of the languages involved is required. I will present
methods that deal with different word order, morphology and
agglutinative compounding.
Stefan Frank (ILLC)
Wednesday, January 30, 2008
Predicting reading times through language models and world models
According to 'surprisal theory' (Hale, 2001; Levy, in press), the time
needed to read a word in a sentence is proportional to the negative
logarithm of its probability given the preceding word string. Under
this theory, any model that assigns probabilities to sentence-initial
substrings also generates predictions of word-by-word reading times.
The cognitive plausibility of the model can then be assessed by
comparing its predictions to experimentally obtained reading-time data.
I'll discuss how such an approach may tell us something about the
language constructions used in human sentence processing.
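To make the surprisal measure concrete, the sketch below computes word-by-word surprisal, -log2 P(word | context), in bits. It rests on simplifying assumptions that are not part of the talk: the context is truncated to the single preceding word (a bigram model), probabilities are unsmoothed relative frequencies from a three-sentence toy corpus, and the names `train_bigram` and `surprisals` are hypothetical. Unseen bigrams are not handled.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams, with <s> marking the sentence start."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[prev, word] += 1
    return context_counts, bigram_counts


def surprisals(sentence, context_counts, bigram_counts):
    """Return (word, surprisal in bits) pairs for seen bigrams only."""
    tokens = ["<s>"] + sentence.split()
    result = []
    for prev, word in zip(tokens, tokens[1:]):
        p = bigram_counts[prev, word] / context_counts[prev]
        result.append((word, -math.log2(p)))
    return result


corpus = ["the dog barks", "the dog sleeps", "the cat sleeps"]
model = train_bigram(corpus)
for word, bits in surprisals("the cat sleeps", *model):
    print(f"{word}: {bits:.3f} bits")
```

Under the theory, the less predictable word ("cat" after "the", with probability 1/3 in this corpus) receives the highest surprisal and would be predicted to take longest to read.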
Alternatively, word-string probabilities can follow from a world model
rather than a language model. This requires a probability distribution
over possible worlds and a mapping from sentence-initial substrings
onto the corresponding set of possible worlds. There exists a
sentence-comprehension model (Frank, Haselager, & Van Rooij,
submitted) that provides exactly those two things, albeit only with
respect to a 'microworld' and a 'microlanguage'. I'll show some
examples of reading-time predictions generated by this model, and
explain how they seem to correspond to recent experimental findings.
Rens Bod (ILLC)
Wednesday, January 23, 2008
Markos Mylonakis (ILLC)
Wednesday, October 31, 2007
Unsupervised Estimation for Noisy-Channel Models
Shannon's Noisy-Channel model, which describes how a corrupted message
might be reconstructed, has been the cornerstone for much work in
statistical language and speech processing. The model factors into two
components: a language model to characterize the original message and a
channel model to describe the channel's corruptive process. The standard
approach for estimating the parameters of the channel model is
unsupervised Maximum-Likelihood estimation on the observation data,
usually approximated using the Expectation-Maximization (EM) algorithm.
In this paper we show that it is better to maximize the joint likelihood
of the data at both ends of the noisy-channel. We derive a corresponding
bi-directional EM algorithm and show that it gives better performance
than standard EM on two tasks: (1) translation using a probabilistic
lexicon and (2) adaptation of a part-of-speech tagger between related
languages.
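The two-component factorisation can be illustrated with a toy decoder that picks the message m maximising P(m) * P(observed | m). This sketch only shows how the language model and the channel model combine at decoding time; it does not implement the bi-directional EM estimator of the talk, and the function name `decode` and all probabilities are invented for illustration.

```python
def decode(observed, language_model, channel_model):
    """Return the source message m maximising P(m) * P(observed | m)."""
    def score(message):
        # Channel entries are keyed by (observed, message); missing means 0.
        return language_model[message] * channel_model.get((observed, message), 0.0)
    return max(language_model, key=score)


# Toy numbers, invented for illustration only.
language_model = {"the": 0.6, "then": 0.3, "thee": 0.1}
channel_model = {
    ("teh", "the"): 0.5,
    ("teh", "then"): 0.1,
    ("teh", "thee"): 0.4,
}

print(decode("teh", language_model, channel_model))
```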
Henk Zeevat (ILLC, joint work with Uwe Reyle)
Wednesday, October 24, 2007
Semantic Grammar
Frege invented categorial grammar in correspondence with Husserl, but
unlike later developments (Ajdukiewicz, Geach, Lambek) the aim was not to
characterise the set of well-formed strings but rather to define semantic
saturation: words need to combine with each other to give saturated
meanings in type t or e.
Building on work by Kamp on presupposition and by Reyle on bottom-up
DRS-induction, we developed a version of Frege's grammar in which concepts
are taken as primitive and where concepts need information from their
context in order to be saturated. In our conception, saturation is defined
by resolvedness: all the contextual binding (including binding from other
concepts given by words in the same utterance) needs to have been
accomplished.
The resulting grammar and the parsing routine naturally suffer from
overgeneration. But it can learn and in fact it seems very suitable as a
learning grammar and quite suitable for combination with stochastic
considerations. I will end by discussing the following theses:
1. Learning decreases overgeneration.
1'. In \omega, there is no overgeneration.
2. Learning increases the plausibility of the readings found.
2'. In \omega, it will find the intended interpretation.
3. The grammar is prelinguistic and prehuman.
3'. It offers a natural framework for studying and modelling L1 acquisition.
3''. It offers a natural framework for studying language change and evolution.
Reut Tsarfaty (ILLC)
Wednesday, June 13, 2007
Three-dimensional parametrization for parsing morphologically rich languages
Current parameters of accurate unlexicalized parsers based on
Probabilistic Context-Free Grammars (PCFGs) form a two-dimensional
grid in which rewrite events are conditioned on both horizontal
(head-outward) and vertical (parental) histories. In Semitic
languages, where arguments may move around rather freely and
phrase-structures are often shallow, there are additional morphological
factors that govern the generation process. Here we propose that
agreement features percolated up the parse-tree form a third dimension
of parametrization that is orthogonal to the previous two. This
dimension differs from mere state-splits as it applies to a whole set
of categories rather than to individual ones and encodes
linguistically motivated co-occurrences between them. This paper
presents extensive experiments with extensions of unlexicalized PCFGs
for parsing Modern Hebrew in which tuning the parameters in three
dimensions gradually leads to improved performance. Our best result
introduces a new, stronger, lower bound on the performance of treebank
grammars for parsing Modern Hebrew, and is on a par with current
results for parsing Modern Standard Arabic obtained by a fully
lexicalized parser trained on a much larger treebank.
Reinhard Blutner (ILLC)
Wednesday, May 30, 2007
Quantum probabilities, entanglement, and computational semantics
"With four parameters I can fit an elephant, and with
five I can make him wiggle his trunk" (John von Neumann)
Classical truth-functional semantics and almost
all of its modifications have a serious problem in treating prototypes
and their combination. Though some modelling variants can fit many of
the puzzling empirical observations, their explanatory value is seldom
noteworthy. I will argue that the explanatory inadequacy is due to the
Boolean character of the underlying semantics, which only allows
mixing possible worlds but excludes the idea of superposition crucial
for geometrical models of meaning. In the first part of the talk I
will motivate the "virtual conceptual necessity" of quantum
probabilities for the proper analysis of concepts and their
combination. The second part introduces quantum probabilities, and the
third part starts to discuss a still open list of possible applications
in formal and computational semantics.
Louis ten Bosch & Lou Boves (Radboud University Nijmegen)
Wednesday, May 23, 2007
Acquisition of Recognition and Communication Skills (ACORNS)
The topic of this presentation is
ACORNS, an FP6 FET project on the acquisition of communication and
recognition skills. The goal of ACORNS is to develop a
memory-prediction model that will demonstrate the capability to
acquire language and communication skills on the basis of rich sensory
input. The input will comprise not only speech in the context of
references to objects and concepts in the physical environment, but
also feedback from other agents in the environment on the actions that
the learning agent takes in response to the inputs, guided by an
innate need to communicate. We aim at building a computational model
of an agent that learns to communicate. To that end we will use
methods for representing acoustic signals, for detecting and storing
meaningful and robust patterns in these representations, guided by
purposeful interaction. The presentation will go into detail with
respect to the mathematical/computational and cognitive aspects of the
model.
Pieter Adriaans (Informatics Institute, UvA)
Wednesday, May 9, 2007
The Power and Perils of MDL
In this lecture I will present some recent
work I did with Paul Vitanyi and Ceriel Jacobs on the application of
the MDL (Minimum Description Length) principle to grammar induction. We
have studied MDL in terms of two-part code optimization and randomness
deficiency. These notions will be explained in the lecture. In this
framework we showed that:
- Shorter code does not necessarily lead to better
theories, e.g. the randomness deficiency does not decrease
monotonically with the MDL code,
- contrary to what is suggested by the results of
Gold (1967), there is no fundamental difference between positive and
negative data from an MDL perspective,
- MDL is extremely sensitive to the correct
calculation of code length.
Using these ideas we have implemented an MDL variant of the EDSM
algorithm. The results show that although MDL works well as a global
optimization criterion, it falls short of the performance of algorithms
that evaluate local features of the problem space. MDL can be described
as a global strategy for featureless learning.
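To make the notion of a two-part code concrete, here is a minimal sketch for a much simpler setting than grammar induction: a binary sequence encoded with a coin-bias parameter. The function name `two_part_code_length`, the parameter grid and the example data are assumptions for illustration and are not taken from the lecture. The total code length is the number of bits for the model (the quantized parameter) plus the number of bits for the data given that model.

```python
import math

def two_part_code_length(data, precision_bits):
    """Bits needed for a binary sequence: parameter cost + data cost given the parameter."""
    n, ones = len(data), sum(data)
    levels = 2 ** precision_bits
    # Quantize the estimated probability of a 1 to a grid midpoint (j + 0.5) / levels,
    # so the encoded parameter is never exactly 0 or 1.
    j = min(int(ones / n * levels), levels - 1)
    p = (j + 0.5) / levels
    model_bits = precision_bits
    data_bits = -(ones * math.log2(p) + (n - ones) * math.log2(1 - p))
    return model_bits + data_bits

data = [1] * 90 + [0] * 10
coarse = two_part_code_length(data, 0)  # only p = 0.5 is available
finer = two_part_code_length(data, 3)   # 8 grid points for p
```

With zero precision bits the only available parameter is 0.5 and every symbol costs one bit; spending a few bits on the parameter shortens the total for this skewed sequence. How the model part is counted changes the outcome, which is one way to read the remark that MDL is sensitive to the calculation of code length.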
Paul Boersma (Phonetic Sciences, University of Amsterdam; joint work with Silke Hamann)
Wednesday, March 21, 2007
The evolution of auditory contrast
This paper reconciles the standpoint that language users do not aim at
improving their sound systems with the observation that languages seem
to improve their sound systems. Computer simulations of sibilant
inventories show that Optimality-Theoretic learners who optimize their
perception grammars automatically introduce a so-called prototype
effect, i.e. the phenomenon that the learner's preferred auditory
realization of a certain phonological category is more peripheral than
the average auditory realization of this category in her language
environment. In production, however, this prototype effect is
counteracted by an articulatory effect that limits the auditory form to
something that is not too difficult to pronounce. If the prototype
effect and the articulatory effect are of a different size, the learner
must end up with an auditorily different sound system from that of her
language environment. The computer simulations show that, independently
of the initial auditory sound system, a stable equilibrium is reached
within a small number of generations. In this stable state, the
dispersion of the sibilants of the language strikes an optimal balance
between articulatory ease and auditory contrast. This result has been
derived from a model without goal-oriented elements.
Reut Tsarfaty (ILLC)
Monday, December 11, 2006
Kamvar, Klein & Manning's (2002) paper on probabilistic
interpretations of classical clustering algorithms.
Jelle Zuidema (ILLC)
Wednesday, November 29, 2006
Estimators for Data-Oriented Parsing
We need theoretical as
well as empirical criteria for evaluating learning algorithms for
natural language. In existing work, the concepts bias and
(in)consistency from statistical estimation theory have been applied
to evaluate methods for stochastic tree substitution grammars (STSGs),
such as used in Data-Oriented Parsing (DOP). We extend this work by
showing that all published estimators for DOP are inconsistent in the
usual sense. However, we further show that -- contrary to received
wisdom -- neither lack of bias, nor consistency on the full class of
STSGs are necessary conditions for the adequacy of learning methods.
Instead, we propose an evaluation scheme that uses linguistic criteria
based on the concept ``consistency class''. We show that most
existing methods for STSGs also fail this new test, and argue that our
scheme is applicable more widely to machine learning techniques for
natural language.
Hartmut Fitz (ILLC)
Wednesday, October 18, 2006
PDP model of complex sentence production
I present a neural-symbolic learning model of sentence production
which is trained on a structurally complex language built from simple
clause constructions basic to human experience. I investigate the
model's learning behavior, its ability to map familiar constituents to
novel roles, and its ability to generalize constructions into novel
sentence positions.
Joachim de Beule (AI-Lab, VU Brussels)
Wednesday, September 27, 2006
Fluid Construction Grammar
David Ahn (ILPS)
Wednesday, September 13, 2006
Stages of event extraction
Event detection and recognition is a complex task consisting of
multiple sub-tasks of varying difficulty. In this talk, I present a
simple, modular approach to event extraction that allows us to
experiment with a variety of machine learning methods for these
sub-tasks, as well as to evaluate the impact these sub-tasks have on
the performance of the overall task.
Reut Tsarfaty
Monday, July 3, 2006
Hebrew Statistical Parsing
Current parsing models are not immediately applicable to languages
that exhibit strong interaction between morphology and syntax, e.g.,
Modern Hebrew (MH), Arabic and other Semitic languages. This talk
presents a first attempt at modeling morphological-syntactic
interaction in a generative probabilistic framework to allow for MH
parsing. We argue that morphological information is essential for
parsing MH, and show how morphological cues improve the syntactic
disambiguation capabilities of an integrated model. Following a
detailed discussion of the linguistic data that motivates our
integrated approach, I present the formal setting and baseline results
of our integrated model(s) and then discuss more sophisticated
versions by means of which we hope to boost parsing accuracy and
improve morphological disambiguation.
Peter Grunwald (CWI)
Thursday, June 29, 2006
Introduction to Modern Minimum Description Length Methods
The Minimum Description Length (MDL) Principle is an
information-theoretic method for statistical inference. It is
particularly suited to deal with models of arbitrary complexity. In
recent years, researchers have made significant theoretical advances
concerning MDL. In this talk we aim to present these results and their
applications to a wider audience. In its modern guise, MDL is based on
the concept of a `universal model'. We explain this concept at length.
We show that previous versions of MDL (based on so-called two-part
codes), Bayesian model selection and predictive validation (a form of
cross-validation) can all be interpreted as approximations to model
selection based on `universal models'. Modern MDL prescribes the use
of a certain `optimal' universal model, the so-called `normalized
maximum likelihood model'. This is related to (yet different from)
Bayesian model selection with non-informative priors. It leads to a
penalization of `complex' models that can be given an intuitive
geometric interpretation. Roughly speaking, the complexity of a
parametric model is directly related to the number of distinguishable
probability distributions that it contains.
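As a small worked illustration of the normalized maximum likelihood idea, the sketch below computes the NML complexity term for the Bernoulli (coin-flip) model. The Bernoulli model is chosen here only because it is small enough to enumerate; it is not mentioned in the abstract, and the function names are assumptions. The complexity is the logarithm of the sum, over all possible data sets of length n, of the likelihood each data set receives from its own best-fitting parameter.

```python
import math

def max_likelihood(n, k):
    """Likelihood of a length-n binary sequence with k ones under its best-fitting bias."""
    p = k / n
    # In Python, 0.0 ** 0 evaluates to 1.0, so the boundary cases k = 0 and k = n work.
    return (p ** k) * ((1 - p) ** (n - k))

def bernoulli_nml_complexity(n):
    """log2 of the summed maximized likelihoods over all binary sequences of length n."""
    total = sum(math.comb(n, k) * max_likelihood(n, k) for k in range(n + 1))
    return math.log2(total)

def nml_code_length(data):
    """NML code length in bits: fit term plus model complexity term."""
    n, k = len(data), sum(data)
    return -math.log2(max_likelihood(n, k)) + bernoulli_nml_complexity(n)
```

The complexity term grows with the sample size, since more data lets the model distinguish more of the distributions it contains.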
Antal van den Bosch (Tilburg University)
Wednesday, March 22, 2006
Implicit computational linguistics
Much of computational linguistics, present and past, has borrowed its
existence from explicit linguistic abstractions such as parts of
speech, constituents, and dependencies. The currently dominant
data-driven paradigm, for example, is largely based on inducing models from
lexicons and corpora in which these abstractions have been annotated.
The fact that virtually all of these abstractions have been questioned
and are still fiercely debated in linguistics appears to not have
disturbed many in CL. In this talk I aim to show that there is reason
to worry and to wonder. First, I review a case of "harmful" explicit
modularization in computational models of word pronunciation. Second, I
present results from experiments in shallow parsing that show the
relative weakness of parts-of-speech as intermediary symbols. Third, I
attempt to exemplify the type of implicit computational linguistics
research that in my view is an authentic attempt, using computational
means, at following the greatest protesters against frozen explicit
abstractions in linguistics, such as Firth, Harris, and Croft.
Ton van der Wouden (Leiden University)
Wednesday, March 15, 2006
Dutch as a Construction Language
We report on ongoing research into the building blocks of spoken
language. Strictly decompositional theories about the architecture of
the language faculty, with a computational system (grammar) combining
elements from a list of words (lexicon), can hardly account for the
fact that actual language use is full of recurrent word combinations
of various degrees of idiomaticity and abstractness. Alternative
theories, such as Construction Grammar and Construction-based HPSG,
are claimed to do better in this respect. In our talk, we will discuss
our methods to isolate interesting recurrent word combinations from
the Spoken Dutch Corpus (CGN). We will present first results, and we
will reflect on grammatical frameworks to describe them. Time
permitting, we will also touch upon issues of implementation.
Rens Bod (ILLC and Computer Science, St Andrews)
Wednesday, March 8, 2006
Unsupervised Data-Oriented Parsing
How can a corpus-based parsing model assign parse trees to sentences
if there are no trees in the corpus to begin with? During the last few
years there has been considerable progress in unsupervised induction
of trees. The most successful unsupervised models come close to the
performance of a binarized supervised PCFG on WSJ sentences of <= 10
words. This talk shows that we can get even closer to the performance
of supervised parsing by applying an "all-subtrees" approach to
unsupervised learning. Our approach initially assigns all possible
binary trees to a set of given sentences and next uses all subtrees
from a subset of these binary trees to compute the most probable parse
trees. We show how this model can be implemented by a PCFG-reduction
technique and report competitive results on English, German and
Chinese data. We argue that previous approaches to tree induction are
limited in that they do not take into account structural context
and/or non-contiguous substrings.
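The first step mentioned in this abstract, assigning all possible binary trees to a sentence, can be sketched as follows. This is only an illustration of the enumeration: the function name `binary_trees` and the nested-tuple representation are assumptions, and the sketch does not cover subtree extraction, probabilities or the PCFG reduction.

```python
def binary_trees(words):
    """Yield every binary bracketing of a word sequence as nested tuples."""
    if len(words) == 1:
        yield words[0]
        return
    # Choose where to split the span, then combine every left tree with every right tree.
    for split in range(1, len(words)):
        for left in binary_trees(words[:split]):
            for right in binary_trees(words[split:]):
                yield (left, right)

three = list(binary_trees("the dog barks".split()))
four = list(binary_trees("the big dog barks".split()))
```

The number of trees follows the Catalan numbers and grows quickly with sentence length, so explicit enumeration like this is practical only for short sentences.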
Remko Scha (ILLC)
Wednesday, February 8, 2006
Data-Oriented Semantics
In this talk I will assume that the audience is at least superficially
familiar with the approach to exemplar-based language processing which
is known as "Data-Oriented Parsing" (DOP). So far, the models which
instantiate the DOP approach tend to deal exclusively with the
syntactic aspects of language processing. The purpose of the talk is to
look at the prospects of generalizing this work toward the development
of data-oriented models of semantics. To make progress in this
direction, two different research agendas may be pursued now:
- Exemplar-based models of concept-formation -- not only
for lexical concepts but also for the operations of
"compositional semantics". Work by Renate Bartsch may be a
useful starting point here.
- Separating the syntactic from the semantic component
in our models of sentence probabilities. To the extent
that we can use corpora which are annotated semantically
as well as syntactically, we may construct Bayesian models
which assign distinct probabilities to meanings and to
meaning-syntax mappings. At a technical level, there are
useful analogies with models for Data-Oriented
Translation.
Both lines of thought are directly relevant for the problem of language
acquisition, which sooner or later must be faced by all models of
linguistic cognition: How are complex syntactic and semantic structures
gradually bootstrapped by a system which is merely exposed to concrete
real-world situations with accompanying noises?
Jelle Zuidema (ILLC)
Wednesday, November 23, 2005
Data-Oriented Language Learning - from weights to frequencies and back again
Stochastic Tree Substitution Grammars (STSGs), such as used
in Data-Oriented Parsing, have great linguistic advantages,
essentially merging "construction grammar" with "probabilistic
linguistics". However, from a computational linguistics perspective,
they pose a number of computational challenges that have not yet been
satisfactorily solved. Two fundamental and related problems are "the
problem of estimation" -- estimating the weights of an STSG from
observed subtree frequencies in a tree bank -- and "the problem of
expectation" -- calculating the expected subtree frequencies when
generating trees using an STSG with known weights. A linguistic
desideratum for estimation is that it converges to the maximally
general STSG out of the possibly many correct ones. I will briefly
discuss why none of the existing estimation methods fulfills this
desideratum. I will then present my recent work on the problem of
expectation and discuss how its solution directly suggests an
alternative approach to the first problem.
Yoav Seginer (ILLC)
Wednesday, November 9, 2005
Induction of a Dependency Parser
In this talk I describe an unsupervised learning algorithm for the
induction of an incremental dependency parser from raw text.
The parser and learning algorithm work in tandem to bootstrap
the parser - as an utterance is read from left to right, the parser
incrementally assigns it a dependency structure based on parameters
learned from previous examples. Simultaneously, the learning algorithm
uses the resulting parse to improve its estimation of additional
parameters.
The parser and learning algorithm were designed for and tested on the
adult utterances in the CHILDES corpus. The input is therefore spoken
language which is usually syntactically simple, with a limited
vocabulary and an extensive use of pronouns. At the same time, the
corpus contains a balanced mixture of declaratives, imperatives and
questions together with many fragments and interjections. Many of the
utterances are incomplete, ungrammatical or marginally grammatical.
The algorithm must therefore be resistant to noise and able to assign
a dependency structure also to utterances which are not entirely
grammatical.
I will conclude by discussing the evaluation of such a parser.
Comparison with a gold standard is known to be problematic in the
context of unsupervised learning and may be entirely impossible when no
relevant annotated corpus is available (as in the present case). I will
argue that one criterion for the success of an unsupervised learning
algorithm is the stability of its output when trained on different
corpora of the same language. This should then be controlled for
triviality by its failure to generate the same results when trained on
various corpora not taken from that language (but with the same set of
words).
Henk Zeevat (ILLC)
Wednesday, October 19, 2005
A New Implementation of Optimality Theory?
Typical of existing implementations is a limitation to a maximum
number of errors or their non-existence (syntax, pragmatics). The talk
discusses work in progress on an implementation scheme in which errors
are not computed and in which economy principles follow from the
general scheme.
The set of candidates for an input is represented by an underspecified
representation containing the input, and the constraints are
represented by routines that try to add information to the
underspecified structure in a strictly monotonic way (compare default
unification). Errors are just the cases where the constraint cannot add
information because conflicting information is already present or the
information is ruled out by other factors. By applying the constraints
in the order of their strength the optimal candidate is constructed.
The talk will present a program for a standard OT application:
syllabification (using not unification but CLP) and two schemes for
implementing syntax, including a reconstruction of Bresnan's optimal
syntax by recasting the constraints as specification routines.
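The general scheme in this abstract can be illustrated with a very small sketch. It is not the program presented in the talk (which uses CLP); it is a plain-dictionary toy, and the names `apply_constraints`, `max_io` and `no_coda` as well as the feature "coda" are hypothetical. Each constraint is a routine that proposes information, constraints are applied strongest first, and a constraint can only add information that is not already specified.

```python
def apply_constraints(structure, ranked_constraints):
    """Apply constraint routines strongest-first; each may add but never overwrite."""
    result = dict(structure)
    for constraint in ranked_constraints:
        for feature, value in constraint(result).items():
            # Monotonic update: if the feature is already specified, the
            # weaker constraint cannot add its information.
            result.setdefault(feature, value)
    return result

# Hypothetical toy constraints on the coda of a single syllable.
def max_io(structure):
    """Keep the final segment of the input as the coda."""
    return {"coda": structure["input"][-1]}

def no_coda(structure):
    """Prefer a syllable without a coda."""
    return {"coda": ""}

faithful = apply_constraints({"input": "tak"}, [max_io, no_coda])
unmarked = apply_constraints({"input": "tak"}, [no_coda, max_io])
```

Swapping the order of the two routines changes which one gets to specify the coda, so the ranking alone decides the outcome and no violation counts are computed.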
Detlef Prescher
Wednesday, September 28, 2005
Head-Driven PCFGs with Latent-Head Statistics
Although state-of-the-art parsers for natural language are lexicalized,
it was recently shown that an accurate unlexicalized parser for the Penn
tree-bank can be simply read off a manually refined tree-bank. While
lexicalized parsers often suffer from sparse data, manual mark-up is
costly and largely based on individual linguistic intuition. Thus,
across domains, languages, and tree-bank annotations, a fundamental
question arises: Is it possible to automatically induce an accurate
parser from a tree-bank without resorting to full lexicalization? In
this paper, we show how to induce head-driven probabilistic parsers with
latent heads from a tree-bank. Our automatically trained parser has a
performance of 85.7% (LP/LR F1), which is already better than that of
early lexicalized ones.