Language and Computation (LaCo)
Jelle Zuidema, Khalil Sima'an (deputy).
Kaspar Beelen, Rens Bod, Serge ter Braake, Elia Bruni, John Ashley Burgoyne, Tejaswini Deoskar, Raquel Fernandez Rovira, Wilker Ferreira Aziz, Henkjan Honing, Jaap Kamps, Amir Kamran, Bart Karstens, Phong Le, Diego Marcheggiani, Miguel Rios Gaona, Paula Roncaglia Denissen, Makiko Sadakata, Ivan Titov, Daniel Wiechmann, Henk Zeevat.
Samira Abnar, Sophie Arnoult, Joost Bastings, Joachim Daiber, Mostafa Dehghani, Anton Frolov, Raquel Garrido Alhama, Serhii Havrylov, Cuong Hoang, Dieuwke Hupkes, Hugo Huurdeman, Berit Janssen, Ehsan Khoddammohammadi, Corina Koolen, Emma Mojet, Michael Schlichtkrull, Philip Schulz, Milos Stanojevic, Marco Del Tredici, Carlos Vaquero Patricio, Joey Weidema, Bastiaan van der Weij.
Bushra Jawaid, Filip Klubicka.
This project is concerned with computational models of human information processing, especially natural language processing and music perception. The methods employed in this project build on formal theories of linguistic syntax and logical semantics, but extend these with a variety of more performance-oriented techniques, such as probabilistic grammars and computational models of human Gestalt perception. The project aims to develop computational methods which are cognitively plausible as well as practically useful.
An important focus is the further development of corpus-based processing methods for natural language, building on the "Data-Oriented Parsing" model which we have developed over the last fifteen years. Blind tests on annotated corpora have shown that existing implementations of this model are very successful in computing simple syntactic surface structures. Current research involves improving the probability estimates of the model, enriching its linguistic coverage, and putting semantic representations into the picture. This enables us to move toward models of first language acquisition and language change. We also pursue practical applications, such as Statistical Machine Translation.
Another important application area is Information Retrieval and Question Answering. In this area, we employ state-of-the-art retrieval techniques and focus on improving the practical usefulness of these systems through innovations in user interfaces and cognitive ergonomics.
In cooperation with the project Logic and Language, we develop models of linguistic processes at the level of pragmatics and discourse. Here we employ the framework of Optimality Theory to articulate fairly complex models as hierarchies of competing constraints.
Our research on music cognition focuses on an aspect of music which is fundamental but poorly understood: the perception of the temporal aspects of music, such as rhythm, tempo and timing. We develop computational models which implement mathematically articulated theories, and which are validated through psychological experiments with human listeners. The models we develop here can be applied in algorithms for automatic transcription, automatic accompaniment and music generation.
Language research and music research deal with significantly different domains; they cannot be expected to use exactly the same concepts, tools, and techniques. But language and music do have important features in common: they are both sign systems evolved in human society, which rely on the human ability to perceive complex hierarchical structures in linear sequences. We believe it is useful, therefore, to explore these two domains jointly. Some convergence can be observed already. To begin with, formal theories of musical Gestalt perception were an important source of inspiration for the initial development of the Data-Oriented Parsing model around 1990. Now the influence is starting to go in the other direction: Bayesian statistics (one of the core techniques in probabilistic language processing) turns out to be very useful in music cognition as well. Also, recent research yields increasing evidence for a "memory-based" component in music perception; our research on Data-Oriented Parsing will be a useful reference point when we begin to model this phenomenon.