Code and Datasets
Code and datasets produced at ILLC are collected at several external locations. Code and datasets registered in PURE can be found via the Searchable List of Research Output.
A Data-Oriented Parsing demo
This Data-Oriented Parsing demo constructs a syntactic analysis for a given sentence based on a model learned from a corpus of annotated sentences.
Data-Oriented Parsing is a framework developed at ILLC. It starts from the notion that language use relies on the recombination of exemplars and fragments from memory; the model has also been applied to other forms of cognition, such as music and reasoning. Following the programmatic paper of Scha (1990), an implementation based on Tree-Substitution Grammar was proposed by Bod (1992). This demo shows the latest implementation, described in van Cranenburgh et al. (2016) and developed as part of the PhD research of the first author. The model in this demo supports multiple languages, grammatical function labels, and discontinuous constituents.
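To make the idea of recombining fragments concrete, the following is a minimal sketch of a Tree-Substitution Grammar derivation in Python. It is not the code behind the demo. The tuple encoding of trees, the function names (`substitute`, `compose`, `words`, `derivation_probability`), and the toy fragments with their counts are all assumptions made for illustration. The probability is a simple relative-frequency estimate: each fragment's count divided by the total count of fragments with the same root label, multiplied over the derivation.

```python
from fractions import Fraction

# Illustrative encoding (an assumption, not the demo's data structure):
#   ("NP", "Mary")  is a node labelled NP dominating the word "Mary"
#   ("NP",)         is an open substitution site labelled NP


def substitute(tree, fragment):
    """Fill the leftmost open substitution site in `tree` with `fragment`.

    Returns (new_tree, done); `done` is False if `tree` has no open site.
    """
    if isinstance(tree, str):  # a word, nothing to substitute
        return tree, False
    if len(tree) == 1:  # an open substitution site
        if tree[0] != fragment[0]:
            raise ValueError(f"cannot substitute {fragment[0]} at {tree[0]}")
        return fragment, True
    label, *children = tree
    new_children, done = [], False
    for child in children:
        if not done:
            child, done = substitute(child, fragment)
        new_children.append(child)
    return (label, *new_children), done


def compose(derivation):
    """Combine a sequence of fragments by repeated leftmost substitution."""
    tree = derivation[0]
    for fragment in derivation[1:]:
        tree, done = substitute(tree, fragment)
        if not done:
            raise ValueError("no open substitution site left")
    return tree


def words(tree):
    """Return the words at the leaves of `tree`, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in words(child)]


def derivation_probability(derivation, counts):
    """Product of the relative frequencies of the fragments used."""
    totals = {}
    for fragment, count in counts.items():
        totals[fragment[0]] = totals.get(fragment[0], 0) + count
    prob = Fraction(1)
    for fragment in derivation:
        prob *= Fraction(counts[fragment], totals[fragment[0]])
    return prob


# Toy fragments with made-up counts (hypothetical, not from any corpus).
f_s = ("S", ("NP",), ("VP", ("V", "sees"), ("NP",)))
f_mary = ("NP", "Mary")
f_john = ("NP", "John")
counts = {f_s: 2, f_mary: 3, f_john: 1}

derivation = [f_s, f_mary, f_john]
tree = compose(derivation)
prob = derivation_probability(derivation, counts)
print(tree)
print(" ".join(words(tree)), prob)
```

In this sketch a single derivation yields a single tree. In a Data-Oriented Parsing model, the same tree can usually be built from many different fragment combinations, so the probability of a parse is obtained by summing over its derivations; that step is left out here.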
For more information, contact A.W.vanCranenburgh at uva.nl.
- Scha (1990). Language theory and language technology; competence and performance. Computertoepassingen in de Neerlandistiek. http://iaaa.nl/rs/LeerdamE.html
- Bod (1992). A computational model of language performance. Proc. of COLING. http://aclweb.org/anthology/C92-3126
- van Cranenburgh, Scha, Bod (2016). Data-Oriented Parsing with discontinuous constituents and function tags. Journal of Language Modelling. http://dx.doi.org/10.15398/jlm.v4i1.100