Please note that this news item has been archived, and may contain outdated information or links.
23 November 2005, Computational Linguistics Seminar, Jelle Zuidema
Stochastic Tree Substitution Grammars (STSGs), such as those used in Data-Oriented Parsing, have great linguistic advantages, essentially merging "construction grammar" with "probabilistic linguistics". From a computational linguistics perspective, however, they pose a number of challenges that have not yet been satisfactorily solved.
Two fundamental and related problems are "the problem of estimation" -- estimating the weights of an STSG from observed subtree frequencies in a treebank -- and "the problem of expectation" -- calculating the expected subtree frequencies when generating trees using an STSG with known weights. A linguistic desideratum for estimation is that it converges to the maximally general STSG out of the possibly many correct ones. I will briefly discuss why none of the existing estimation methods fulfills this desideratum. I will then present my recent work on the problem of expectation and discuss how its solution directly suggests an alternative approach to the first problem.
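To make the estimation problem concrete, here is a minimal sketch of the simplest estimator one could try: relative frequency, where each subtree's weight is its count divided by the total count of all subtrees sharing its root label. This is an illustration only, not the approach presented in the talk. The function name, the toy counts, and the representation of subtrees as bracketed strings are all assumptions made for the example.

```python
from collections import defaultdict


def relative_frequency_weights(subtree_counts):
    """Weight each subtree by count / total count of subtrees with the same root.

    subtree_counts maps (root_label, subtree_string) -> observed frequency.
    This is a hypothetical helper for illustration, not a library API.
    """
    totals = defaultdict(int)
    for (root, _subtree), count in subtree_counts.items():
        totals[root] += count
    return {
        key: count / totals[key[0]]
        for key, count in subtree_counts.items()
    }


# Toy subtree counts (invented for illustration, not taken from a real treebank).
counts = {
    ("S", "(S NP VP)"): 3,
    ("S", "(S (NP she) VP)"): 1,
    ("NP", "(NP she)"): 2,
    ("NP", "(NP he)"): 2,
}

weights = relative_frequency_weights(counts)
for (root, subtree), weight in sorted(weights.items()):
    print(root, subtree, weight)
```

The weights for each root label sum to one, so they define a proper distribution over substitutions at that label. Whether such weights reproduce the observed subtree frequencies when the grammar generates trees is exactly what the problem of expectation asks.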
For more information, see http://staff.science.uva.nl/~jzuidema/CLS/