DESCRIPTION:Stochastic Tree Substitution Grammars (STSGs), such as those used in Data-Oriented Parsing, have great linguistic advantages, essentially merging "construction grammar" with "probabilistic linguistics". However, from a computational linguistics perspective, they pose a number of challenges that have not yet been satisfactorily solved. Two fundamental and related problems are "the problem of estimation" (estimating the weights of an STSG from observed subtree frequencies in a treebank) and "the problem of expectation" (calculating the expected subtree frequencies when generating trees using an STSG with known weights). A linguistic desideratum for estimation is that it converges to the maximally general STSG out of the possibly many correct ones. I will briefly discuss why none of the existing estimation methods fulfills this desideratum. I will then present my recent work on the problem of expectation and discuss how its solution directly suggests an alternative approach to the problem of estimation. For more information, see http://staff.science.uva.nl/~jzuidema/CLS/
