Universiteit van Amsterdam

Institute for Logic, Language and Computation

Please note that this newsitem has been archived, and may contain outdated information or links.

9 March 2001, Computational Logic Seminar, Djoerd Hiemstra
Speaker: Djoerd Hiemstra (UTwente)
Title: Statistical Language Models for Information Retrieval
Date and Time: March 9, 2001, 13.30
Location: Room P.327, ILLC, Plantage Muidergracht 24, Amsterdam

Abstract:
Information Retrieval (IR) was probably the first area of natural language processing in which statistics were successfully applied. Two models of ranked retrieval developed in the late 1960s and early 1970s are still in use today: Salton's vector space model and the probabilistic model of Robertson and Sparck Jones. However, the real breakthrough of statistical models in natural language processing did not come from the IR community, but from the speech recognition community in the 1970s and 1980s. Many of the statistical techniques that were first successfully applied to speech, such as Shannon's noisy channel model, n-gram models and hidden Markov models, are used today in all sorts of applications, e.g. part-of-speech tagging, optical character recognition, statistical translation and stochastic context-free grammars.

In this talk I will show that statistical language models originally developed for speech can be used to model ranked retrieval as well. The application to IR has characteristics of both the vector space model and the probabilistic model of IR and gives a probabilistic interpretation of:

  • tf.idf term weighting
  • relevance weighting of query terms
  • Boolean-structured queries
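As a rough illustration of how a language model can rank documents, the sketch below scores each document by the log probability that its unigram model generates the query, smoothed by linear interpolation with a collection model. This is a minimal sketch under stated assumptions, not the exact model of the talk: the function names, the toy documents and the interpolation weight `lam` are all invented for illustration. Dividing each term's probability by its collection-only part, which does not change the ranking, leaves a weight that grows with the term's frequency in the document and shrinks with its frequency in the collection, which is where the resemblance to tf.idf weighting comes from.

```python
import math
from collections import Counter


def score(query, doc, collection_counts, collection_len, lam=0.5):
    """Log query likelihood under a smoothed unigram document model.

    lam is a hypothetical interpolation weight between the document
    model and the collection model (an assumption of this sketch).
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    total = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = collection_counts[term] / collection_len
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            # Term unseen in the whole collection: query cannot be generated.
            return float("-inf")
        total += math.log(p)
    return total


# Toy collection, invented for illustration only.
docs = {
    "d1": "statistical language models for speech recognition".split(),
    "d2": "language models for information retrieval".split(),
    "d3": "vector space retrieval with term weighting".split(),
}
collection_counts = Counter(t for d in docs.values() for t in d)
collection_len = sum(collection_counts.values())


def rank(query):
    """Return document names ordered from most to least likely."""
    return sorted(
        docs,
        key=lambda name: score(query, docs[name], collection_counts, collection_len),
        reverse=True,
    )


print(rank("information retrieval".split()))
```

In this toy collection the document that contains both query terms ranks first, and the document that contains neither is still scored (through the collection model) rather than discarded.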

The model can easily be extended with additional statistical processes, for instance statistical translation to model cross-language information retrieval, i.e. searching for documents written in a language other than that of the query.
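One way such an extension could look is sketched below: each query term in the source language is expanded into its possible translations, and the smoothed document probabilities of those translations are combined, weighted by a translation probability. This is only a sketch under assumptions. The translation table, its values, the direction of the translation probabilities, the toy documents and the weight `lam` are all hypothetical and not taken from the talk.

```python
import math
from collections import Counter


def clir_score(query, doc, translations, coll_counts, coll_len, lam=0.5):
    """Log likelihood of a source-language query given a target-language document.

    translations maps each source term to {target_term: probability};
    the table and lam are assumptions made for this illustration.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    total = 0.0
    for source_term in query:
        p = 0.0
        for target_term, p_trans in translations.get(source_term, {}).items():
            p_doc = doc_counts[target_term] / doc_len if doc_len else 0.0
            p_coll = coll_counts[target_term] / coll_len
            # Sum over the possible translations of the query term.
            p += p_trans * (lam * p_doc + (1 - lam) * p_coll)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total


# Hypothetical Dutch-to-English translation table and toy English documents.
translations = {
    "taal": {"language": 0.8, "tongue": 0.2},
    "model": {"model": 1.0},
}
d1 = "statistical language model".split()
d2 = "vector space model".split()
coll_counts = Counter(d1 + d2)
coll_len = sum(coll_counts.values())

query = ["taal", "model"]
s1 = clir_score(query, d1, translations, coll_counts, coll_len)
s2 = clir_score(query, d2, translations, coll_counts, coll_len)
print(s1 > s2)
```

Here the Dutch query is matched against English documents without being translated into a single fixed English query first; the alternatives stay weighted inside the scoring function.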

For more information, see http://www.illc.uva.nl/~mdr/ACLG/Local/seminar01-1.html.
