Investigations into Semantic Underspecification in Language Models

Franciscus Cornelis Lambertus Wildenburg

Abstract: Several (position) papers have drawn attention to the challenges that semantic underspecification may pose for modern language models, yet relatively little research has been done on this topic. We contribute to this area of research by presenting DUST, a dataset of underspecified sentences annotated with their domain of underspecification. Using this dataset, we conduct three experiments, based on prompting, language model perplexity, and diagnostic classifiers, to study how modern language models process sentences containing semantic underspecification. We find that the ability of language models to recognize underspecification does not correlate with some commonly used language model metrics, and that a fine-grained approach to underspecification could greatly benefit the research community.
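As a rough illustration of the perplexity-based experiment mentioned in the abstract, the sketch below (not the authors' code) shows how sentence-level perplexity can be obtained from an off-the-shelf causal language model; the GPT-2 checkpoint and the example sentence pair are assumptions for illustration only.

```python
# Minimal sketch: sentence-level perplexity with a pretrained causal LM.
# Model choice (GPT-2) and example sentences are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(sentence: str) -> float:
    """Return exp of the mean token-level negative log-likelihood."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the average
        # cross-entropy loss over the sequence.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

# Hypothetical underspecified vs. more fully specified pair.
print(perplexity("They met near the bank."))
print(perplexity("They met near the bank of the river."))
```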