Please note that this newsitem has been archived, and may contain outdated information or links.
19 May 2026, Computational Linguistics Seminar, Sarath Sivaprasad
When large language models are deployed in the real world with vast possible action spaces, what guides their choice of a single next action? In this talk we delve into the heuristics underlying LLM response sampling. Similar to human cognition, LLMs rely on two interacting components: a descriptive component that reflects the statistical distribution of possibilities, and a prescriptive component that reflects an implicit, value-weighted ideal. This dual structure also appears in how models represent prototypes, mirroring human prototype theory and fast, System-1-like judgments. As a result, LLMs act as value optimizers, consistently shifting their samples toward high-value or idealized options. This may explain real-world behaviors such as greedy exploration and a value bias in how they pick options. We will discuss empirical evidence across concepts and model families, the mechanisms driving these biases, and the implications for reasoning, exploration, alignment, and the safe deployment of value-guided generative systems.