Categorical Foundations for Extended Compositional Distributional Models of Meaning
Gijs Wijnholds
Abstract:
Compositional distributional models of meaning were introduced by Coecke et al. (2010, 2013) to reconcile the distributional theory of meaning, expressed in vector space semantics, with the theory of compositional interpretation found in typelogical grammars. The particular typelogical formalisms employed by Coecke et al. (pregroup grammars, Lambek calculus) have a recognizing capacity equivalent to that of context-free grammars. It is well known, however, that natural languages exhibit patterns that require expressivity beyond context-free (Huybregts, 1984; Shieber, 1987). The aim of this thesis, then, is to investigate extensions of compositional distributional models of meaning that result from using typelogical grammars with enhanced expressivity. To this end, we give a categorical characterization of the Lambek-Grishin calculus (see Moortgat (2007, 2009) and references therein) and its constituent subsystems in terms of linear distributive categories, borrowing a categorification technique from Lambek (1968). We develop a graphical language of string diagrams for reasoning about morphism structure and equality. Finally, we show that finite-dimensional vector spaces also form a linear distributive category, which opens the possibility of extended compositional distributional models of meaning.