Neural language models with latent syntax

Daan van Stigt

Abstract: We study probabilistic neural language models that are obtained by marginalizing a structured latent variable in the recurrent neural network grammar (RNNG), a joint model of sentences and their syntactic structure. Supervised RNNGs produce competitive language models that are sensitive to syntactic phenomena. However, they require annotated data, and the intractable sum over the structured variable complicates straightforward learning from data in which that structure is latent. Approximate learning with variational inference provides a solution, enabling estimation from unannotated data, but the effectiveness of this approach crucially depends on the quality of the approximate posterior. In this thesis we take a first step in this direction. We introduce a neural conditional random field (CRF) constituency parser, which proves a competitive parser in its own right, and experiment with the CRF as approximate posterior in variational learning, where the two models are jointly optimized to maximize a lower bound on the marginal log-likelihood. This opens the door to semi-supervised and unsupervised learning. The CRF formulation of the parser allows exact computation of key quantities in the lower bound, and its global normalization provides a robust distribution for sampling-based gradient estimation. Preliminary results with unlabeled trees suggest the potential of this approach for unsupervised n-ary tree induction, and we formulate future work towards this goal. Finally, to evaluate how the joint formulation sets the RNNG apart, we perform targeted syntactic evaluation and compare its performance with that of neural language models estimated using multitask learning with a syntactic side objective.
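For concreteness, the lower bound referred to above is the standard variational (evidence) lower bound. A minimal sketch in assumed notation, not taken verbatim from the thesis: $x$ denotes a sentence, $y$ a constituency tree, $p_\theta(x, y)$ the joint RNNG, and $q_\phi(y \mid x)$ the CRF approximate posterior.

\begin{align*}
\log p_\theta(x)
  &= \log \sum_{y} p_\theta(x, y) \\
  &\geq \mathbb{E}_{q_\phi(y \mid x)}\!\left[ \log p_\theta(x, y) \right]
     + \mathbb{H}\!\left[ q_\phi(y \mid x) \right].
\end{align*}

Under this formulation, the entropy term $\mathbb{H}[q_\phi(y \mid x)]$ is one of the quantities that the globally normalized CRF allows to be computed exactly with inside-style dynamic programming, while the expectation under $q_\phi$ is estimated from sampled trees.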