Incorporating Structure into Neural Models for Language Processing

Michael Sejr Schlichtkrull

Abstract: Structured data is abundant in the world, as are the NLP applications seeking to perform inference over such data. Despite their success, modern neural network models often struggle to incorporate structured information. In this thesis, we investigate how to build effective neural network models that incorporate structured data for natural language understanding. Graphs are a natural representation for structural information, and the recently proposed Graph Neural Networks (GNNs) allow neural networks to perform inference over graphs through learnable message-passing functions. We begin by introducing what is, to the best of our knowledge, the first GNN model suitable for the directed, multi-relational data found in knowledge bases (KBs) and other structured resources relevant to NLP applications. We study structural encoders for relational link prediction, question answering, and fact verification. A significant challenge is the uninterpretable, black-box nature of such encoders. To alleviate this problem, we introduce a novel technique for interpreting the predictions of GNNs. Our efforts are presented in four chapters:

- We propose Relational Graph Convolutional Network (R-GCN) encoders for relational link prediction in knowledge bases. R-GCNs are a novel variant of GNNs suited to the directed, multi-relational data found in KBs. By combining our R-GCN encoder with a factorization decoder from the literature, we achieved state-of-the-art performance on the FB15k-237 dataset at the time of publication. Our model performs especially well on complicated inferences involving high-degree vertices and rare relations (a sketch of the R-GCN propagation rule follows this list).
- We introduce two GNN-based models for factoid question answering over KBs, which answer either by selecting individual answer vertices or by selecting the best path to the answer. In addition to our R-GCN, we propose a variant that uses gates to limit which edges are used, encouraging sparsity in this choice through an L_1 penalty (see the gating sketch after this list). The improvement derived from sparsity demonstrates how GNN-based models benefit from filtering out superfluous edges.
- We introduce a novel model for fact verification over open collections of tables, combining a RoBERTa encoder for linearised tables (see the linearisation sketch after this list) with a cross-attention mechanism for fusing evidence documents. Linearisation represents an important alternative to graphs for modeling structure. When operating in the open domain, our approach achieves performance on par with the current closed-domain state of the art; when operating in the closed domain, our approach sets a new state of the art. We also introduce two novel strategies for exploiting closed-domain datasets to improve open-domain performance, relying on objectives that jointly model claim truth and evidence reranking.
- As our experience shows, interpretability is an important issue for GNNs. We propose GraphMask, a novel post-hoc interpretation technique for GNN-based models. By learning end-to-end differentiable zero-one gates for every message, GraphMask produces faithful, scalable, and easily understood explanations of how GNNs arrive at specific predictions. We test our approach on a synthetic task with a known gold standard for faithfulness, demonstrating that GraphMask compares favourably to current alternatives.
  We furthermore apply our technique to analyze the predictions of two NLP models from the literature: a semantic role labeling model and a question answering model (a sketch of the masking mechanism follows this list).
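
For concreteness, the following is a minimal PyTorch sketch of the R-GCN propagation rule from the first chapter: each node aggregates neighbour representations through relation-specific weight matrices, normalised by the per-relation neighbour count, plus a self-loop transformation. The class and tensor names, and the dense per-relation adjacency representation, are illustrative assumptions rather than the thesis implementation.

    import torch
    import torch.nn as nn

    class RGCNLayer(nn.Module):
        """One R-GCN layer: h_i' = ReLU(W_0 h_i + sum_r sum_{j in N_i^r} (1/c_{i,r}) W_r h_j)."""

        def __init__(self, in_dim, out_dim, num_relations):
            super().__init__()
            # One weight matrix per relation, plus a self-loop weight W_0.
            self.w_rel = nn.Parameter(torch.randn(num_relations, in_dim, out_dim) * 0.01)
            self.w_self = nn.Linear(in_dim, out_dim, bias=False)

        def forward(self, h, adj):
            # h:   (num_nodes, in_dim) node representations
            # adj: (num_relations, num_nodes, num_nodes) 0/1 adjacency per relation
            deg = adj.sum(-1, keepdim=True).clamp(min=1)  # c_{i,r}: number of r-neighbours of i
            msgs = torch.einsum('rij,jd,rdo->io', adj / deg, h, self.w_rel)
            return torch.relu(self.w_self(h) + msgs)

In the published model, this encoder is paired with a DistMult factorization decoder, which scores a candidate triple (s, r, o) as the bilinear product e_s^T diag(r) e_o over the encoder outputs.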
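The edge gates of the second chapter can be sketched as scalar multipliers on candidate messages, with an L_1 term added to the training loss to push gates towards zero. The sigmoid parameterisation and all names below are assumptions made for illustration, not the exact model.

    import torch
    import torch.nn as nn

    class GatedEdges(nn.Module):
        """Scalar gate per edge; an L_1 penalty on the gates encourages sparsity."""

        def __init__(self, edge_feat_dim):
            super().__init__()
            self.scorer = nn.Linear(edge_feat_dim, 1)

        def forward(self, messages, edge_feats):
            # messages:   (num_edges, dim) candidate messages along each edge
            # edge_feats: (num_edges, edge_feat_dim) features used to score each edge
            gates = torch.sigmoid(self.scorer(edge_feats))  # values in (0, 1)
            gated = gates * messages                        # downweight or drop edges
            l1_penalty = gates.sum()                        # gates are non-negative, so this is their L_1 norm
            return gated, l1_penalty

The training objective would then be the task loss plus lambda * l1_penalty for some sparsity weight lambda.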
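Linearisation, as used in the third chapter, flattens a table into a token sequence that a standard transformer encoder such as RoBERTa can consume. The separator markers and cell format in the sketch below are hypothetical; the thesis may use a different scheme.

    def linearise_table(header, rows):
        """Flatten a table into a string for a RoBERTa-style encoder.

        header: list of column names; rows: list of lists of cell values.
        The [HEAD]/[ROW]/[CELL] markers are hypothetical separator tokens.
        """
        parts = ['[HEAD] ' + ' [CELL] '.join(header)]
        for row in rows:
            parts.append('[ROW] ' + ' [CELL] '.join(str(c) for c in row))
        return ' '.join(parts)

    # linearise_table(['Player', 'Goals'], [['Pele', 77], ['Maradona', 34]])
    # -> '[HEAD] Player [CELL] Goals [ROW] Pele [CELL] 77 [ROW] Maradona [CELL] 34'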
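Finally, the per-message gates learned by GraphMask in the fourth chapter can be sketched as follows: each message from a trained, frozen GNN is either kept or replaced by a learned baseline vector, and a sparsity penalty drives as many gates as possible to zero. The plain sigmoid below stands in for the (near-)binary sparse relaxation used in practice, and all names are illustrative assumptions.

    import torch
    import torch.nn as nn

    class GraphMaskGate(nn.Module):
        """Per-message gate: keep the message (z near 1) or swap in a learned baseline (z near 0)."""

        def __init__(self, msg_dim):
            super().__init__()
            self.scorer = nn.Linear(msg_dim, 1)
            self.baseline = nn.Parameter(torch.zeros(msg_dim))

        def forward(self, messages):
            # messages: (num_messages, msg_dim) from a frozen, already-trained GNN layer
            z = torch.sigmoid(self.scorer(messages))   # relaxed zero-one gate per message
            masked = z * messages + (1 - z) * self.baseline
            sparsity_penalty = z.sum()                 # drives gates towards zero
            return masked, sparsity_penalty

Messages whose gates reach zero can then be read off as superfluous to the prediction, which is what makes the resulting explanations faithful and easy to inspect.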