Learning with Imperfect Supervision for Language Understanding
Mostafa Dehghani

Abstract: Humans learn to solve complex problems and uncover underlying concepts and relations from limited, noisy, or inconsistent observations, and draw successful generalizations from them. This capacity is at the heart of the poverty of the stimulus argument, or what is sometimes called Plato's problem: "How do we know so much when the evidence available to us is so meagre?" In contrast, the success of today's data-driven machine learning models is often strongly correlated with the amount of high-quality labeled data available, and teaching machines under imperfect supervision remains a key challenge. In practice, however, for many applications large-scale, high-quality training data is not available, which highlights the growing need for models that can learn complex tasks with imperfect supervision, i.e., where the learning process is based on imperfect training samples.

When designing learning algorithms, purely data-driven learning, which relies only on previous experience, does not seem able to produce generalizable solutions. Similar to humans' innately primed learning, encoding part of the knowledge into the learning algorithm, in the form of strong or weak biases, can help it learn solutions that generalize better to unseen samples.

In this thesis, we focus on the problem of the poverty of the stimulus for learning algorithms. We argue that even noisy and limited signals can contain a great deal of valid information, which can be combined with prior knowledge and biases encoded into learning algorithms in order to solve complex problems. We improve the process of learning with imperfect supervision by (i) employing prior knowledge in learning algorithms, (ii) augmenting data and learning to learn how to better use the data, and (iii) introducing inductive biases into learning algorithms. These general ideas are, in fact, the key ingredients for building any learning algorithm that can generalize beyond (imperfections in) the observed data. We concentrate on language understanding and reasoning, as one of the extraordinary cognitive abilities of humans as well as a pivotal problem in artificial intelligence. We aim to improve the learning process in principled ways, rather than through ad hoc, domain- or task-specific tricks that merely improve the output. We investigate our ideas on a wide range of sequence modeling and language understanding tasks.