Here is a short abstract of my thesis. A somewhat longer synopsis is also available.

The Minimum Description Length Principle and Reasoning under Uncertainty

Peter Grünwald

ILLC-Dissertation Series nr. DS-1998-03

Most research reported in the thesis concerns the so-called Minimum Description Length (MDL) Principle. The MDL Principle is a general method for inductive inference. The fundamental idea behind the MDL Principle is that any regularity in a given set of data can be used to compress the data, i.e. to describe it using fewer symbols than needed to describe the data literally. The more regularities there are in the data, the more we can compress it. This leads to the view (which is just a version of Occam's famous razor) that the more we can compress a given set of data, the more we can say we have learned about the data.
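The link between regularity and compression can be illustrated with a minimal sketch. It uses the general-purpose compressor zlib as a stand-in for a description method. This is an assumption made purely for illustration, not the coding scheme studied in the thesis, and the helper name compressed_size is hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Return the number of bytes zlib needs to describe `data`."""
    return len(zlib.compress(data, 9))


# Highly regular data: one two-symbol pattern repeated 5000 times.
regular = b"01" * 5000

# Data of the same length with no regularity a compressor can exploit.
# A fixed seed keeps the example reproducible.
rng = random.Random(0)
irregular = bytes(rng.getrandbits(8) for _ in range(len(regular)))

regular_size = compressed_size(regular)
irregular_size = compressed_size(irregular)

print("literal length:       ", len(regular))
print("regular, compressed:  ", regular_size)
print("irregular, compressed:", irregular_size)
```

The regular sequence shrinks to a small fraction of its literal length, while the pseudo-random sequence does not shrink at all. In the terms used above, there is something to learn about the first sequence and nothing to learn about the second.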

The thesis starts with an introduction to MDL intended for a general audience. It continues with some new theoretical research concerning MDL and related statistical methods, such as Bayesian Statistics and the Maximum Entropy Principle. This is followed by a practical part, in which the results of applying MDL to real-world data sets are reported. It ends with a part on non-monotonic reasoning and Artificial Intelligence's frame problem, which is not directly related to MDL.

Question: Can we use simplistic models?

The main question investigated in the thesis is the following:

Under what circumstances is it safe or even advisable to use overly simplistic models for the data at hand?

The result of a statistical analysis of a given set of data is nearly always a model that grossly simplifies the process actually underlying the data. Nevertheless, such overly simple models are often successfully applied in practice. How is this possible? And can we identify situations in which this is possible and situations in which it is not?

These are the main questions we investigate. Although our research is carried out within the framework of the MDL Principle, the results are also relevant to other statistical methods.