The Minimum Description Length Principle and Reasoning under Uncertainty
Peter Grunwald
Abstract:
%Nr: DS-1998-03
%Author: Peter Grunwald
%Title: The Minimum Description Length Principle and Reasoning under Uncertainty
Most research reported in the thesis concerns the so-called Minimum
Description Length (MDL) Principle. The MDL Principle is a general
method for inductive inference. The fundamental idea behind the MDL
Principle is that any regularity in a given set of data can be used to
compress the data, i.e. to describe it using fewer symbols than needed
to describe the data literally. The more regularities there are in the
data, the more we can compress it. This leads to the view (which is
just a version of Occam's famous razor) that the more we can compress
a given set of data, the more we can say we have learned about the
data.
The thesis starts with an introduction to MDL intended for a general
audience. It continues with some new theoretical research concerning
MDL and related statistical methods, like Bayesian Statistics and the
Maximum Entropy Principle. This is followed by a practical part, in
which results of applying MDL on real-world data sets are reported. It
ends with a part on non-monotonic reasoning and Artificial
Intelligence's frame problem, not directly related to MDL.
Question: Can we use simplistic models?
The main question investigated in the thesis is the following:
Under what circumstances is it safe or even advisable to use overly
simplistic models for the data at hand?
The result of statistical analysis of a given set of data is nearly
always a model for this data that is really a gross simplification of
the process that actually underlies these data. Nevertheless, such
overly simple models are often succesfully applied in practice. How is
this possible? And can we identify situations in which this is
possible and situations in which it is not?
These are the main questions we investigate. Though our research is in
the framework of the MDL Principle, the results are relevant for other
statistical methods also.
Answer: 'Safe' and 'Risky' Statistics
Briefly, we reach the following conclusion: overly simple models can
be applied to make predictions and decisions in two different ways: a
`safe' one and a `risky' one. If a model is used in the `safe' way,
then it will be `reliable' in the following sense: the model gives a
correct impression of the prediction error that will be made if the
model is used to predict future data, even in the case where the model
is a gross simplification of the process that truely underlies the
given data. If the model is used in the `risky' way, there is no such
guarantee (nevertheless, such usage of a model often makes sense). We
state and prove several theorems which show that incorrect models can
be `reliable' in the sense indicated above under many circumstances.
The concept of 'reliability' has several applications. We mention
three:
* It can be used to extend the scope of the MDL Principle to
arbitrary model classes and error measures.
* It allows to distinguish between 'safe' and 'risky' versions of
the method of Maximum Entropy (which is closely related to MDL),
thereby explaining both successes and failures of this
controversial principle.
* It provides new insights in the ongoing debate between
'frequentist' and 'subjectivist' (Bayesian) statisticians.
References
1. http://robotics.stanford.edu/~grunwald/thesis/longabstract.html
2. http://robotics.stanford.edu/~grunwald/thesispage.html
3. http://robotics.stanford.edu/~grunwald/index.html