Languages of Perception
Mehdi Dastani
Nr: DS-1998-05

Abstract:

In everyday life, we are confronted with visual information provided by the environment. This visual information may originate from scenes of cities or forests, but also from television images, computer interfaces, and many other natural or artificial sources. In general, we have no difficulty recognizing meaningful entities in the visual information we receive and organizing it coherently. For example, when we look at an urban neighborhood that we have never seen before, we perceive individual buildings and separate them from each other even when they directly adjoin one another. Likewise, in a natural environment, we easily perceive individual flowers, plants, or trees and discriminate them from each other even when one is partially hidden by another. Although this ability seems effortless and direct, it is far from trivial to understand, describe, and model.

In order to understand and model the human visual system, one should analyze visual information as it is presented to the human visual sensors (the eyes) and describe how this information can be mapped into meaningful entities for which we have names and which we can place in a conceptual framework. We assume two steps in mapping visual information into meaningful entities. The first step concerns the low-level structuring of visual information. This step provides the constituent structure of visual information, i.e. it determines (A) the constituents of visual information and (B) how they are composed to build up larger wholes. In the second step, the visual constituents resulting from the first step are interpreted in some conceptual framework. The interpretation of visual constituents is based on many factors, such as reasoning and past experience.
It should be noted that these two steps interact with each other: when the structured visual information from the first step cannot be placed coherently into a conceptual framework, the low-level structuring step should provide an alternative constituent structure for the visual input.

In this thesis, we concentrate on the first step, the structuring of visual information, and investigate the principles on the basis of which visual constituents are composed to form larger wholes. However, we do not discuss how primitive visual constituents are determined. In an ultimate theory, these should probably be pixels, line segments, and/or edges between contrasting areas. For the moment, we avoid commitments on this issue by focusing on classes of pictures for which a particular type of higher-level unit may be assumed as primitive.

What we focus on in this thesis is the problem of gestalt perception: assuming a set of primitive elements, we try to account for the phenomenon that pictures built up of these elements are perceived by humans as having a particular hierarchical constituent structure. In the study of gestalt perception, primitive elements are assumed to be composed and structured unconsciously and directly according to some innate principles that are believed to underlie the human visual system. During the last century, several formulations of these proposed innate principles have been put forward. We start with a recent formulation of the innate principles of the human visual system and develop a mathematical model for gestalt perception. We discuss various aspects of gestalt perception and work out an application in which a model of the human visual system is indispensable.