Prof. Dr. Ingo Rentschler

Institute of Medical Psychology, University of Munich, Germany

Cognitive psychophysics: A machine vision approach to human image understanding

Psychophysics is concerned with the existence of formal relationships between physical stimulus descriptions and sensation. This concept, advanced by the German physicist G.T. Fechner in 1860, underlies the measurement of thresholds in sensory physiology but has little to say about perception, which implies an understanding of sensation.

The shortcomings of Fechnerian psychophysics can be overcome by noting that categorisation is the main way that organisms make sense of experience. This suggests that image understanding can be studied by drawing on the resources of technical pattern recognition, where input signals are assigned to one of a prescribed number of classes. For complex objects or objects embedded in complex scenes, however, the approach of traditional pattern recognition is not viable. That is, pattern classes cannot be conceived of simply as bounded regions in some vector space. This problem, which is related to Wittgenstein's notion of Familienähnlichkeit (family resemblance), can be overcome by using strategies of recognition-by-parts as developed for machine vision. Here patterns or scenes are decomposed into constituent parts and described in terms of attributes of parts and part relations, which can be linked together into relational structures. In our view, it is the need to construct such structural descriptions that distinguishes image understanding from mere pattern labelling.
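As a minimal sketch of what such a structural description might look like as a data structure, the following Python fragment stores parts with unary attributes and part relations with binary attributes. All names here (Part, StructuralDescription, and the attributes size, elongation, distance, angle) are hypothetical choices made for illustration and are not taken from the work described.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    """One constituent part of a pattern with its unary attributes."""
    name: str
    attributes: dict


@dataclass
class StructuralDescription:
    """Parts plus attributed relations between pairs of parts."""
    parts: dict = field(default_factory=dict)
    # Maps an ordered pair (part_a, part_b) to its binary attributes.
    relations: dict = field(default_factory=dict)

    def add_part(self, name, **attributes):
        self.parts[name] = Part(name, attributes)

    def relate(self, a, b, **attributes):
        if a not in self.parts or b not in self.parts:
            raise KeyError("both parts must be added before relating them")
        self.relations[(a, b)] = attributes


# Hypothetical pattern made of two parts and one relation between them.
pattern = StructuralDescription()
pattern.add_part("p1", size=2.0, elongation=0.3)
pattern.add_part("p2", size=1.0, elongation=0.8)
pattern.relate("p1", "p2", distance=1.5, angle=90.0)
```

In a representation of this kind, a pattern class is characterised by which parts and relations occur together rather than by a single bounded region in a feature space.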

Human image understanding can therefore be analysed as follows: The patterns of a learning set are each to be assigned to one of a prescribed number of classes, and categorisation performance is induced by way of supervised learning. The learning process is thus characterised as a time series of classification matrices. In the spirit of adaptive filter theory, mental representations and decision making for categorisation can be recovered formally from such (observed) data structures by fitting them with the (predicted) performance of a machine vision system.
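The idea of comparing observed and predicted classification matrices over the course of learning can be sketched as follows. This is an illustration under stated assumptions, not the fitting procedure used in the work described: the mean squared deviation is a stand-in for whatever fit criterion was actually applied, and the function names, the toy responses, and the two candidate models ("chance", "learner") are hypothetical.

```python
import numpy as np


def classification_matrix(true_labels, responses, n_classes):
    """Row-normalised matrix: entry (i, j) is the relative frequency
    with which patterns of class i were assigned to class j."""
    counts = np.zeros((n_classes, n_classes))
    for t, r in zip(true_labels, responses):
        counts[t, r] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)


def fit_error(observed, predicted):
    """Mean squared deviation between two time series of matrices
    (assumed criterion, chosen only for illustration)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))


def best_model(observed, candidates):
    """Name of the candidate whose predicted series fits best."""
    return min(candidates, key=lambda name: fit_error(observed, candidates[name]))


# Toy data: two learning sessions, two classes, four patterns each.
truth = [0, 0, 1, 1]
session_1 = classification_matrix(truth, [0, 1, 1, 0], 2)  # chance level
session_2 = classification_matrix(truth, [0, 0, 1, 1], 2)  # perfect
observed = np.stack([session_1, session_2])

# Hypothetical model predictions, one matrix per session.
candidates = {
    "chance": np.full((2, 2, 2), 0.5),
    "learner": np.stack([np.full((2, 2), 0.5), np.eye(2)]),
}
selected = best_model(observed, candidates)
```

The model whose predicted series lies closest to the observed one is taken as the best available account of the representations and decision rules underlying the observer's categorisation behaviour.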

This new approach is illustrated by two examples concerning the roles of context and haptic prior knowledge in image understanding. As far as context is concerned, we show that the generalisation of visual recognition to the inversion of contrast polarity is determined by the degree to which contrast-dependent image attributes are used for learning mental object representations. As to the influence of touch on vision, we find strong effects of haptic prior knowledge on category learning for 2-D views of unfamiliar 3-D objects. The data analysis revealed that these effects reflect a marked increase in the attribute resolution and relational depth of visual object representations.

Taken together, these preliminary results suggest that the relationships between human image understanding and language are closer than previously thought.

I. Rentschler, T. Caelli, W. Bischof & M. Jüttner (eds.), Special Issue on: Object recognition and image understanding by brain and machines. Spatial Vision 13, No. 2–3, 2000, VSP Publishers, Utrecht, The Netherlands (ISSN 0169-1015)