Understanding the brain computations leading to object recognition requires quantitative characterization of the information represented in inferior temporal (IT) cortex. We used a biologically plausible, classifier-based readout technique to investigate the neural coding of selectivity and invariance at the IT population level. The activity of small neuronal populations (~100 randomly selected cells) over very short time intervals (as small as 12.5 milliseconds) contained unexpectedly accurate and robust information about both object "identity" and "category." This information generalized over a range of object positions and scales, even for novel objects. Coarse information about position and scale could also be read out from the same population.
Primates can recognize and categorize objects as quickly as 200 ms after stimulus onset (1). This remarkable ability underscores the high speed and efficiency of the object recognition computations performed by the ventral visual pathway (2-5). Because the feed-forward part of this circuitry requires at least eight synapses from the retina to anterior IT cortex, it has been proposed that the computations at each stage are based on just one or very few spikes per neuron (6, 7). At the end of the ventral stream, single cells in IT cortex show selectivity for complex objects with some tolerance to changes in object scale and position (2-4, 6, 8-16). Small groups of neurons in IT cortex tuned to different objects and object parts might thus provide sufficient information for several visual recognition tasks, including identification and categorization. This information could then be "read out" by circuits receiving input from IT neurons (17-19).
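To make the notion of reading out object information from a small population concrete, the following is a minimal sketch on synthetic data, not the analysis used in this study. It assumes Poisson spike counts from 100 hypothetical sites with randomly drawn tuning to 8 objects, and it uses a one-vs-all ridge (regularized least-squares) linear classifier as a stand-in for a regularized readout. All function names, rates, and parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_population(n_sites=100, n_objects=8, n_trials=20, seed=0):
    """Synthetic spike counts (assumption: independent Poisson sites, random tuning)."""
    rng = np.random.default_rng(seed)
    # Hypothetical mean spike count of each site for each object in one time bin.
    tuning = rng.uniform(0.5, 5.0, size=(n_objects, n_sites))
    labels = np.repeat(np.arange(n_objects), n_trials)
    counts = rng.poisson(tuning[labels]).astype(float)
    return counts, labels


def _design(X, mu, sd):
    """Z-score each site and append a constant column for the bias term."""
    return np.hstack([(X - mu) / sd, np.ones((len(X), 1))])


def fit_ridge_readout(X, y, n_classes, lam=1.0):
    """One-vs-all ridge classifier: one linear readout unit per object."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-9
    Z = _design(X, mu, sd)
    # Targets are +1 for the presented object and -1 for all others.
    T = -np.ones((len(y), n_classes))
    T[np.arange(len(y)), y] = 1.0
    # Closed-form regularized least squares (the bias is penalized too, for simplicity).
    W = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ T)
    return mu, sd, W


def predict(model, X):
    """Assign each trial to the object whose readout unit responds most strongly."""
    mu, sd, W = model
    return np.argmax(_design(X, mu, sd) @ W, axis=1)


def readout_accuracy(n_sites=100, n_objects=8, n_trials=20, lam=1.0, seed=0):
    """Train on half of the trials of each object and test on the held-out half."""
    X, y = simulate_population(n_sites, n_objects, n_trials, seed)
    trial_index = np.arange(len(y)) % n_trials
    train = trial_index < n_trials // 2
    model = fit_ridge_readout(X[train], y[train], n_objects, lam)
    return float(np.mean(predict(model, X[~train]) == y[~train]))


if __name__ == "__main__":
    print(f"held-out accuracy: {readout_accuracy():.2f} (chance = {1 / 8:.3f})")
```

Varying `n_sites` in `readout_accuracy` gives a rough feel for how held-out performance depends on population size in this toy setting. The numbers it produces reflect the assumed tuning and noise model only and say nothing about real IT data.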
Although physiological and functional imaging data suggest that visual object identity and category are coded in the activity of IT neurons (2-6, 8-16, 20), fundamental aspects of this code remain under debate, including the discriminative power in relation to population size, temporal resolution, and time course. These questions must be understood at the population level to provide quantitative constraints for models of visual object recognition. We examined these issues by obtaining independent recordings from a large unbiased sample of IT neuronal sites and using a population readout technique based on classifiers. The readout approach consists of training a regularization classifier (21) to learn the map from neuronal responses to each object label (Supporting Online Material), as in recent...