The two essential properties of language are that it refers to things in the world and that its grammatical structure can be characterized independently of meaning or reference (1). The autonomy of grammatical structure has led to a long tradition in psycholinguistics according to which the brain mechanisms responsible for the rapid syntactic structuring of continuous linguistic input are assumed to be "encapsulated" from other cognitive and perceptual systems (2), much as early visual processing is often claimed to be structured by autonomous processing modules (3). This contrasts with a second tradition in which language processing is inextricably tied to reference and relevant behavioral context (4). The primary empirical evidence that syntactic processing is modular is that brief syntactic ambiguities, which arise because language unfolds over time, appear to be initially resolved independently of prior context. Unfortunately, it has been impossible to perform the crucial test to determine whether strongly constraining nonlinguistic information can influence the earliest moments of syntactic processing, because experimental techniques that provide fine-grained temporal information about spoken language comprehension could not be used in natural contexts. However, by recording eye movements (5) as participants followed instructions to move objects (for example, "Put the apple that's on the towel in the box"), we were able to monitor the ongoing comprehension process on a millisecond time scale. This enabled us to observe the rapid mental processes that accompany spoken language comprehension in natural behavioral contexts in which the language had clear real-world referents.
Our initial experiments demonstrated that individuals processed the instructions incrementally, making saccadic eye movements to objects immediately after hearing relevant words in the instruction. Thus the eye movements provided insight into the mental processes that accompany language comprehension. For example, when asked to touch one of four blocks that differed in marking, color, or shape, with instructions such as "Touch the starred yellow square," a person made an eye movement to the target block an average of 250 ms after the end of the word that uniquely specified the target with respect to the visual alternatives (for example, after "starred" if only one of the blocks was starred, and after "square" if there were two starred yellow blocks). With more complex instructions, individuals made informative sequences of eye movements...
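The notion of "the word that uniquely specified the target with respect to the visual alternatives" in the block-touching example can be made concrete with a small sketch. The function name `disambiguation_point`, the representation of each block as a set of attribute words, and the example displays are all illustrative assumptions, not part of the reported experiments; the sketch only shows how the point of disambiguation depends on the display as well as on the instruction.

```python
def disambiguation_point(words, blocks, target):
    """Return the index of the first word after which only `target` remains.

    Hypothetical helper for illustration: `blocks` is a list of sets of
    attribute words, and `target` is the index of the intended block.
    Returns None if the instruction never singles out the target.
    """
    # Words that describe no block in the display (e.g. "touch", "the")
    # carry no referential information and are skipped.
    vocabulary = set().union(*blocks)
    candidates = list(range(len(blocks)))
    for i, word in enumerate(words):
        if word in vocabulary:
            # Incrementally narrow the set of blocks consistent with the input.
            candidates = [b for b in candidates if word in blocks[b]]
        if candidates == [target]:
            return i
    return None


instruction = ["touch", "the", "starred", "yellow", "square"]

# Assumed display 1: only one block is starred.
one_starred = [
    {"starred", "yellow", "square"},
    {"plain", "yellow", "square"},
    {"plain", "red", "square"},
    {"plain", "blue", "circle"},
]

# Assumed display 2: two starred yellow blocks that differ in shape.
two_starred = [
    {"starred", "yellow", "square"},
    {"starred", "yellow", "circle"},
    {"plain", "red", "square"},
    {"plain", "blue", "circle"},
]

early = disambiguation_point(instruction, one_starred, target=0)
late = disambiguation_point(instruction, two_starred, target=0)
print(instruction[early])  # starred
print(instruction[late])   # square
```

Under these assumptions, the same instruction is disambiguated at "starred" in the first display and only at "square" in the second, which mirrors the two cases described in the text. The sketch models only which word identifies the target; it says nothing about the timing of the eye movement that follows.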