Eye movements are a measurable behavior, and their measurement provides a sensitive means of learning about cognitive and visual processing. Although eye movements have been examined for some time, only in the last few decades has their measurement led to important discoveries about the psychological processes that occur during tasks such as reading, visual search, and scene perception.
Although we have the impression that we can process the entire visual field in a single fixation, in reality we would be unable to fully process the information outside of foveal vision if we were unable to move our eyes (Rayner, 1978, 1998).
Because of acuity limitations in the retina, eye movements are necessary for processing the details of the array. Our ability to discriminate fine detail drops off markedly outside of the fovea in the parafovea (extending out to about 5 degrees on either side of fixation) and in the periphery (everything beyond the parafovea). (See Figure 1).
While we are reading, searching a visual array for a target, or simply looking at a new scene, our eyes move every 200-350 ms. These eye movements serve to move the fovea (the high-resolution part of the retina, encompassing the central 2 degrees of the visual field) to an area of interest so that it can be processed in greater detail.
During the actual eye movement (or saccade), vision is suppressed and new information is acquired only during the fixation (the period of time when the eyes remain relatively still).
Although we can move our attention independently of where the eyes are fixated, we rarely do so in everyday viewing. The separation between attention and fixation is typically achieved only in very simple tasks (Posner, 1980); in tasks like reading, visual search, and scene perception, covert attention and overt attention (the exact eye location) are tightly linked.
Because saccades are motor movements, it takes time to plan and execute them; in addition, the end-point of a saccade is selected before the movement begins.
While it has generally been assumed that the two eyes move in synchrony and that they fixate the same point in space, recent research clearly demonstrates that this is not the case and the two eyes are frequently deviated from each other (Liversedge, Rayner, White, Findlay, & McSorley, 2006; Liversedge, White, Findlay, & Rayner, 2006).
There is considerable evidence that the nature of the task influences eye movements. A summary of the average amount of time spent on each fixation and the average distance the eyes move in reading, visual search, and scene perception are shown in Table 1.
| Task | Typical mean fixation duration (ms) | Mean saccade size (degrees) |
|---|---|---|
| Silent reading | 225-250 | 2 (8-9 letter spaces) |
| Oral reading | 275-325 | 1.5 (6-7 letter spaces) |
| Scene perception | 260-330 | 4 |
| Visual search | 180-275 | 3 |
Although the values in Table 1 are quite representative of each task, they also make clear that fixation durations and saccade lengths vary considerably, both across tasks and within each task.
At one time, researchers believed that the eyes and the mind were not tightly linked during information processing tasks like reading, visual search, and scene perception. This conclusion was based on the relatively long latencies of eye movements (or reaction time of the eyes) and the large variability in the fixation time measures.
They questioned the influence of cognitive factors on fixations given that eye movement latency was so long and the fixation times were so variable. It seemed unlikely that cognitive factors could influence fixation times from fixation to fixation.
An underlying assumption was that everything proceeded in a serial fashion and that cognitive processes could not influence anything except very late in a fixation, if at all. However, a great deal of research using more accurate eye trackers has since established a tight link between the eye and the mind: it is now clear that saccades can be programmed in parallel (Becker & Jürgens, 1979) and, furthermore, that information processing continues in parallel with saccade programming.
During reading, the average fixation duration is about 225-250 ms and the average saccade size is 8-9 character spaces. Figure 2 shows an example of eye movements during reading.
In reading, unlike other tasks, saccade size is measured in character spaces rather than in degrees of visual angle, because character spaces have been shown to be the more appropriate unit: if the size of the print is held constant and the viewing distance is varied (so that there are either more or fewer characters per degree of visual angle), how far the eyes travel is determined by character spaces, not visual angle (Morrison & Rayner, 1981).
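The character-space finding can be made concrete with a little geometry (a minimal sketch; the character width and viewing distances below are illustrative values, not figures from the study):

```python
import math

def saccade_angle_deg(n_chars, char_width_cm, distance_cm):
    """Visual angle (in degrees) subtended by a saccade spanning n_chars characters."""
    width_cm = n_chars * char_width_cm
    return math.degrees(2 * math.atan(width_cm / (2 * distance_cm)))

# The same 8-character saccade subtends very different visual angles
# at different viewing distances (assuming 0.25 cm wide characters):
near = saccade_angle_deg(8, 0.25, 40)   # viewed from 40 cm, ~2.9 degrees
far = saccade_angle_deg(8, 0.25, 80)    # viewed from 80 cm, ~1.4 degrees
```

If saccades were programmed in visual angle, doubling the viewing distance should roughly double the number of characters traversed; instead, readers keep moving about 8-9 characters, which is why character spaces are the preferred unit.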
Another important characteristic of eye movements during reading is that about 10-15% of the time readers move their eyes back to previously read material in the text. The frequency of these regressions, as they are called, depends on the difficulty of the text.
As would be expected, saccade size and fixation duration are also both modulated by text difficulty: as the text becomes more difficult, saccade size decreases, fixation durations increase, and regressions increase.
From these measures alone, it is clear that global properties of the text strongly influence eye movements. In addition, these three main global measures (saccade size, fixation duration, and number of regressions) are also influenced by the type of material being read and the reader's goals (Rayner & Pollatsek, 1989). For instance, reading a text for understanding produces a very different pattern of eye movements than skimming the same text or proofreading it (Figure 3).
In addition to global effects, studies have shown clear local effects on words. Measures in these studies focus on the processing of a target word, rather than on an average pooled over all words in a sentence (such as the average fixation duration). Local measures include: first fixation duration (the duration of the first fixation on a word), single fixation duration (the fixation duration in those cases where only a single fixation is made on a word), and gaze duration (the sum of all fixations on a word prior to moving to another word).
A very important issue in reading is how much information the reader is able to process and use during a single fixation, which, as noted above, typically lasts 200-250 ms. The region from which this information is obtained is referred to as the perceptual span (also called the functional field of view or, less commonly, the region of effective vision). Although we have the impression that we can see an entire line of text or even an entire page of text, this is an illusion. This has been clearly demonstrated in a number of studies over the years using the “gaze-contingent moving window paradigm” introduced by McConkie and Rayner (1975; see also Rayner & Bertera, 1979). For more information on this and other gaze-contingent paradigms, see Eye-Contingent Experimental Paradigms.
Studies have demonstrated that English readers acquire useful information from an asymmetrical region around the fixation point (extending 3-4 character spaces to the left of fixation and about 14-15 character spaces to the right). Research has also found that readers do not utilize information from the words on the line below the currently fixated line (Pollatsek, Raney, LaGasse, & Rayner, 1993).
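The logic of the moving-window paradigm can be sketched in a few lines of code (a simplified illustration, not experiment software: real studies update the display in real time from an eye tracker, and the window sizes and mask character below are merely typical choices):

```python
def moving_window(text, fixation, left=3, right=14, mask="x"):
    """Mask every character outside the window around the fixated
    character index; spaces are preserved so word boundaries remain."""
    out = []
    for i, ch in enumerate(text):
        inside = fixation - left <= i <= fixation + right
        out.append(ch if inside or ch == " " else mask)
    return "".join(out)

line = "the quick brown fox jumps over the lazy dog"
print(moving_window(line, fixation=10))  # fixating the 'b' of 'brown'
```

If reading proceeds normally under a given window, the masked text must fall outside the perceptual span; shrinking the window until reading is disrupted estimates the span's extent.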
As briefly mentioned above, the difficulty of the text being read has an impact on eye movement patterns (fixation duration, saccade length, and frequency of regressions to previously read text). Over the past few years, it has become very clear that how long the eyes remain in place is influenced by a host of linguistic factors, including properties of the fixated word such as its frequency and its predictability in context.
Recently, a number of sophisticated computational models (see Reichle, Pollatsek, Fisher, & Rayner, 1998; Reichle, Rayner, & Pollatsek, 2003; Engbert, Nuthmann, Richter, & Kliegl, 2005) have been presented which do a good job of accounting for the characteristics of eye movements during reading.
Fixation durations in search tend to be highly variable. Some studies report average fixation times as short as 180 ms while others report averages on the order of 275 ms. This is undoubtedly due to the fact that the difficulty level of a search (i.e., how dense or cluttered the array is) and the exact nature of the search task will strongly influence how long viewers pause on each item (see Figure 4 and Figure 5).
Typically, saccade size is a bit larger than in reading (though saccades can be quite short with very dense arrays).
When an array is very cluttered (with many objects and distractors), the search becomes more demanding than when the array is simple. The eye movements on each of these types of arrays typically reflect this property (Bertera & Rayner, 2000; Greene & Rayner, 2001a, 2001b). As the array becomes more complicated, fixation durations and the number of fixations increase, while the average saccade length decreases (Vlaskamp & Hooge, 2006).
Research has shown that in visual search the perceptual span varies as a function of how difficult the distractors are to reject: the span is smaller when distractor items are visually similar to the target than when they are distinctly different.
This suggests that there are two qualitatively different regions within the span: a decision region (within which information about the presence or absence of a target is available) and a preview region (within which some item information is available, but the absence of a target cannot yet be determined).
Where a person will most likely fixate while performing a visual task is often described using a saliency map (e.g., the Guided Search model; Cave & Wolfe, 1990; Wolfe, 1994, 2001). The Guided Search model is perhaps the best known and has sparked a great deal of interest in the guiding mechanisms of visual search. According to this model, guidance within a search array arises from two sources: bottom-up activation (which prioritizes items according to how much they differ from their neighbors) and top-down activation (which prioritizes items according to how many features they share with the target).
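The two sources of guidance can be sketched as a simple priority computation (a toy illustration with made-up one-dimensional "features" and equal weights, not the actual Guided Search implementation):

```python
def guided_search_priority(features, target, w_bottom=1.0, w_top=1.0):
    """Combine bottom-up (difference from neighbors) and top-down
    (similarity to the target feature) activation for each item."""
    n = len(features)
    priorities = []
    for i, f in enumerate(features):
        neighbors = [features[j] for j in range(n) if j != i]
        bottom_up = sum(abs(f - g) for g in neighbors) / len(neighbors)
        top_down = 1.0 / (1.0 + abs(f - target))  # closer to target -> higher
        priorities.append(w_bottom * bottom_up + w_top * top_down)
    return priorities

# A display of similar distractors plus one odd item that matches the target:
features = [0.2, 0.2, 0.2, 0.9, 0.2]
p = guided_search_priority(features, target=0.9)
best = p.index(max(p))  # index of the item predicted to be inspected first
```

The odd item that also matches the target receives both bottom-up and top-down activation, so the model predicts it will be inspected first.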
When people look at scenes, not every part of the scene is fixated. This is largely because in scene perception information can be obtained over a wider region than in reading and, possibly, visual search (Figure 6).
However, it is also clear that the important aspects of the scene are typically fixated (and generally looked at for longer periods than less important parts of the scene).
The average fixation duration in scene perception tends to be longer than that in reading, and likewise the average saccade size tends to be larger (Figure 6).
The gist of a scene has been defined as the general scene concept (Potter, 1999), and in the literature it most often refers to a scene's basic-level category (Oliva, 2005). Researchers have also found that, in addition to a scene's concept, the general scene layout is quickly extracted (Sanocki & Epstein, 1997; Sanocki, 2003).
One very important general finding is that viewers are able to acquire scene gist in a single glance. That is, the gist of the scene is understood so quickly that it is processed even before the eyes begin to move (De Graef, 2005). Indeed, recent research has shown that with only 40 ms of exposure, the visual system can extract enough information to process the scene's gist (Castelhano & Henderson, 2007b).
It has become clear that the eyes quickly go to parts of a scene that are relevant and important. The pioneering work of Buswell (1938) and Yarbus (1967) first documented how a viewer's gaze is drawn to important aspects of a visual scene and how strongly the task goal influences eye movements. Much of the research that followed illustrated that the eyes are drawn quickly to informative areas in a scene (Antes, 1974; Mackworth & Morandi, 1967).
Other studies have also made it clear that saliency of different parts of the scene greatly influences where viewers tend to fixate (Parkhurst & Niebur, 2003; Mannan, Ruddock, & Wooding, 1995, 1996). Saliency is typically defined in terms of low-level components of the scene (such as contrast, color, intensity, brightness, spatial frequency, etc.).
There are a number of computational models (Baddeley & Tatler, 2006; Findlay & Walker, 1999; Itti & Koch, 2000, 2001; Parkhurst, Law, & Niebur, 2002) that use the concept of a saliency map to model eye fixation locations in scenes. In this approach, the bottom-up properties of the scene (i.e., the saliency map) identify its most visually salient regions, and the models derive predictions about the distribution of fixations on a given scene from these prominent regions.
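The core idea of a saliency map can be sketched as a local center-surround computation (a deliberately minimal, single-channel version; real models such as Itti and Koch's operate over multiple spatial scales and feature channels such as color, intensity, and orientation):

```python
def saliency_map(intensity):
    """Saliency of each cell = how much its intensity differs from the
    mean of its immediate neighbors (a crude center-surround operator)."""
    rows, cols = len(intensity), len(intensity[0])
    sal = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neigh = [intensity[rr][cc]
                     for rr in range(max(0, r - 1), min(rows, r + 2))
                     for cc in range(max(0, c - 1), min(cols, c + 2))
                     if (rr, cc) != (r, c)]
            surround = sum(neigh) / len(neigh)
            sal[r][c] = abs(intensity[r][c] - surround)
    return sal

# A uniform scene with one bright patch: the model predicts fixations there.
scene = [[0.1] * 5 for _ in range(5)]
scene[2][3] = 1.0
sal = saliency_map(scene)
peak = max((sal[r][c], r, c) for r in range(5) for c in range(5))[1:]
```

Regions that differ most from their surroundings receive the highest saliency values and are predicted to attract the earliest fixations.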
Research has shown that higher-level factors also have a strong influence on where viewers direct their gaze in a scene (Castelhano & Henderson, 2007a; Henderson & Castelhano, 2005; Henderson & Ferreira, 2004). For instance, see Figure 7.
Recently, Torralba, Oliva, Castelhano, and Henderson (2006) presented a computational model that incorporates top-down, context-based guidance alongside bottom-up saliency.
How much information is extracted from a single fixation on a scene? As noted at the beginning of this section, it is known that the extent of the visual field used to extract useful information is much larger in scene viewing than in reading. In an early study, Nelson and Loftus (1980) examined object recognition as a function of the closest fixation to that object. Objects located within about 2.6 degrees of fixation were generally recognized. The results also suggested that information acquired from the region 2-3 degrees around fixation is qualitatively different from information acquired from regions further away (see Henderson & Hollingworth, 1999; Henderson, William, Castelhano & Falk, 2003).
The question of how large the perceptual span is during scene viewing hasn’t been answered as conclusively as it has in reading or visual search. It seems that objects can be located up to 4 degrees from the point of fixation and tagged for a saccade target, but it is not clear what the perceptual span is for other types of information (see Henderson & Ferreira, 2004, for a review). And yet, it does appear that viewers typically gain useful information from a fairly wide region of the scene and that it probably varies as a function of the scene and the task of the viewer.
Interestingly, viewers are rather insensitive to large changes in a scene (McConkie, 1991; Grimes & McConkie, 1995; Grimes, 1996). This phenomenon is referred to as Change Blindness. For instance, see Figure 8.
Research has found that the placement of fixations plays a significant role in change blindness. Hollingworth and Henderson (2002) found that, once fixation location was taken into account, people detected changes much more reliably when both the pre-change and post-change regions of the scene had been fixated. This study highlights the importance of encoding and retrieving specific details of the scene in order to detect changes.
When do viewers move their eyes when looking at scenes? Past studies have shown that attention precedes an eye movement to a new location within a scene (Henderson, 1992; van Diepen & D’Ydewalle, 2003). So, it would follow that the eyes will move once the visual information at the center of vision has been processed and a new fixation location has been selected and programmed (for review, see Henderson, 2007).
Research also suggests that information at the fovea is extracted very rapidly, and that attention is directed to the periphery almost immediately afterwards (within 70-120 ms) to choose a viable saccade target. The general timing of the switch between central and peripheral information processing is currently being investigated; however, the inherent variability across scenes makes it difficult to pin down as specific a time frame as in reading.
Although there are obviously many differences between these tasks, there are some general principles that are likely to hold across them (see also Rayner, 1995, 1998).