Visual search is the common task of looking for something in a cluttered visual environment. The item that the observer is searching for is termed the target, while non-target items are termed distractors.
Many visual scenes contain more information than we can fully process all at once (Tsotsos, 1990). Accordingly, mechanisms like those subserving object recognition process only a restricted part of the visual scene at any one time. Visual attention controls the selection of that subset, which may be an array of locations but is more likely an object or a small group of objects (Goldsmith, 1998). Most visual searches consist of a series of attentional deployments, ending either when the target is found or when the search is abandoned (see section 6.3).
Overt search refers to a series of eye movements around the scene made to bring difficult-to-resolve items onto the fovea. If the relevant items in the visual scene are large enough to be identified without fixation, search can be successfully performed while the eyes are focused on a single point. Attentional shifts made during a single fixation are termed covert, because they are inferred rather than directly observed. Under laboratory conditions, many search tasks can be performed entirely with covert attention. Under real world conditions, a new point of fixation is selected 3 or 4 times per second. Overt movements of the eye and covert deployments of attention are closely related (Kowler, Anderson, Dosher, & Blaser, 1995). However, with stimuli that do not require direct foveation, 4-8 objects can be searched during each fixation. This means that either such objects are processed in parallel, or we can make several covert attentional shifts per fixation.
How well can we perform a specific search task? In a standard laboratory search task, observers are asked to search for a target in an image on a computer monitor. Such an artificial scene might subtend a region of the visual field measuring 20 degrees of visual angle (dva) by 20 dva. Observers are asked to perform several hundred trials of the search task. The number of items in the scene (set size), and thus the number of distractors, is varied from trial to trial. Typically, the target is present on half of the trials and absent on the others. The time to make a response (reaction time, or RT) is measured, as well as the accuracy of the answer. RT increases in a roughly linear manner with set size. The slope of the RT x set size function is a standard measure of search efficiency, since its reciprocal gives an estimate of search throughput in items per unit time. Theoretical assumptions are needed in order to translate from slope to an actual estimate of the number of items that have been attended and processed. Without committing to a specific theoretical stance, we can say that searches with slopes near zero are efficient. For stimuli that are large enough not to require eye movements, an inefficient search is one with target-present slopes in the 20-40 msec/item range. Target-absent trials tend to have slopes that are a bit more than twice as steep as the target-present slopes (Wolfe, 1998) (see section 6.1 below). Much steeper slopes can be obtained if each item requires fixation prior to identification, or if each item is intrinsically difficult to classify as target or distractor.
A linear function, fitted to the RT x set size data, will have an intercept as well as a slope. That intercept will be several hundred msec, even for simple searches. It reflects the components of the task that do not involve sequential deployments of attention. These components include visual processing prior to attentional selection, as well as decision and motor components coming after the search, per se, has been completed.
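To make the slope and intercept measures concrete, here is a minimal sketch (in Python) that fits a line to RT x set size data; the set sizes and mean RTs are invented for illustration rather than taken from any experiment.

```python
import numpy as np

# Hypothetical mean correct RTs (msec) on target-present trials; the numbers
# are invented for illustration only.
set_sizes = np.array([4, 8, 12, 16])
mean_rts = np.array([620, 740, 855, 975])

# Fit RT = slope * set_size + intercept.
slope, intercept = np.polyfit(set_sizes, mean_rts, 1)

print(f"slope     = {slope:.1f} msec/item")  # ~30 msec/item: an "inefficient" search
print(f"intercept = {intercept:.0f} msec")   # several hundred msec of non-search time
```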
This discussion of RTs assumes a situation in which errors are relatively rare, as is the case for simple searches where stimuli remain visible until observers respond. In other experimental regimes, more information is obtained from the error rates than from the RTs (Palmer et al., 2000). For example, if stimuli are briefly presented, it is error rate that will increase with set size, rather than RT (e.g., Dukewich & Klein, 2005). In general, speed and accuracy will trade off, and this must be taken into account in interpreting search results. Sometimes this tradeoff is exploited as a method in its own right (e.g. Carrasco, Giordano, & McElree, 2006).
You can search for anything. However, some searches will be more efficient than others. In this next large section, we will describe a number of the factors that determine search efficiency in laboratory experiments. In section 9, we will consider if these apply to real world search tasks.
For a search to be possible at all, the target must be different from the distractors in some detectable fashion. Finding a needle in a haystack will be a laborious search, but it will be possible. Finding the one, specific needle in a needle stack will not be possible. Stimuli can differ from each other in a host of ways, but only a limited set of attributes will allow a target to be found efficiently among distractors that differ from it in that attribute. We call these guiding attributes because they can be used to guide attention.
Earlier work would refer to these as preattentive features (Treisman & Gelade, 1980). The term preattentive is used in several ways, some of them problematic. To say that an attribute like color is preattentive seems to imply that all processing of color is done before or without attention. That is unlikely. The original use of preattentive had a spatial/neural aspect to it, implying that some brain loci were preattentive. More modern understandings suggest that an area like primary visual cortex might initially process a visual stimulus without showing an influence of attention. However, activity in the very same piece of cortex might be subsequently modulated by attention in a reentrant manner (Di Lollo, Enns, & Rensink, 2000; Lamme & Roelfsema, 2000; Saalmann, Pigarev, & Vidyasagar, 2007). The most helpful use of the term preattentive is a temporal usage. Prior to deployment of attention to an object, any visual processing of that object is, by definition, preattentive. In any case, we will use the more neutral term guiding attribute to refer to visual properties that can be used to direct deployment of attention. In this jargon, a feature (like red) is a specific instance of an attribute (like color).
Below, we reproduce a list of attributes modified from Wolfe and Horowitz (2004). These are grouped by the likelihood that they will support efficient search. Where references are not listed, they can be found in the original article. A reasonable estimate would be that there are between ten and twenty-five basic attributes that guide the deployment of attention.
Table 1. Attributes that may guide the deployment of attention (modified from Wolfe & Horowitz, 2004).

| Attribute | Description | Examples |
|---|---|---|
| Undoubted attributes | Undoubted, meaning that there are a large number of studies with converging methods. | |
| Probable attributes | Less confidence due to limited data, dissenting opinions, or the possibility of alternative explanations. | |
| Possible attributes | Still less confidence. | |
| Doubtful cases | Unconvincing, but still possible. | |
| Second-order attributes | Visual properties that seem to support efficient search by creating other attributes. For example, orientation is an uncontroversial feature, and orientation in the third dimension (slant) also appears to support efficient search. Many depth cues will produce a target of one orientation and distractors of another in the inferred third dimension. We could declare all of these to be basic attention-guiding attributes, but it may be better to consider them properties that are analyzed in early vision without the need for attention, yet unable, in isolation, to guide attention. In some cases (e.g. lighting direction), this has been shown experimentally (Ostrovsky, Cavanagh, & Sinha, 2004). | |
| Probable non-attributes | Suggested guiding features where the balance of evidence argues against inclusion on the list. | |
When free-viewing a scene, some items or locations will tend to attract attention because of visual salience. Used in this sense, salience is a bottom-up, stimulus-driven phenomenon. An item that differs dramatically from its neighbors in one or more of the attributes in Table 1 will tend to be salient. Bottom-up salience can be modified by top-down goals of the searcher. Thus, a search for a green vertical item will cause attention to be guided to all green and all vertical items (Figure 1). Moreover, observers report that this top-down command renders green and, perhaps, vertical items more perceptually salient (Blaser, Sperling, & Lu, 1999). Similar effects can be seen at the single-cell level (e.g. Bichot & Schall, 1999).
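As a toy illustration of how guidance might work, the sketch below assigns each item a priority that mixes bottom-up feature contrast with top-down similarity to the target template, then orders items by priority. The items, features, and weights are arbitrary illustrations and do not correspond to any published model.

```python
import numpy as np

# Each item is described by two categorical features; values are illustrative.
items = [
    {"color": "green", "orientation": "vertical"},    # the designated target
    {"color": "green", "orientation": "horizontal"},
    {"color": "red",   "orientation": "vertical"},
    {"color": "red",   "orientation": "horizontal"},
    {"color": "red",   "orientation": "horizontal"},
]
target = {"color": "green", "orientation": "vertical"}

def bottom_up(item, others):
    # Bottom-up salience: how much this item differs from the other items,
    # averaged over features (a crude stand-in for local feature contrast).
    return float(np.mean([np.mean([item[f] != o[f] for o in others]) for f in item]))

def top_down(item, target):
    # Top-down guidance: proportion of features shared with the target template.
    return float(np.mean([item[f] == target[f] for f in item]))

w_bu, w_td = 0.3, 0.7   # arbitrary weighting of the two signals
priority = [w_bu * bottom_up(it, items[:i] + items[i + 1:]) + w_td * top_down(it, target)
            for i, it in enumerate(items)]

# Items would be visited in order of decreasing priority; the green vertical
# target (index 0) comes first.
print(sorted(range(len(items)), key=lambda i: -priority[i]))
```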
What happens if an item of considerable bottom-up salience is not the desired target of the current visual search? If bottom-up and top-down factors are in conflict, will top-down goals prevent deployment of attention to an otherwise salient item or will that item capture attention? The voluminous literature on this topic shows that under strong top-down control, quite salient stimuli seem to be ignored (e.g. Williams, 1985) but that some stimuli, notably onsets and/or new objects, are very hard to ignore (e.g. Remington, Johnston, & Yantis, 1992; Theeuwes, 1994).
In general, efficiency of search decreases as the similarity between target and distractors increases, and increases as the similarity among distractors increases (Duncan & Humphreys, 1989). The most efficient searches are searches for a distinctive target amongst homogeneous distractors.
Guiding features are subject to rules that govern their ability to guide. These differ from the rules that govern perception of these features (Wolfe & Horowitz, 2004). For instance, effective guidance requires differences between target and distractors that are much greater than one just noticeable difference (Nagy & Sanchez, 1990). For some attributes, perhaps for most, similarity is defined in categorical terms. That is, a target that is categorically different from distractors will be easier to find than one which is equally distant in feature space but within the same category (Daoutis, Pilling, & Davies, 2006; Wolfe, Friedman-Hill, Stewart, & O'Connell, 1992).
The efficiency of a search will be influenced by the distribution of items across the visual field. We can identify two countervailing tendencies. As density increases, many searches, particularly simple feature searches, tend to get easier (Nothdurft, 2000; Sagi, 1990). Having a target item close to a distractor item makes it easier to notice that they are different. On the other hand, when items are close to each other, they crowd each other, making it hard to identify individual items. For instance, a letter that is just big enough to read at, say, 5 deg eccentricity, may be impossible to read if flanked by other letters (He, Cavanagh, & Intriligator, 1996; Levi, Klein, & Aitsebaomo, 1985). If crowding makes it harder to identify items, it will slow search for those items (Vlaskamp & Hooge, 2006). The specific densities producing crowding or the advantages of proximity will vary with the specific nature of the search stimuli.
Target eccentricity (distance from the fixation point) will also modulate search performance. All else being equal, targets, even large, easily identified targets, will be found more slowly as their eccentricity increases (Carrasco, Evert, Chang, & Katz, 1995).
As noted earlier, eye movements occur at rates far slower than the rates estimated for even inefficient covert search. As a result, search will become markedly less efficient once items become small enough to require foveation before they can be recognized. If the search display is large enough, it will be necessary to move the eyes or the head, with similar effects on search efficiency.
Your ability to find a target in the current search is affected by what you have been searching for previously. In general, you are faster searching for a given target if you found that same target on a recent trial (Hillstrom, 2000). This memory for the target identity seems to go back about seven trials (Maljkovic & Nakayama, 1994).
This priming effect might occur by facilitating guidance (e.g. If I found a red vertical target, I can more effectively guide attention toward subsequent red and vertical items.) (Kristjansson, 2006). Alternatively, it might be due to memory or response facilitation (Huang, Holcombe, & Pashler, 2004). These need not be mutually exclusive effects.
The layout of previous displays can also modulate RT. In a sequence of otherwise random displays, repeated association of one spatial configuration of search items with one target location will speed search. This is known as contextual cueing (Chun & Jiang, 1998). This is a robust and long-lasting example of perceptual learning. Again, it is not entirely clear if it is due to improved guidance (e.g. If I see this display, the target must be in this location.) or some sort of response facilitation (Kunar, Flusberg, Horowitz, & Wolfe, 2007). This may be a laboratory demonstration of more general contextual guidance effects observed with natural scenes.
Interestingly, repeated search through a small set of unchanging items does not become more efficient with repetition, even over hundreds of trials (Wolfe, Klempen, & Dahlen, 2000). RT is speeded, but the slope of the RT x set size function remains constant. Apparently, using memory for an item’s location is less efficient than repeating the visual search (Kunar, Flusberg, & Wolfe, 2007). This is true if all locations are potential target locations. If targets appear in only a few locations, learning those locations will improve the efficiency of search. Thus, in a real world search for your cat, search efficiency will be improved by learning that the cat sits in only five of the possible locations in this room (Kunar, Flusberg, & Wolfe, 2007).
Slopes of RT x set size functions for target-present trials of simple, inefficient searches (e.g. for a letter among other letters) are on the order of 20-40 msec/item. If items were being processed one after the other in series, and if items were never revisited during a search (see next section), this would seem to imply that each item takes 40-80 msec to process, meaning that 12-25 items would be processed each second. (Why? In the serial, self-terminating search described here, observers need to search through an average of \((N+1)/2\) items in order to find the target, so the target-present slope is only half the per-item processing time.)
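A short worked example of this arithmetic, using the assumed per-item times from the text:

```python
# Serial, self-terminating search: on target-present trials the observer examines
# (N + 1) / 2 items on average, so the predicted target-present slope is half the
# time needed to process a single item.
for per_item_msec in (40, 80):                     # assumed per-item processing times
    present_slope = per_item_msec / 2.0            # predicted slope, msec/item
    rate = 1000.0 / per_item_msec                  # items processed per second
    print(f"per-item time {per_item_msec} msec -> "
          f"present slope {present_slope:.0f} msec/item, {rate:.1f} items/sec")
```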
A problem arises here because estimates of the minimum time required to recognize a single object are almost always greater than 100 msec. One solution is to assume that multiple items (perhaps all items) are processed in parallel (Palmer, 1995). A somewhat different approach notes that the slope of the RT x set size function is a measure of throughput but not necessarily a measure of the time required to process each item. A carwash can serve as a metaphor for this sort of pipeline process. It might take three minutes to wash each car, but the next car does not have to wait for the first car to be completely washed before entering. The carwash’s throughput might be one car every 30 seconds. The key insight is that while only one car can enter at a time, multiple cars can be in the carwash simultaneously (Wolfe, 2003).
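A minimal sketch of the carwash/pipeline idea, with purely illustrative numbers: a new item enters the pipeline well before the previous item has finished, so throughput can be much faster than the time any single item spends in the pipeline.

```python
# A new item enters the identification "pipeline" every entry_interval msec,
# but each item needs dwell msec inside the pipeline before it is identified.
def finish_times(n_items, entry_interval=40, dwell=200):
    # Item i enters at i * entry_interval and finishes dwell msec later.
    return [i * entry_interval + dwell for i in range(n_items)]

print(finish_times(6))   # [200, 240, 280, 320, 360, 400]
# Throughput is one item per 40 msec (25 items/sec) even though each
# individual item occupies the pipeline for 200 msec.
```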
Many models of search that propose deployment of attention to one item at a time (see section 9) have argued that each item is only attended once during a search, a process known as sampling without replacement. The phenomenon of inhibition of return (IOR) was often invoked as a mechanism for this memory. IOR refers to the finding that it is harder to direct attention to a recently attended location or object (Posner & Cohen, 1984; for a review see Klein, 2000). In principle, IOR could prevent deployment of attention to rejected distractors (Klein, 1988). In practice, however, search slopes are, at most, modestly affected when IOR is disrupted (Horowitz & Wolfe, 1998). If IOR has a role in search, it seems more likely that it is something like a foraging facilitator (Klein & MacInnes, 1999), keeping search away from a few recently visited items but not tagging every rejected distractor in a search (but see Hooge, Over, van Wezel, & Frens, 2005). One implication of the failure to tag each item is that the actual sampling rate may be even faster than the 12-25 items/second rate noted in section 6.1.
One simple way to prevent repeated deployments of attention to rejected distractors would be to adopt a scanning strategy (e.g. reading a display from left to right and top to bottom), which requires only a memory for the scanning plan. Volitional strategies of this sort are undoubtedly part of many complex search tasks (e.g. I have looked in the kitchen. Now I will search the bedroom.) However, evidence suggests that volitional deployments of attention are much slower than automatic deployments (Wolfe, Alvarez, & Horowitz, 2000). Volitional deployments appear to occur at a rate similar to saccadic eye movements; this may not be a coincidence. Eye movements in complex searches do appear to be guided by such strategies (Gilchrist & Harvey, 2006). Real world searches may well be combinations of relatively slow strategic choices and much faster, but more chaotic, search of the local neighborhood.
It is easy enough to decide when to terminate a successful visual search. You can quit when you have found the target. When do you abandon an unsuccessful search? The obvious answer is that you can declare the target to be absent when you have rejected every distractor object. However, as noted in the previous section, we do not have perfect memory for rejected distractors, making it difficult to determine when this point has been reached. Moreover, other properties of the data (e.g. RT distributions) argue against an exhaustive search. Observers appear to set a quitting threshold in an adaptive manner based on whatever information they can glean from preceding trials (e.g. I got the last absent trial correct. Perhaps I could go a little faster on the next trial.). It is difficult to model this behavior for situations in which observers search similar displays for hundreds of trials. It is daunting to contemplate how unsuccessful searches are terminated under real-world conditions.
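One simple way to picture such an adaptive quitting rule (purely illustrative, not a description of any specific model) is a time threshold that is raised after a miss and lowered slightly after correct trials:

```python
def update_quit_threshold(threshold_msec, last_trial_was_miss,
                          step_up=50.0, step_down=10.0):
    # Illustrative adaptive rule: after a miss, become more cautious by raising
    # the quitting threshold; after a correct trial, creep back toward faster
    # quitting. Step sizes are arbitrary and chosen only for illustration.
    if last_trial_was_miss:
        return threshold_msec + step_up
    return max(0.0, threshold_msec - step_down)

t = 1500.0                                 # starting quitting threshold, msec
for miss in [False, False, True, False]:
    t = update_quit_threshold(t, miss)
    print(t)                               # 1490.0, 1480.0, 1530.0, 1520.0
```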
An interesting special case occurs when targets are rare. Low target prevalence is a feature of important search tasks like airport baggage screening and routine medical screening. In the lab, low prevalence puts strong pressure on observers to make target absent responses. In turn, this shift in criterion (using the term in its signal detection theory sense) will increase miss errors and decrease false alarm errors. This is a potential source of trouble in tasks that are put in place to detect important rare events (Wolfe et al., 2007).
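A minimal signal detection sketch (with an arbitrary d') shows how shifting the criterion in the conservative direction trades false alarms for misses:

```python
import math

def phi(x):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

d_prime = 2.0                          # assumed target discriminability
for criterion in (1.0, 1.8):           # roughly neutral vs. conservative placement
    miss_rate = phi(criterion - d_prime)          # P("no" | target present)
    false_alarm_rate = 1.0 - phi(criterion)       # P("yes" | target absent)
    print(f"criterion {criterion:.1f}: misses {miss_rate:.2f}, "
          f"false alarms {false_alarm_rate:.2f}")
```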
How does the brain perform search tasks? There is a substantial literature addressing this question using electrophysiological and brain imaging methods in humans and non-human primates (reviewed in Reynolds & Chelazzi, 2004). Effects of attention are widespread in visual cortex and extend down to the earliest stages of primary visual cortex and, in some hands, further down to the lateral geniculate nucleus of the thalamus. These effects on early visual processing stages appear to involve feedback from later stages of processing. Initial processing of a stimulus may not be modulated by attention. Within a couple of hundred msec of stimulus onset, however, the response to the same stimulus in the same area of visual cortex can show the effects of attention.
At the single cell level, attentional modulation can take many forms. Response can be modulated based on the features or the locations of stimuli. Responses can become larger or more sharply tuned. Attention can improve signal-to-noise ratios. Receptive fields can shift in space. While this seems complex, there is no reason to imagine that attention should have a single neural signature. Attention is used in multiple ways. It has numerous behavioral consequences. It is reasonable that it should have a variety of physiological effects.
Attentional control signals appear to arise from a fronto-parietal network which directs both covert and overt attentional shifts (Corbetta, 1998). Parietal lesions often produce symptoms of hemineglect, a neurological condition where observers can see but have great difficulty directing attention and/or action toward the visual field opposite to the side of the lesion (right parietal lesions produce neglect of the left visual field) (Mort & Kennard, 2003).
The neural locus of the salience map is a matter of some controversy. Li (2002) has proposed that primary visual cortex (V1) is the locus of a bottom-up salience map, based on modeling and psychophysical results. Kusunoki, Gottlieb, & Goldberg (2000) claim to have identified a bottom-up salience map in the lateral intraparietal area (LIP), on the basis of single-unit recordings in monkeys. Brain regions that seem to integrate both top-down and bottom-up salience include the frontal eye fields (FEF; Thompson, Bichot, & Sato, 2005) and the middle temporal area (MT; Treue & Martinez-Trujillo, 2006). It seems likely that there is no single salience map in the brain, but rather a network of maps for different tasks, which may compete or cooperate with one another depending on task demands.
The fronto-parietal network is particularly associated with top-down control (Corbetta & Shulman, 2002). Attentional capture, for example, appears to be the consequence of frontal deactivation (Lavie & de Fockert, 2006). Corbetta and Shulman argued that capture represented a “circuit-breaker” on the fronto-parietal network, enabling attention to be directed towards important events that are not part of the organism's current goals. They located this circuit breaker in a more ventral network (Corbetta & Shulman, 2002).
Treisman and Gelade (1980) established the binding problem as one of the fundamental issues in attention and search. While the visual system analyzes stimuli into their component features, we experience holistic objects. Where and how does this synthesis occur? Neuropsychological evidence suggests that the parietal lobe is important for feature binding. Bilateral parietal lesions can produce Bálint's syndrome. Bálint's patients exhibit "simultanagnosia", an inability to perceive more than one object at a time. As with neglect, this is not a visual sensory problem. The patients can see objects at locations throughout the visual field. However, in the presence of multiple objects, attention is fixed on one to the apparent exclusion of awareness of any others (Driver, 1998). Converging evidence from other neuroscientific techniques supports this conclusion (Humphreys, Hodsoll, Olivers, & Yoon, 2006). How binding is achieved remains controversial. The leading contenders are temporal binding, via synchronous oscillations, or place coding (for a review, see Treisman, 1999).
Much of the work on basic search processes has either ignored eye movements, or controlled them. This does not necessarily undermine the validity of these studies. Measures of eye movements and RTs in search are highly correlated, and enforcing fixation does not change the pattern of results (Klein & Farrell, 1989; Zelinsky & Sheinberg, 1997). However, eye movements play an important role in search of complex scenes, where many important details cannot be resolved in the periphery. Furthermore, since eye movements can be observed directly, unlike shifts of covert attention, they provide a rich dataset to improve our understanding of search.
Note that there are two basic categories of eye movements: saccades and smooth pursuit. Saccades are rapid, ballistic movements that shift gaze from one point to another. Smooth pursuit movements follow the motion of an object. With a few exceptions (e.g. Khurana & Kowler, 1987; Morvan & Wexler, 2005), the literature on eye movements in visual search is concerned with saccades. Analysis of the distribution of saccades and saccadic latencies has contributed a great deal to our understanding of search. Saccades show evidence of both top-down (Chen & Zelinsky, 2006; Pomplun, 2006) and bottom-up (Sobel & Cave, 2002) guidance. Eye movement studies have also been used to demonstrate new forms of search guidance, such as guidance by scene context (Neider & Zelinsky, 2006a; Torralba, Oliva, Castelhano, & Henderson, 2006). Space limitations preclude a detailed review of this literature. Excellent reviews of the role of eye movements in search are available elsewhere (Findlay & Gilchrist, 2005; Henderson & Ferreira, 2004).
Studies of eye movements have also been used to shed light on the question of memory, or sampling strategy, in visual search (see section 5.2). When objects are very small and sparse, requiring foveation, perfect memory can be demonstrated (Peterson, Kramer, Wang, Irwin, & McCarley, 2001). Under other circumstances, IOR may serve to discourage fixations on recently fixated items (Boot, McCarley, Kramer, & Peterson, 2004). The eyes do revisit examined locations (Gilchrist & Harvey, 2000; Gilchrist, North, & Hood, 2001; Hooge, Over, van Wezel, & Frens, 2005), suggesting a small but potentially useful memory for eye movements in search (McCarley, Wang, Kramer, Irwin, & Peterson, 2003). That memory is supplemented by deliberate scanning strategies (Gilchrist & Harvey, 2006).
The great bulk of the work on visual search discussed here has used simple stimuli presented on computer monitors. The hope and assumption is that the rules that apply in the lab will also apply in the world.
One might ask why researchers have resorted to such artificial stimuli when our interest is in how observers find real objects in real scenes. It is worth noting just a few of the daunting methodological issues. If one wants to ask about searches for red vertical lines amongst green vertical and red horizontal distractors, it is straightforward to present hundreds of trials with targets and countable numbers of distractors in random locations. If one wants to ask about searches for coffee makers in kitchens, none of this is simple. Coffee makers cannot be placed randomly in real kitchens. If we ask repeatedly about this one kitchen, we have changed the question. We cannot easily generate arbitrary numbers of real kitchens (though this is easier if we opt for realistic kitchens drawn with architectural software). We do not know how to count the number of objects. Is the stove an object? Is every knob on the stove an object? If not, why not? Of course, much interesting and important work has been done with scene stimuli (Brockmole & Henderson, 2006; Eckstein, Drescher, & Shimozaki, 2006; Henderson & Hollingworth, 1999; Hidalgo-Sotelo, Oliva, & Torralba, 2005; Neider & Zelinsky, 2006b), much of it in the eye movement literature referenced above (Henderson & Ferreira, 2004). The spatial layout of scenes undoubtedly guides the deployment of attention. We look for coffee makers on surfaces that are likely to hold such objects (Torralba, Oliva, Castelhano, & Henderson, 2006). However, guidance by scene context may be qualitatively different from guidance by attributes like color.
Modern civilization has created many specialized search tasks: Examination of bridges for metal fatigue, airport security, air traffic control, and so on. Each has its own specific challenges; for instance, a different balance of the relative costs of miss vs. false alarm errors.
Analysis of medical images (notably x-rays) has been the subject of one of the more extensive literatures in applied visual search (e.g. Berbaum et al., 1998; Eckstein, Pham, Abbey, & Zhang, 2006; Judy, Swensson, & Szulc, 1981; Krupinski, 2005; Kundel, 1991). Space does not permit an extensive review of this topic. One of the challenges in medical image search is that the number of targets is often unknown and it can be important to find every target (e.g. every tumor). Thus, in this literature, there is considerable interest in the phenomenon of satisfaction of search, the situation where otherwise detectable targets are not found because other targets were found first (Berbaum et al., 1990). Observers are satisfied and terminate search. This is a version of the search termination problem described earlier; a version with important consequences.
Logan (2004) has provided a recent review of some modeling efforts in visual search and, more generally, in attention. Here, we briefly mention a few of the leading efforts.
Much of the work in this field is built on the foundation of Treisman’s seminal Feature Integration Theory (Treisman & Gelade, 1980). In its original form, FIT held that a set of basic features could be processed in parallel across the visual field in a preattentive stage. Other visual stimuli, including conjunctions of basic features, could not be identified unless selected by attention in a serial manner. In particular, FIT held that attention was required if two or more features were to be bound into a coherent percept.
Guided Search is an intellectual heir of FIT. It holds that basic features, derived from the early, parallel stages of processing, can be used to guide the subsequent deployment of attention. In this manner, a conjunction of two features can be found quite efficiently by guiding attention to the intersection of the sets of items possessing each feature (Wolfe, 1994, 2007). The present article describes many search phenomena in terms influenced by GS.
There are a variety of recent computational models that can be seen as broadly within this FIT/GS theoretical tradition, assuming that early visual processes control subsequent attentional selection. A non-exhaustive list would include the work of Itti and colleagues (Itti & Koch, 2001), Hamker (2004), and Tsotsos and colleagues (Tsotsos et al., 1995).
Many models (including many of the aforementioned) are grounded in neurophysiological as well as psychophysical work. Early models had a feed-forward structure in which early visual processes were not influenced by attention. Explicitly neuronal models once tended to describe attention as a filter or gate on the path from input to perception. More recent efforts tend to model attentional effects as feedback from fronto-parietal loci onto the earlier stages of visual processing (Reynolds & Chelazzi, 2004). Desimone and Duncan's Biased Competition model describes the effects of attention at the neuronal level. When multiple stimuli have the potential to influence the response of a given neuron, they compete for control of the output of that neuron. As the theory’s name suggests, attention acts to bias that competition in favor of some stimuli over others (Desimone & Duncan, 1995).
Signal detection models have been able to provide quite precise accounts of the rules governing relatively simple searches (e.g. search for a line of one orientation among distractors of another with all lines embedded in visual noise). These models are characteristically parallel in nature, assuming that all items are processed at once. Adding distractors adds noisy signals that might be mistaken for a target, thus degrading performance (Verghese, 2001).
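The flavor of such a parallel account can be conveyed with a simple max-rule simulation (the d', criterion, and trial count below are arbitrary): each added distractor contributes another noise sample that can exceed the decision criterion, so false alarms climb with set size unless the criterion is raised, which in turn costs hits.

```python
import numpy as np

rng = np.random.default_rng(0)
d_prime, criterion, n_trials = 1.5, 1.5, 20000

for set_size in (2, 4, 8, 16):
    # Every item produces an independent noisy response; the target adds d'.
    noise = rng.normal(size=(n_trials, set_size))
    absent_max = noise.max(axis=1)                 # target-absent displays
    present = noise.copy()
    present[:, 0] += d_prime                       # one item is the target
    present_max = present.max(axis=1)
    hits = (present_max > criterion).mean()
    false_alarms = (absent_max > criterion).mean()
    print(f"set size {set_size:2d}: hits {hits:.2f}, false alarms {false_alarms:.2f}")
```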
These many models are not as inconsistent with each other as they might appear. When a parallel signal detection model allows cues to modulate how much attention is directed to an item, it has moved closer to a FIT- or GS-style account of selection. When a FIT- or GS-style model allows multiple items to be selected at the same time, it has blurred the distinction between serial and parallel stages. None of these models is inconsistent with the biased competition notion that stimuli compete for access to cells that could process any of them, but not all at the same time. Creation of the true model of search does not require commitment to the correct school of modeling. It requires getting the details right.
Internal references
Binding Problem, Eye Movements, Inhibition of Return, Vision, Visual Attention, Visual Salience, Gestalt principles