Multiple object tracking

Multiple Object Tracking, or MOT, is an experimental technique used to study how the visual system tracks multiple moving objects. It was developed in 1988 [1] to test (and illustrate) a theoretically proposed mechanism called a Visual Index or FINST (for FINgers of INSTantiation). The theory postulates a small number (around 4) of indexes or pointers that pick out and stay attached to individual objects in the visual field independently of their changing properties, and thus allow them to be tracked. The theory addresses the problem of how conceptual descriptions can pick out individual visual objects despite the fact that descriptions themselves are in general insufficient to pick out tokens, as opposed to types [described in 2, 3-5]. The paradigm has since been adopted by many laboratories (Brian Scholl maintains a list of papers on MOT at: MOT_Publications) and has been used in various ways, from testing the FINST theory to providing a continuous attention-demanding task [the FINST theory claims that the tracking aspect of MOT is automatic and non-attentional, though others view it as illustrating split attention, see reference 6].

Figure 1: Sequence of events in a typical Multiple Object Tracking (MOT) experiment

The basic experiment is shown in Figure 1. First a display of eight identical objects is shown (t=1). Then a subset of 4 “targets” is briefly flashed to make them distinctive (t=2). Following this the objects stop flashing, so the “target” set becomes indistinguishable from the other objects. All objects then move in a random fashion for about 10 seconds (t=3). Then the motion stops (t=4) and the observer’s task is to indicate all the tracked objects by clicking on each one using a computer mouse.
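The four phases described above can be sketched as a small simulation. This is only an illustrative sketch, not the procedure of any particular study: the function name `simulate_mot_trial`, the display size, the step size, the frame count (600 frames, which would be about 10 seconds at an assumed 60 frames per second) and the simple bounded random walk are all assumptions. Published experiments use their own motion algorithms.

```python
import random

def simulate_mot_trial(n_objects=8, n_targets=4, n_frames=600,
                       step=2.0, size=400.0, seed=None):
    """Generate one hypothetical MOT trial; return (targets, final_positions)."""
    rng = random.Random(seed)
    # t=1: identical objects appear at random positions in a square display
    positions = [(rng.uniform(0, size), rng.uniform(0, size))
                 for _ in range(n_objects)]
    # t=2: a random subset is flashed to designate the targets
    targets = set(rng.sample(range(n_objects), n_targets))
    # t=3: every object takes an independent random step on each frame,
    # clamped so that it stays inside the display (assumed motion model)
    for _ in range(n_frames):
        moved = []
        for x, y in positions:
            x = min(max(x + rng.uniform(-step, step), 0.0), size)
            y = min(max(y + rng.uniform(-step, step), 0.0), size)
            moved.append((x, y))
        positions = moved
    # t=4: motion stops; the observer would now click on the tracked objects
    return targets, positions

targets, final_positions = simulate_mot_trial(seed=1)
print(sorted(targets), len(final_positions))
```

Because the targets are identified only by their indices, nothing in the final display distinguishes them from the nontargets, which is the point of the task.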

In a typical MOT experiment the subject indicates all the selected “targets” at the end of each trial. In some studies they might, instead, judge whether a particular object, flashed at the end of the trial, was a target [from 5]. Animated Quicktime movies illustrating many of the phenomena mentioned below can be viewed at Pylyshyn_Demos, or Scholl_Demos.
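The two response methods lead to two simple ways of scoring a trial. The sketch below is a minimal illustration: the function names `score_mark_all` and `score_probe` are hypothetical, and objects are assumed to be identified by integer indices. It does not correct for guessing, which real analyses often do.

```python
def score_mark_all(targets, selected):
    """Fraction of the true targets that the observer clicked on."""
    targets, selected = set(targets), set(selected)
    return len(targets & selected) / len(targets)

def score_probe(targets, probe, said_target):
    """True if the yes/no judgment about the single probed object is correct."""
    return (probe in targets) == said_target

# Observer clicks three of the four targets plus one nontarget (object 7)
print(score_mark_all({0, 1, 2, 3}, {0, 1, 2, 7}))  # 0.75
# Observer says "target" for a probed object that was in fact a nontarget
print(score_probe({0, 1, 2, 3}, 5, True))          # False
```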

The many dozens (perhaps hundreds) of experiments carried out since 1988 have shown many counter-intuitive properties of MOT. Here are a few:

Targets can be tracked even when they disappear behind an occluder and, under certain conditions, even when all objects disappear from view as in an eye blink [7-9].
Properties of targets are not encoded during MOT, nor are they used in tracking. If all targets have different properties, tracking is no better. When target properties change, subjects do not notice [10, 11].
Not all well-defined clusters of features can be tracked: Only ones that correspond to objects as opposed to parts of objects, such as the endpoints of lines [12].
Targets are selected primarily in an automatic, involuntary and data-driven manner. But they can also be selected voluntarily, though there is evidence that this happens only by moving focal attention to each target serially [13].
Tracking appears to be non-predictive and non-attentive [8, 14].
Even when objects are correctly tracked, information associated with individual targets (e.g., their starting positions or associated labels) is poorly recalled [15].
The poor recall of target information may be due to the fact that target-target confusions are more numerous than target-nontarget confusions. The reason appears to be that nontargets are inhibited, which may prevent them from being swapped with targets [16].
There is evidence that the ongoing computing of correspondences for purposes of tracking individual targets in MOT is distinct from accessing the targets (e.g., in pointing them out or recording their properties) and may be nonattentive. This is suggested by the following separate findings:
- Searching while tracking shows independence of the two tasks [17],
- In a dual task (track and monitor color change) tracking performance is not impaired, provided the monitoring response is made at the end of the trial rather than during tracking [14],
- Flashed items show signs of being involuntarily tracked in a pure monitoring task without explicit tracking [18],
- One would expect that the correspondence-tracking should be nonattentive since computing correspondence is not item-limited in apparent motion and stereovision.
  
  Figure 2: The skill exhibited in MOT experiments comes in handy in practical settings, such as team sports
Objects can be tracked even when they change direction (by up to 60 degrees) and location (by several diameters) while they are behind occluding surfaces [19].
MOT capacity increases when objects’ speed decreases. For slow tracking, subjects appear to use a strategy of tracking a group at a time while task-sharing with other groups [20]. MOT capacity is also increased with videogame practice [21].
Two simultaneous tracking tasks can be carried out as well as one if they are presented to two different cortical hemispheres [22].
If objects are kept sufficiently separated they can be tracked into the visual periphery [23], thus suggesting that MOT can be used in practical applications in sports (as illustrated) as well as in other monitoring tasks, such as in collision avoidance.
Clinical populations show different patterns of MOT and spatial memory, so MOT might be useful in understanding various brain-damage impairments as well as serving as a potential diagnostic instrument [24].

Figure 3: Mechanisms such as those used in MOT (e.g., FINSTs) can help create more efficient robot representations that use pointers to individual objects instead of descriptions of the space (based on [3], used with permission of the publisher)

The implications of a mechanism that allows multiple-object tracking (the indexes it uses are referred to in one theory [2] as FINSTs) are far reaching. If perceptual representations are to be grounded in the physical world, then a causal link is essential at some point in the process. The usual sort of link that has been assumed is a semantic one: the objects that fit a particular description are the ones picked out and referred to. While this may be generally true, it cannot be the whole story since it would be circular. The symbolic description must bottom out, that is, must be grounded, in individual objects or properties in the perceptual world. Recent evidence has suggested that the grounding is done in terms of objects rather than properties [2-4]. Having this sort of link (often referred to as a “demonstrative” reference) also helps to see how geometrical and metrical properties can be exhibited by mental representations of space even if there is no actual spatial display in the brain. The current proposal assumes that spatial properties are inherited from the location of indexed objects in concurrently perceived space, where the perceived objects are anchored by FINST pointers and associated with mental representations of objects (this is worked out in Chapter 5 of [4]).

These FINSTs may also provide a mechanism for instantiating more “situated” robot representations, whereby a robot can use indexical references instead of a detailed world-based description of the location of objects [3]. Such a robot would have a representation more like the indexical than the logical form illustrated in Figure 3.
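The contrast between an indexical reference and a stored description can be illustrated with a toy data structure. This is a loose sketch under stated assumptions, not the model proposed in [3]: the class names `WorldObject` and `IndexTable`, the index name `"this"`, and the hard capacity of 4 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class WorldObject:
    """A thing in the (simulated) world whose properties may change."""
    x: float
    y: float
    color: str

class IndexTable:
    """A small pool of pointers bound to objects rather than to descriptions."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.bound = {}  # index name -> reference to a WorldObject

    def bind(self, name, obj):
        if name not in self.bound and len(self.bound) >= self.capacity:
            raise RuntimeError("no free index")
        self.bound[name] = obj

    def where(self, name):
        # Location is read from the world when needed, not stored in advance
        obj = self.bound[name]
        return (obj.x, obj.y)

ball = WorldObject(1.0, 2.0, "red")
indexes = IndexTable()
indexes.bind("this", ball)
description = (ball.x, ball.y, ball.color)  # a stored description of the scene

ball.x, ball.color = 5.0, "blue"            # the object moves and changes color
print(indexes.where("this"))                # (5.0, 2.0): the index still refers to it
print(description)                          # (1.0, 2.0, 'red'): the description is stale
```

The index keeps referring to the same individual after its location and color change, whereas the stored description no longer fits anything in the scene and would have to be recomputed.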

References

1. Pylyshyn, Z.W. and R.W. Storm, Tracking multiple independent targets: evidence for a parallel tracking mechanism. Spatial Vision, 1988. 3(3): p. 179-197.
2. Pylyshyn, Z.W., Visual indexes, preconceptual objects, and situated vision. Cognition, 2001. 80(1/2): p. 127-158.
3. Pylyshyn, Z.W., Situating vision in the world. Trends in Cognitive Sciences, 2000. 4(5): p. 197-207.
4. Pylyshyn, Z.W., Things and Places: How the mind connects with the world (Jean Nicod Lectures Series). forthcoming, Cambridge, MA: MIT Press.
5. Pylyshyn, Z.W., Seeing and visualizing: It's not what you think. 2003, Cambridge, MA: MIT Press/Bradford Books.
6. Cavanagh, P. and G.A. Alvarez, Tracking multiple targets with multifocal attention. Trends in Cognitive Sciences, 2005. 9(7): p. 349-354.
7. Scholl, B.J. and Z.W. Pylyshyn, Tracking multiple items through occlusion: Clues to visual objecthood. Cognitive Psychology, 1999. 38(2): p. 259-290.
8. Keane, B.P. and Z.W. Pylyshyn, Is motion extrapolation employed in multiple object tracking? Tracking as a low-level, non-predictive function. Cognitive Psychology, 2006. 52(4): p. 346-368.
9. Horowitz, T.S., et al., How do we track invisible objects? Psychonomic Bulletin & Review, in press.
10. Scholl, B.J., Z.W. Pylyshyn, and S.L. Franconeri, When are featural and spatiotemporal properties encoded as a result of attentional allocation? Investigative Ophthalmology & Visual Science, 1999. 40(4): p. 4195.
11. Bahrami, B., Object property encoding and change blindness in multiple object tracking. Visual Cognition, 2003. 10(8): p. 949-963.
12. Scholl, B.J., Z.W. Pylyshyn, and J. Feldman, What is a visual object? Evidence from target-merging in multiple-object tracking. Cognition, 2001. 80: p. 159-177.
13. Pylyshyn, Z.W. and V.J. Annan, Dynamics of target selection in multiple object tracking (MOT). Spatial Vision, 2006. 19(6): p. 485-504.
14. Leonard, C. and Z.W. Pylyshyn, Measuring the attentional demand of multiple object tracking (MOT) [Abstract]. Journal of Vision, 2003. 3(9): p. 582a.
15. Pylyshyn, Z.W., Some puzzling findings in multiple object tracking (MOT): I. Tracking without keeping track of object identities. Visual Cognition, 2004. 11(7): p. 801-822.
16. Pylyshyn, Z.W., Some puzzling findings in multiple object tracking (MOT): II. Inhibition of moving nontargets. Visual Cognition, 2006. 14(2): p. 175-198.
17. Alvarez, G.A., et al., Are multielement visual tracking and visual search mutually exclusive? Journal of Experimental Psychology: Human Perception and Performance, 2005. 31(4): p. 643-667.
18. Haladjian, H.H. and Z.W. Pylyshyn, Implicit multiple object tracking without an explicit tracking task. Journal of Vision, 2006. 6(6): p. 773a.
19. Franconeri, S., Z.W. Pylyshyn, and B.J. Scholl, Spatiotemporal cues for tracking multiple objects through occlusion. Visual Cognition, 2006. 14(1): p. 100-104.
20. Alvarez, G.A. and S.L. Franconeri, How many objects can you attentively track?: Evidence for a resource-limited tracking mechanism. under review.
21. Green, C. and D. Bavelier, Enumeration versus multiple object tracking: The case of action video game players. Cognition, 2006. 101(1): p. 217-245.
22. Alvarez, G.A. and P. Cavanagh, Independent attention resources for the left and right visual hemifields. Psychological Science, 2005. 16(8): p. 637-643.
23. Franconeri, S.L., et al., Multiple-object tracking in large, wide-angle, scenes. under review.
24. O'Hearn, K., B. Landau, and J.E. Hoffman, Multiple object tracking in people with Williams syndrome and in normally developing children. Psychological Science, 2005. 16(11): p. 905-912.

Internal references

Valentino Braitenberg (2007) Brain. Scholarpedia, 2(11):2918.

See also

Vision, Visual Binding