Gestalt principles, or gestalt laws, are rules of the organization of perceptual scenes. When we look at the world, we usually perceive complex scenes composed of many groups of objects on some background, with the objects themselves consisting of parts, which may be composed of smaller parts, etc. How do we accomplish such a remarkable perceptual achievement, given that the visual input is, in a sense, just a spatial distribution of variously colored individual points? The beginnings and the direction of an answer were provided by a group of researchers early in the twentieth century, known as Gestalt psychologists. Gestalt is a German word meaning 'shape' or 'form'. Gestalt principles aim to formulate the regularities according to which the perceptual input is organized into unitary forms, also referred to as (sub)wholes, groups, groupings, or Gestalten (the plural form of Gestalt). These principles mainly apply to vision, but there are also analogous aspects in auditory and somatosensory perception. In visual perception, such forms are the regions of the visual field whose portions are perceived as grouped or joined together, and are thus segregated from the rest of the visual field. The Gestalt principles were introduced in a seminal paper by Wertheimer (1923/1938), and were further developed by Köhler (1929), Koffka (1935), and Metzger (1936/2006; see review by Todorović, 2007). For a modern textbook presentation, including more recent contributions, see Palmer (1999).
If the visual field is homogeneous throughout, a situation labeled as Ganzfeld (German for 'whole field'), it has no consistent internal organization. A simple case of an inhomogeneous field is a display with a patch of one color surrounded by another color, as in Figure 1.
In such cases the visual field is perceived as articulated into two components, the figure (patch) on the ground (surround). This figure-ground articulation may seem obvious, but it is not trivial. This type of field organization has a number of remarkable features, first described in the work of Rubin (1915/1921), predating Wertheimer's publication. The two components are perceived as two segments of the visual field differing not only in color, but in some other phenomenal characteristics as well. The figure has an object-like character, whereas the ground has less perceptual saliency and appears as 'mere' background. The areas of the figure and the ground usually do not appear juxtaposed in a common plane, as in a mosaic, but rather as stratified in depth: there is a tendency to see the figure as positioned in front, and the ground at a further depth plane and continuing to extend behind the figure, as if occluded by it. Furthermore, the border separating the two segments is perceived as belonging to the figure rather than to the ground, and as delineating the figure's shape as its contour, whereas it is irrelevant to the shape of the ground. Certain displays are bi-stable, in that what is perceived as figure can also be perceived as ground and vice-versa. However, in displays structured such as Figure 1, in which a smaller region is wholly surrounded by a larger region, it is usually the former that appears as figure (although it may also be seen as a hole), and the latter as ground.
The described organization of the display into the figure and the ground is not its only conceivable segmentation. To illustrate this, consider that Figure 1, as presented on the computer screen, is a set composed of a certain number of pixels, and that the segmentation into figure and ground corresponds to a particular partition of this set into two subsets. However, this same set may be partitioned into a huge number of other pairs of subsets (such as the subset of pixels in the left half of the figure and the subset in the right half, or the subset at one side of any arbitrary line meandering through the display and the subset at the other side, or the subset consisting of even pixels in odd rows plus odd pixels in even rows and the complementary subset), or into any conceivable three subsets, or four subsets etc. Nevertheless, while an enormous number of such alternative partitions are conceivable, none of them is perceivable, save one or very few. The partition that is actually seen is not a matter of geometric combinatorics and attention to arbitrarily selected subsets: the natural, and often the only way that we can perceive such a display, given the structure of the visual input, is as segmented into the figure and the ground. Such articulation, in which a virtual infinity of geometrical possibilities is pruned down to a single or only a couple of perceptual realizations, is a very basic feature of the working of the visual system.
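The size of this combinatorial space can be made concrete with a small counting sketch. It is only an illustration of the argument above, not part of the Gestalt literature: the function names `stirling2` and `two_way_partitions` are hypothetical, and the display is assumed to be an abstract set of n labeled pixels with no spatial structure.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Count the ways to split n labeled items into k non-empty, unlabeled subsets."""
    if n == k:
        return 1  # includes the empty case n == k == 0
    if k == 0 or k > n:
        return 0
    # The n-th item either forms a new subset on its own,
    # or joins one of the k subsets already formed by the other n - 1 items.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def two_way_partitions(n: int) -> int:
    """Count the splits of n >= 1 labeled items into two non-empty subsets."""
    return 2 ** (n - 1) - 1


if __name__ == "__main__":
    for n in (4, 6, 100):
        print(n, two_way_partitions(n))
```

Even a tiny display of 100 pixels admits roughly 6 × 10^29 two-way partitions under these assumptions, of which the figure-ground segmentation is just one.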
Although figure-ground perception is a fundamental aspect of field organization, it is not usually itself referred to as a Gestalt law or principle of grouping. Rather, such terms are mostly used for describing the rules of the organization of somewhat more complex visual fields. There is no definitive list of Gestalt principles, but some of the most commonly discussed are listed and described below, illustrated with examples mainly based on Wertheimer (1923/1938) and Metzger (1936/2006). As demonstrated by these examples, the perceptual groupings are in some cases strong and unambiguous, but in other cases they are better described as tendencies, especially when different factors compete with each other.
Figure 2a contains six patches, each of which is perceived as a visual unit, a figure on a common ground. However, they are also collectively the elements of a higher-order visual unit, the horizontal row. According to Gestalt theory, this type of integration of individual components into a superordinate whole can be accounted for by the proximity principle: elements tend to be perceived as aggregated into groups if they are near each other.
The effect of varying proximity is illustrated in Figure 2b. Due to the change of distance between some of the components, here the patches are perceived not just collectively as a sextuple, but also as being subdivided into a triple of doublets, an organization that in Wertheimer's notation is designated as 12/34/56.
Note that a number of other potential partitions of the set in Figure 2b exist, such as into a doublet of triples (123/456), or into a quartet and a pair (1234/56), or even into combinations of non-adjacent items such as 16/25/34 or 135/246, etc. However, it is extremely hard, if not impossible, to actually perceive groupings of patches other than 12/34/56 in this figure. On the other hand, it is not impossible to see some subdivisions in Figure 2a. For example, with deliberate effort and concentrated attention one may eventually succeed in mentally partitioning the row of patches into three pairs. However, such a percept is usually only partially and locally successful (one clearly sees only one or two segregated pairs), appears contrived, and is fleeting. In contrast, perceiving the same partition in Figure 2b is spontaneous and effortless, and the percept is global and stable. Attention may contribute to figural perception, but, except in special cases, its role is usually limited: generally, it is not attention that creates the forms, but rather the forms, organized in accord with Gestalt principles, that draw attention.
With a different spatial distribution of the six components, such as in Figure 2c, another naturally perceived partition into sub-wholes arises, denoted as 1/23/45/6. The partition 12/34/56, although arguably simpler and more regular, is hard to perceptually realize in Figure 2c: it would violate the proximity principle, as it would involve grouping together some elements across relatively larger distances, but assigning other, relatively near elements, into different groups.
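The way relative distances determine the partitions 12/34/56 and 1/23/45/6 can be sketched as a simple gap-based rule. This is a toy illustration under stated assumptions, not a model proposed by Wertheimer: the coordinates are hypothetical stand-ins for the patch positions in Figure 2, and the rule "split wherever a gap exceeds `ratio` times the smallest gap" (with the hypothetical default `ratio=1.5`) is chosen only for simplicity.

```python
def group_by_proximity(positions, ratio=1.5):
    """Split a row of elements wherever a gap exceeds ratio * the smallest gap.

    Elements are labeled 1..n from left to right. The splitting rule is an
    illustrative assumption, not an established perceptual law.
    """
    xs = sorted(positions)
    if len(xs) < 2:
        return [list(range(1, len(xs) + 1))]
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    threshold = ratio * min(gaps)
    groups = [[1]]
    for label, gap in enumerate(gaps, start=2):
        if gap > threshold:
            groups.append([label])    # large gap: start a new group
        else:
            groups[-1].append(label)  # small gap: join the current group
    return groups


def wertheimer_notation(groups):
    """Render groups in the slash notation used in the text, e.g. '12/34/56'."""
    return "/".join("".join(str(label) for label in g) for g in groups)


if __name__ == "__main__":
    # Hypothetical coordinates loosely mimicking Figures 2a, 2b, and 2c.
    for xs in ([0, 1, 2, 3, 4, 5], [0, 1, 3, 4, 6, 7], [0, 2, 3, 5, 6, 8]):
        print(xs, wertheimer_notation(group_by_proximity(xs)))
```

With equal spacing the rule yields no subdivision, while the two unequal spacings yield the two partitions discussed above.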
The common fate principle states that elements tend to be perceived as grouped together if they move together. Thus if some of the elements in Figure 2 began to move together, they would be perceived as a group, even across larger distances. This is shown in Figure 3, in the following manner. If you move the cursor within the area of this figure, some of the patches will move up some distance, and if you then click on the left mouse button, they will move down. Repeatedly pressing and releasing the left mouse button provides a simple demonstration of the grouping power of the common fate principle.
Figure 3: Common fate principle.
The similarity principle claims that elements tend to be integrated into groups if they are similar to each other. It is illustrated in Figure 3a-e, in which proximity is held constant, since the individual figures are at (approximately) the same distance from each other, as in Figure 2a. Nevertheless, they are perceptually partitioned into three adjacent pairs, due to the similarity of visual attributes such as lightness (Figure 3a), color (Figure 3b), size (Figure 3c), orientation (Figure 3d), or shape (Figure 3e).
The 12/34/56 partition becomes more salient when the within-group similarities and between-group differences are compounded, by making the members of each doublet similar, and the doublets different from each other, in more than one visual attribute (Figure 3f). An important manipulation, studied already by Wertheimer (1923), is to vary both similarity and proximity, in order to investigate their joint effects on perceived groupings. Note that by increasing the distance between elements 2 and 3, and elements 4 and 5 (as in Figure 2b), the salience of the 12/34/56 organization is strengthened (Figure 3g), since similarity and proximity co-operate by favoring the same organization. On the other hand, when the inter-element distances are changed as in Figure 2c, the resulting perceptual organization, Figure 3h, is less clear, because similarity still favors partition 12/34/56, but proximity favors partition 1/23/45/6. This type of manipulation can thus be used to quantify the effects of different Gestalt principles and compare their strength.
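The co-operation and competition between similarity and proximity can be sketched by computing the partition each principle would favor on its own and comparing the two. This is a toy illustration, not a published model: the coordinates and the lightness labels are hypothetical, the proximity rule (split where a gap exceeds 1.5 times the smallest gap) and the similarity rule (split where adjacent attributes differ) are simplifying assumptions, and at least two elements are assumed.

```python
def split_at(boundaries):
    """Build groups of labels 1..n; boundaries[i] is True when a group
    boundary lies between element i + 1 and element i + 2."""
    groups = [[1]]
    for label, boundary in enumerate(boundaries, start=2):
        if boundary:
            groups.append([label])
        else:
            groups[-1].append(label)
    return groups


def proximity_groups(xs, ratio=1.5):
    """Partition favored by proximity alone (assumed gap-ratio rule)."""
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    return split_at([g > ratio * min(gaps) for g in gaps])


def similarity_groups(attrs):
    """Partition favored by similarity alone: split where neighbors differ."""
    return split_at([a != b for a, b in zip(attrs, attrs[1:])])


def notation(groups):
    """Render groups in slash notation, e.g. '12/34/56'."""
    return "/".join("".join(map(str, g)) for g in groups)


if __name__ == "__main__":
    lightness = ["dark", "dark", "light", "light", "dark", "dark"]  # hypothetical
    layouts = {"spacing as in 2b": [0, 1, 3, 4, 6, 7],
               "spacing as in 2c": [0, 2, 3, 5, 6, 8]}
    for name, xs in layouts.items():
        p = notation(proximity_groups(xs))
        s = notation(similarity_groups(lightness))
        verdict = "co-operate" if p == s else "compete"
        print(f"{name}: proximity {p}, similarity {s} -> {verdict}")
```

The sketch only detects agreement or conflict; it does not predict which principle prevails when they disagree.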
The display in Figure 4a can be described as consisting of a number of elements arranged in three sub-wholes or branches, converging at X. According to the principle of proximity, one would expect branch BX to group with branch CX, but instead it groups with branch AX, forming the sub-whole AXB.
This grouping is an instance of the continuity principle: oriented units or groups tend to be integrated into perceptual wholes if they are aligned with each other. The principle applies in the same way for elements arranged along lines (Figure 4a) as well as for patterns built from corresponding lines themselves (Figure 4b). The balance between continuity and proximity in the formation of salient sub-wholes may be shifted by varying similarity, which can be accomplished by coloring different branches differently. Thus coloring BX the same as AX but different from CX makes AXB a still more salient unit (Figure 4c), whereas coloring BX the same as CX but different from AX tends to increase the saliency of CXB (Figure 4d).
Figure 5a-b is constructed by adding some appropriate elements to Figure 4a-b. Whereas in Figure 4a and Figure 4b the component BX is grouped with AX, in Figure 5a and Figure 5b there is a tendency for this component to rather group with CX, both BX and CX being sides of shape BCX, which itself constitutes one half of a bow-tie shaped figure. This is an instance of the closure principle: elements tend to be grouped together if they are parts of a closed figure. However, in this particular example, continuity is still relatively effective, and is in strong competition with closure. Using similarity, the salience of BCX as a visual sub-whole can be increased, as in Figure 5c, or decreased, as in Figure 5d.
Note that the patterns in Figure 4a and Figure 4b, although physically contained in Figure 5a and Figure 5b, are hard to see there: they can be sought out with directed attention, but do not appear spontaneously as natural visual wholes. The reason for this is not simply that more elements are added in the display. This is demonstrated in Figure 7, in which the pattern in a is readily discernible in b in spite of many added elements, but is practically invisible in c, d, and e, although geometrically it is just as present there (and in the same place) as in a and b. The loss of the visual identity of the pattern is due to the effectiveness of the Gestalt principles, mainly continuity and closure, according to which its elements are perceptually integrated with other present elements, and assigned to other, new visual wholes. One way in which its visual identity can be recovered is by simply changing its color to make it dissimilar from the surround. For a demonstration, position the cursor anywhere within the area of Figure 7. Note also that when the cursor is removed from the figure and the pattern again assumes the same color as the added elements, it quickly (though not necessarily instantaneously) fades from view, and no effort of attention can restore it to a salient visual whole. For a further demonstration, hold the left mouse button depressed while positioned within the area of the figure, which will remove the pattern and reveal only the added elements. A classical study of such 'hidden figure' effects was reported by Gottschaldt (1926).
Figure 7: Camouflage.
These examples are instances of camouflage, the phenomenon in which objects are hidden from view but not by being occluded: instead, they are perceptually subdivided (broken up internally) and repartitioned, that is, their parts are grouped with parts of the surrounding environment. As used by animals in the struggle for survival and by humans in warfare, the power of Gestalt principles thus makes it possible for organisms and things which are in plain sight to become effectively invisible and therefore undetectable by adversaries. Thus whether a physical object that is optically present exists or does not exist visually, depends on the interplay of perceptual laws.
The pattern in Figure 6a is readily partitioned into two components, a straight line and a wavy line that cross each other. This perceptual decomposition is strengthened by similarity (Figure 6b). An alternative decomposition of Figure 6a into two abutting corners, depicted in Figure 6c, does not seem to arise spontaneously; this can be explained by noting that it would violate the continuity principle. However, an appeal to continuity does not explain why the partition in Figure 6d does not spontaneously arise easily in Figure 6a either, although both of its components are continuous lines.
In another, related example, Figure 7a spontaneously decomposes into a semi-wheel with curved cogs touching a rectangular 'snake'. However, this perceptual outcome actually violates the continuity principle, because at the point at which the two components touch, this decomposition involves angles, instead of following the directions of the crossing continuous lines. An even clearer decomposition is achieved by introducing similarity as well (Figure 7b). However, similarity can also be used to enhance a radically different decomposition into two crossing twisted threads, favored by continuity, as indicated in Figure 7c.
According to the Gestalt viewpoint, the dominant percepts in Figure 6a and Figure 7a are instances of the good Gestalt principle: elements tend to be grouped together if they are parts of a pattern which is a good Gestalt, meaning as simple, orderly, balanced, unified, coherent, regular, etc as possible, given the input. In this sense, the straight line and the wavy line perceived in Figure 6a are better forms than the pairs of lines in Figure 6c and Figure 6d, and in Figure 7a the cog wheel and the snake are better forms than the hybrid shapes in Figure 7c, that would be generated in Figure 7a by conforming to the continuity principle at the crossing point. In such cases global regularity takes precedence over local relations. This principle is also called the 'law of good form' or the 'law of Prägnanz', a German word that translates roughly as salience, incisiveness, conciseness, impressiveness, or orderliness.
In some cases the visual input is organized according to the past experience principle: elements tend to be grouped together if they were together often in the past experience of the observer. For example, we tend to perceive the pattern in Figure 8a as a meaningful word, built up from strokes which are grouped to form particular letters of the Roman alphabet (such as 'm', 'i', 'n', etc). Note that the individual letters are rather clearly and distinctly perceived as 'natural' parts of the connected figure, and are only slightly easier to discern and discriminate if further individuated through separation (Figure 8b) or coloration (Figure 8c). However, in addition to this standard segmentation into letters, the pattern Figure 8a has many other alternate partitions, such as the one demonstrated through separation and coloration in Figure 8d and Figure 8e. But, in contrast to the standard segmentation, discerning and discriminating these alternate components (some of which are 'non-letters') within Figure 8a is a cumbersome task, similar to the laborious search for the hidden shape in Figures 6c-e; furthermore, the standard segmentation is to some extent perceivable even in Figure 8e, where it competes with the segmentation based on the similarity principle. The spontaneity and ease of the standard, dominantly perceived organization of the strokes into letters, is plausibly mainly due to past experience, that is, to our familiarity with words as written in the script form of the Roman alphabet. This particular organization might not occur for observers lacking such familiarity; furthermore, the alternate partition would presumably be natural for observers used to an alphabet whose letters would correspond to the sub-wholes in Figure 8d and Figure 8e. 
Note also that in print perhaps the most potent Gestalt principle is proximity: simply inserting larger blank spaces between words than between letters (a device not used in antiquity) helps group together the letters correctly, and establish words as the salient visual units in the text. The importance of blank spaces is demonstrated by the difficulty wehavewhenreadingtextnotseparatedbyblanks an dev enmor ew henbl an kspa cesap pea rinwr ongpl aces.
Although acknowledged by the gestaltists, the experience-based principle was deemed of secondary importance, compared with the other, stimulus-based principles, and easily dominated by them. As an example, in the pattern in Figure 8f, in which a slightly overlapping inverted version is added, the original stimulus is much harder to see, due to the appearance of numerous new salient sub-patterns, generated by continuity and closure.
As in vision, issues of organization, grouping, and segmentation arise in the auditory domain as well (Bregman, 1990; Kubovy & van Valkenburg, 2001). The acoustic input is just a one-dimensional temporally varying air pressure waveform, but based on it we can perceive an auditory scene involving multiple sources of human speech, vocal and instrumental music, animal sounds and other nature noises, occasionally all occurring at the same time, each with its own sub-phrasing and structure. Some visual Gestalt principles directly apply in the acoustic domain, but mainly in a temporal rather than spatial form. For example, silence or background noise, interrupted by a loud sound, followed again by silence or noise, is an auditory analogue of a figure on a ground. Similarly, a regular series of identical short clicks is an analogue of Figure 2a, with equal temporal intervals between sound events playing the role of equal spatial distances. With deliberate attention, one can mentally superimpose a structure on this sequence, such as hearing consecutive pairs of clicks, as in 12/34/56. However, such a phenomenal segmentation is achieved much more naturally and easily by simply increasing the intervals between some clicks, analogously to Figure 2b. This is an instance of an auditory temporal analogue of the visual spatial proximity principle; there is also a spatial auditory variant, involving pairs of identical sounds separated by equal intervals, but coming from different directions, such as left, left/in front, in front/right, right. Auditory analogues of instances of the visual similarity principle, as illustrated in Figure 3, are also readily established, but with differences and similarities of color, size, etc. being replaced by differences and similarities of loudness, pitch, and timbre of sounds. Auditory analogues of some other Gestalt principles may also be constructed.
The principles described above, together with others not illustrated here, such as the symmetry principle (symmetrical components will tend to group together), the convexity principle (convex rather than concave patterns will tend to be perceived as figures), and others, are part of the classical heritage of perception studies. In contemporary research, of which only a few examples will be noted below, the seminal insights and issues raised by the gestaltists are developed and extended in various directions.
For example, contrary to the classical views, more recent research has indicated that even such a basic feature as figure-ground articulation may in some instances be based on experience (Peterson & Skow-Grant, 2003). For example, although in displays with two homogeneous regions, neither of which surrounds the other, assignment to figure and ground is often ambiguous, in some cases in which one region resembles an object, such as a tree in Figure 9, that region is preferably perceived as figure.
Palmer and colleagues have developed some new principles of visual field organization. For example, Palmer (1992) has proposed the common region principle: elements tend to be grouped together if they are located within the same closed region. An illustration is provided in Figure 10a. It depicts the same spatial distribution of elements which, in Figure 2c, elicited the grouping 1/23/45/6; however, with superimposed closed contours the preferred grouping becomes 12/34/56.
Palmer & Rock (1994) proposed the element connectedness principle: elements tend to be grouped together if they are connected by other elements. This principle is illustrated in Figure 10b. Like Figure 10a, Figure 10b is also based on Figure 2c, but, due to some elements being connected, the preferred perceived grouping is 12/34/56.
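Grouping by element connectedness can be sketched as finding the connected components of a graph whose nodes are the elements and whose edges are the connecting links. This is an illustrative formalization, not one given by Palmer & Rock: the function name `connected_groups` and the specific links (1-2, 3-4, 5-6, assumed to correspond to Figure 10b) are hypothetical.

```python
def connected_groups(n, links):
    """Group elements labeled 1..n that are joined, directly or indirectly,
    by the given links (pairs of labels), using a union-find structure."""
    parent = list(range(n + 1))  # index 0 is unused

    def find(x):
        # Follow parent pointers to the representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller label as representative of the merged group.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for label in range(1, n + 1):
        groups.setdefault(find(label), []).append(label)
    return [groups[root] for root in sorted(groups)]


if __name__ == "__main__":
    # Hypothetical links assumed to mimic the connections drawn in Figure 10b.
    print(connected_groups(6, [(1, 2), (3, 4), (5, 6)]))
```

Note that the element positions play no role in this sketch: the links alone determine the grouping, which mirrors how connectedness overrides the proximity-based grouping 1/23/45/6.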
Researchers have also presented computational models of some Gestalt principles (Kubovy & van den Berg, 2008), studied their possible neural bases (Sasaki, 2007; Han et al., 2005; Qiu & von der Heydt, 2005; Roelfsema, 2006), and attempted to relate them to natural image statistics (Geisler et al., 2001; Elder & Goldberg, 2002).
As formulated by Wertheimer, Gestalt principles involve a 'ceteris paribus' (all other things being equal) clause (Palmer, 1999). That is, each principle is supposed to apply given that the other principles do not apply or are being held constant. When two (or more) principles apply to the same input and favor the same grouping, that grouping will tend to be strengthened; however, if they disagree, usually one wins or the organization of the percept is unclear. Several examples of the domination of one principle over another are presented above. However, although it has been addressed to some extent in the literature (e.g. see Kubovy & van den Berg, 2008), the significant theoretical problem of how to predict which principle will win in which circumstances remains to be worked out in much more detail.
Gestalt principles are usually illustrated with rather simple drawings, such as those above. Ideally, it should be possible to apply them to an arbitrarily complex image and, as a result, produce a hierarchical parsing of its content that corresponds to our perception of its wholes and sub-wholes. This ambitious goal is yet to be accomplished.
It has been suggested that most Gestalt principles are special instances of the overarching Good Gestalt principle, in the sense that being continuous, closed, similar etc are ways of being maximally good, ordered, simple etc. However, although this idea achieves some explanatory economy and unity, it does so at the cost of clarity and operationalizability: whereas it may be relatively simple to point out the presence of continuity, closure, etc, it is more difficult to establish what exactly makes a pattern visually good, simple, unified etc.
One important issue which was not discussed much in classical literature is the origin of Gestalt principles. Why is it that the perceptual input is organized in accord with proximity, continuity, closure etc? The gestaltists tended to favor the notion that these principles are among the fundamental properties of the perceptual system, providing the basis of our ability to make sense of the sensory signals. An opposed view is that the Gestalt principles are heuristics derived from some general features of the external world, based on our experience with things and their properties (Rock, 1975): objects in the world are usually located in front of some background (figure-ground articulation), have an overall texture different from the texture of the background (similarity), consist of parts which are near each other (proximity), move as a whole (common fate), and have closed contours (closure) which are continuous (continuity). In sum, although these principles have been discussed for more than 80 years and are presented in most perception textbooks, there are still a number of issues about them that need to be resolved.
Internal references
Figure-ground perception, Visual search, Binding by synchrony, Vision, Self-organization of brain function