A viseme is any of several speech sounds that look the same, for example when lip reading (Fisher 1968). Visemes and phonemes do not share a one-to-one correspondence. Often several phonemes correspond to a single viseme because they look the same on the face when produced, such as /k, ɡ, ŋ/ (viseme: /k/), /t͡ʃ, ʃ, d͡ʒ, ʒ/ (viseme: /ch/), /t, d, n, l/ (viseme: /t/), and /p, b, m/ (viseme: /p/). Thus words such as pet, bell, and men are difficult for lip-readers to distinguish, as all look like /pet/. However, the visual 'signature' of a given gesture may differ in timing and duration during actual speech, and such differences cannot be captured in a single photograph. Conversely, some sounds that are hard to distinguish acoustically are clearly distinguished by the face (Chen 2001). For example, English /l/ and /r/ can be acoustically quite similar (especially in clusters, such as 'grass' vs. 'glass'), yet visual information can show a clear contrast. This is illustrated by the more frequent mishearing of words on the telephone than in person. Some linguists have argued that speech is best understood as bimodal (aural and visual), and that comprehension can be compromised if one of these two domains is absent (McGurk and MacDonald 1976).
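The many-to-one relationship described above can be illustrated with a small lookup table. The following is a minimal sketch in Python, assuming only the four consonant groups named in the text; the names `VISEME_GROUPS`, `PHONEME_TO_VISEME`, and `to_visemes` are hypothetical, the phoneme transcriptions of the example words are simplified, and any phoneme outside the four groups (such as the vowel /ɛ/) is passed through unchanged as a placeholder. Viseme inventories used in practice vary between studies and systems.

```python
# Hypothetical viseme groups, taken only from the examples in the text.
# Each key is a viseme label; each value lists the phonemes that share it.
VISEME_GROUPS = {
    "k": ["k", "ɡ", "ŋ"],
    "ch": ["t͡ʃ", "ʃ", "d͡ʒ", "ʒ"],
    "t": ["t", "d", "n", "l"],
    "p": ["p", "b", "m"],
}

# Invert the grouping into a many-to-one phoneme -> viseme lookup.
PHONEME_TO_VISEME = {
    phoneme: viseme
    for viseme, phonemes in VISEME_GROUPS.items()
    for phoneme in phonemes
}


def to_visemes(phonemes):
    """Map a phoneme sequence to viseme labels.

    Phonemes not covered by the table (an assumption of this sketch)
    are returned unchanged.
    """
    return [PHONEME_TO_VISEME.get(p, p) for p in phonemes]


# Simplified transcriptions of the example words from the text.
WORDS = {
    "pet": ["p", "ɛ", "t"],
    "bell": ["b", "ɛ", "l"],
    "men": ["m", "ɛ", "n"],
}

if __name__ == "__main__":
    for word, phonemes in WORDS.items():
        print(word, to_visemes(phonemes))
```

Under these assumptions all three words reduce to the same viseme sequence, which is why they are hard to tell apart by sight alone. The sketch ignores timing and duration, which, as noted above, may still differ in actual speech.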
The ambiguity of visemes can be humorous, as in the phrase "elephant juice," which when lip-read appears identical to "I love you."
Applications for the study of visemes include speech processing, speech recognition, and computer facial animation.