In the mid-1950s, the first practical videotape recorders were introduced. The need to edit recorded content soon followed, but those machines could not display a picture while the tape was stopped. This led to the invention of a time code address, recorded on the second audio track of the tape. Times take the form hh:mm:ss:ff, where hh is hours, mm is minutes, ss is seconds, and ff is frames. Monochrome television in the United States ran at 30 frames per second (fps); color television changed this to 29.97 fps. The resulting error with respect to real time was corrected by adding a format called drop-frame, in which two frame addresses are omitted at the start of each minute not evenly divisible by 10. Outside North America, other frame rates were used, such as 25 fps in Europe. Time code has also been used at 24 fps for film.
In addition to time code on an audio track, which is generally referred to as longitudinal time code, other approaches have been employed, including the embedding of time code data in the television vertical interval and even the embedding of a single data bit in each of the first 80 horizontal sync pulses. Some videotape recorders were designed that provided special tracks for the recording of time code. In the move to digital television, time code has been designed into the data definitions from the start, so the notion of a time code track is moot.
The definition of this time code was formalized by the SMPTE (Society of Motion Picture and Television Engineers) in the SMPTE 12M standards document. (SMPTE Standards documents are not available for free download.)
In the longitudinal format, SMPTE time code is contained in 80 bits of data, which are defined as:
Thirty-two bits are reserved for user data. These are often used for a reel number and date, and have also been used to contain the time code from the original master reel. They can contain anything, but bits 43 and 59 must always be set to zero.
The bits are encoded as "biphase". A zero bit has a single transition at the start of the bit period. A one bit has two transitions, at the beginning and middle of the period. This encoding is self-clocking.
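The encoding rule above can be sketched in a few lines. This is a minimal illustration, not a reader or generator implementation: the function names are hypothetical, the signal is modeled as two levels (0 or 1) per bit period, and the decoder assumes its input is already aligned to bit boundaries, which a real reader would have to establish first.

```python
def biphase_mark_encode(bits, level=0):
    """Encode bits as a list of half-bit-period signal levels (0 or 1).

    Every bit period begins with a transition; a one bit adds a second
    transition in the middle of the period.
    """
    halves = []
    for bit in bits:
        level ^= 1            # transition at the start of every bit period
        halves.append(level)
        if bit:
            level ^= 1        # extra mid-period transition for a one bit
        halves.append(level)
    return halves


def biphase_mark_decode(halves):
    """Recover bits from aligned half-period levels.

    A bit is one exactly when the two halves of its period differ.
    """
    return [int(halves[i] != halves[i + 1]) for i in range(0, len(halves), 2)]
```

Because only the presence or absence of the mid-period transition carries information, the decoded bits are the same if the signal polarity is inverted.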
Longitudinal SMPTE time code has traditionally been recorded at a level 10 dB below system reference for normal audio, to minimize both distortion and crosstalk.
The change in frame rate from the 30 Hz of monochrome television to the 29.97 Hz of NTSC color introduced a problem for time code. Because the frame rate is no longer a whole number, time code that counts 30 frames per second steadily diverges from time of day. Drop-frame is a clever hack that minimizes the accumulating error. The method is to number the first frame of second 00 of each minute not evenly divisible by 10 as :02, skipping addresses :00 and :01. In each ten-minute period, minutes 1 through 9 are therefore each missing two addresses, so 18 frame addresses are "dropped"; no picture frames are discarded, only numbers. Drop-frame keeps time code and clock time nearly aligned, but it complicates even simple arithmetic on drop-frame times.
Dropping 2 frame counts every minute, except every tenth minute, removes 18 of every 18,000 counts (0.1%), so the count advances at an average of 30×0.999 = 29.97 frames/s.
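The size of the error with and without drop-frame can be checked with exact arithmetic. The sketch below assumes the exact NTSC color frame rate of 30000/1001 fps, of which 29.97 is the usual rounded figure; the function names are illustrative only.

```python
from fractions import Fraction

# Assumption: exact NTSC color frame rate, commonly rounded to 29.97 fps.
NTSC_FPS = Fraction(30000, 1001)


def labels_per_hour(drop_frame):
    """Number of frame addresses counted during one hour of time code."""
    total = 30 * 60 * 60              # 108000 addresses at a nominal 30 per second
    if drop_frame:
        total -= 2 * (60 - 6)         # 2 skipped per minute, except minutes 00, 10, ..., 50
    return total


def drift_seconds_per_hour(drop_frame):
    """Real elapsed seconds minus 3600 after one hour of time code has been counted."""
    real_seconds = labels_per_hour(drop_frame) / NTSC_FPS
    return real_seconds - 3600
```

Under that assumption, non-drop code falls 3.6 seconds behind real time for every hour counted, while drop-frame code ends up 3.6 milliseconds ahead, which is why the error is minimized rather than eliminated.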
The sequence when frame counts are dropped:
01:08:59:28
01:08:59:29
01:09:00:02
01:09:00:03
The sequence on every tenth minute:
01:09:59:28
01:09:59:29
01:10:00:00
01:10:00:01
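Both sequences can be reproduced by a small "next address" function. This is a sketch for 30 fps drop-frame code only; the function name is hypothetical, and wrapping to 00:00:00:00 after hour 23 is an assumption of the example.

```python
def next_drop_frame_address(hh, mm, ss, ff):
    """Return the address that follows hh:mm:ss:ff in 30 fps drop-frame code."""
    ff += 1
    if ff == 30:
        ff, ss = 0, ss + 1
    if ss == 60:
        ss, mm = 0, mm + 1
    if mm == 60:
        mm, hh = 0, (hh + 1) % 24     # assumed wrap at 24 hours
    # Addresses :00 and :01 do not exist in second 00 of minutes
    # that are not evenly divisible by 10, so jump straight to :02.
    if ss == 0 and ff == 0 and mm % 10 != 0:
        ff = 2
    return hh, mm, ss, ff
```

Calling it on 01:08:59:29 yields 01:09:00:02, while calling it on 01:09:59:29 yields 01:10:00:00, matching the two sequences above.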
VITC (vertical interval time code) was invented after the development of helical-scan videotape recorders, to allow time code to be read when the tape is stopped. Helical-scan recorders changed the editing process, since they can show a complete image with no tape movement. This made precise edit point selection easy, but longitudinal time code can only be read while the tape is moving. VITC solved this problem.
In VITC, the time code data are encoded into the actual video, but in the vertical interval, where they will not be seen in the picture.
Specialized equipment is used for generating and reading time code.
Time code generators are referenced to house sync or black burst, to maintain a precise timing reference. While there may be any number of generators in use in a facility, one of them is normally considered the master, and produces the master time code reference for that facility. Other generators may need to be synchronized to this code, or may be running on a completely different time reference, as may happen in editing situations.
A time code reader accepts the time code bit stream, and converts it to human readable form, whether for display alone, or insertion in the video as burned-in time code, or for use in an editing system.
A time code decoder performs the same task, but adds one frame to the result when reading forward (subtracting in reverse), giving an "on-time" code. As the time code for a given frame address is written on that frame, by the time the address has been read, it is inherently one frame off, as the code read is for the frame just passed. In this context, "decode" refers to applying that one frame correction.
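The one-frame correction can be illustrated as follows. This sketch assumes non-drop code at an integer frame rate (25 fps by default) and a wrap at 24 hours; the function name and parameters are hypothetical and do not describe any particular decoder.

```python
def on_time_address(hh, mm, ss, ff, forward=True, fps=25):
    """Apply the one-frame decoder correction to a non-drop address just read.

    Adds one frame when reading forward, subtracts one in reverse.
    """
    frames_per_day = 24 * 60 * 60 * fps
    count = ((hh * 60 + mm) * 60 + ss) * fps + ff
    count = (count + (1 if forward else -1)) % frames_per_day  # assumed 24-hour wrap
    total_seconds, ff = divmod(count, fps)
    total_minutes, ss = divmod(total_seconds, 60)
    hh, mm = divmod(total_minutes, 60)
    return hh, mm, ss, ff
```

For example, an address of 10:00:00:24 read while playing forward at 25 fps is reported as 10:00:01:00.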
Time code is valuable in locating specific frames in a video element. However, in editing, it is essential to add and subtract time code values. The complexities of that process are reduced by translating addresses in hh:mm:ss:ff format to integer frame counts. The arithmetical manipulations are then applied to simple integers, and the result translated back to the appropriate time code format. This practice continues in non-linear editing environments, where the medium is not tape.
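A minimal sketch of that translation for 30 fps drop-frame code follows. The function names are illustrative, input addresses are assumed to be valid drop-frame addresses, and no 24-hour wrap is applied.

```python
FRAMES_PER_10_MIN = 10 * 60 * 30 - 18   # 17982 real frames in each ten-minute block
FRAMES_FIRST_MIN = 60 * 30              # 1800: minute 0 of a block drops nothing
FRAMES_OTHER_MIN = 60 * 30 - 2          # 1798: minutes 1 through 9 each drop two


def drop_frame_to_frames(hh, mm, ss, ff):
    """Convert a drop-frame address to an integer count of frames since 00:00:00:00."""
    total_minutes = 60 * hh + mm
    # Two addresses are skipped in every minute except each tenth one.
    dropped = 2 * (total_minutes - total_minutes // 10)
    return (total_minutes * 60 + ss) * 30 + ff - dropped


def frames_to_drop_frame(frames):
    """Convert an integer frame count back to a drop-frame address."""
    blocks, rem = divmod(frames, FRAMES_PER_10_MIN)
    if rem < FRAMES_FIRST_MIN:
        dropping_minutes = 0
    else:
        dropping_minutes = 1 + (rem - FRAMES_FIRST_MIN) // FRAMES_OTHER_MIN
    # Re-insert the skipped addresses to get a plain 30-per-second label count.
    labels = frames + 18 * blocks + 2 * dropping_minutes
    ff = labels % 30
    ss = labels // 30 % 60
    mm = labels // 1800 % 60
    hh = labels // 108000
    return hh, mm, ss, ff
```

With these helpers, the duration from 01:08:59:28 to 01:10:00:01 is `drop_frame_to_frames(1, 10, 0, 1) - drop_frame_to_frames(1, 8, 59, 28)`, which is 1801 frames, and an offset can be added as an integer before converting back with `frames_to_drop_frame`.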