From Handwiki In combinatorics, a branch of mathematics, the autocorrelation of a word is the set of periods of this word. More precisely, it is a sequence of values which indicate how much the end of a word looks likes the beginning of a word. This value can be used to compute, for example, the average value of the first occurrence of this word in a random string.
In this article, A is an alphabet, and [math]\displaystyle{ w=w_1\dots w_n }[/math] a word on A of length n. The autocorrelation of [math]\displaystyle{ w }[/math] can be defined as the correlation of [math]\displaystyle{ w }[/math] with itself. However, we redefine this notion below.
The autocorrelation vector of [math]\displaystyle{ w }[/math] is [math]\displaystyle{ c=(c_0,\dots,c_{n-1}) }[/math], with [math]\displaystyle{ c_i }[/math] being 1 if the prefix of length [math]\displaystyle{ n-i }[/math] equals the suffix of length [math]\displaystyle{ n-i }[/math], and with [math]\displaystyle{ c_i }[/math] being 0 otherwise. That is [math]\displaystyle{ c_i }[/math] indicates whether [math]\displaystyle{ w_{i+1}\dots w_n=w_1\dots w_{n-i} }[/math].
For example, the autocorrelation vector of [math]\displaystyle{ aaa }[/math] is [math]\displaystyle{ (1,1,1) }[/math] since, clearly, for [math]\displaystyle{ i }[/math] being 0, 1 or 2, the prefix of length [math]\displaystyle{ n-i }[/math] is equal to the suffix of length [math]\displaystyle{ n-i }[/math]. The autocorrelation vector of [math]\displaystyle{ abb }[/math] is [math]\displaystyle{ (1,0,0) }[/math] since no strict prefix is equal to a strict suffix. Finally, the autocorrelation vector of [math]\displaystyle{ aabbaa }[/math] is 100011, as shown in the following table:
| a | a | b | b | a | a | ||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| a | a | b | b | a | a | 1 | |||||
| a | a | b | b | a | a | 0 | |||||
| a | a | b | b | a | a | 0 | |||||
| a | a | b | b | a | a | 0 | |||||
| a | a | b | b | a | a | 1 | |||||
| a | a | b | b | a | a | 1 |
Note that [math]\displaystyle{ c_0 }[/math] is always equal to 1, since the prefix and the suffix of length [math]\displaystyle{ n }[/math] are both equal to the word [math]\displaystyle{ w }[/math]. Similarly, [math]\displaystyle{ c_{n-1} }[/math] is 1 if and only if the first and the last letters are the same.
The autocorrelation polynomial of [math]\displaystyle{ w }[/math] is defined as [math]\displaystyle{ c(z)=c_0z^0+\dots+c_{n-1}z^{n-1} }[/math]. It is a polynomial of degree at most [math]\displaystyle{ n-1 }[/math].
For example, the autocorrelation polynomial of [math]\displaystyle{ aaa }[/math] is [math]\displaystyle{ 1+z+z^2 }[/math] and the autocorrelation polynomial of [math]\displaystyle{ abb }[/math] is [math]\displaystyle{ 1 }[/math]. Finally, the autocorrelation polynomial of [math]\displaystyle{ aabbaa }[/math] is [math]\displaystyle{ 1+z^4+z^5 }[/math].
We now indicate some properties which can be computed using the autocorrelation polynomial.
Suppose that you choose an infinite sequence [math]\displaystyle{ s }[/math] of letters of [math]\displaystyle{ A }[/math], randomly, each letter with probability [math]\displaystyle{ \frac{1}{|A|} }[/math], where [math]\displaystyle{ |A| }[/math] is the number of letters of [math]\displaystyle{ A }[/math]. Let us call [math]\displaystyle{ E }[/math] the expectation of the first occurrence of ?[math]\displaystyle{ m }[/math] in [math]\displaystyle{ s }[/math]? . Then [math]\displaystyle{ E }[/math] equals [math]\displaystyle{ |A|^nc\left(\frac{1}{|A|}\right) }[/math]. That is, each subword [math]\displaystyle{ v }[/math] of [math]\displaystyle{ w }[/math] which is both a prefix and a suffix causes the average value of the first occurrence of [math]\displaystyle{ w }[/math] to occur [math]\displaystyle{ |A|^{|v|} }[/math] letters later. Here [math]\displaystyle{ |v| }[/math] is the length of v.
For example, over the binary alphabet [math]\displaystyle{ A=\{a,b\} }[/math], the first occurrence of [math]\displaystyle{ aa }[/math] is at position [math]\displaystyle{ 2^2(1+\frac 12)=6 }[/math] while the average first occurrence of [math]\displaystyle{ ab }[/math] is at position [math]\displaystyle{ 2^2(1)=4 }[/math]. Intuitively, the fact that the first occurrence of [math]\displaystyle{ aa }[/math] is later than the first occurrence of [math]\displaystyle{ ab }[/math] can be explained in two ways:
Autocorrelation polynomials allows to give simple equations for the ordinary generating functions (OGF) of many natural questions.
![]() |
Categories: [Formal languages]