Formal languages and automata

Both natural and programming languages can be viewed as sets of sentences, that is, finite strings of elements from some basic vocabulary. The notion of a language introduced below is very general. It certainly includes both natural and programming languages and also all kinds of nonsense languages one might think of. Traditionally, formal language theory is concerned with the syntactic specification of a language rather than with any semantic issues. A syntactic specification of a language with finitely many sentences can be given, at least in principle, by listing the sentences. This is not possible for languages with infinitely many sentences. The main task of formal language theory is the study of finitary specifications of infinite languages.

The basic theory of computation, as well as of its various branches, such cryptography, is inseparably connected with language theory. The input and output sets of a computational device can be viewed as languages, and — more profoundly — models of computation can be identified with classes of language specifications, in a sense to be made more precise. Thus, for instance, Turing machines can be identified with phrase-structure grammars and finite automata with regular grammars (cf. also Turing machine; Automaton, finite; Grammar, formal).

Formal language theory is — together with automata theory, (cf. Automata, theory of) which is really inseparable from language theory — the oldest branch of theoretical computer science. In some sense, the role of language and automata theory in computer science is analogous to that of philosophy in general science: it constitutes the stem from which the individual branches of knowledge emerge.

An alphabet is a finite non-empty set. The elements of an alphabet $ V $ are called letters. A word over an alphabet $ V $ is a finite string of letters (cf. also Word). The word consisting of zero letters is called the empty word, written $ \lambda $. The set of all words (respectively, all non-empty words) over an alphabet $ V $ is denoted by $ V^{*} $( respectively, $ V^{+} $). (Thus, algebraically $ V^{*} $ and $ V^{+} $ are the free monoid and free semi-group generated by the finite set $ V $.) For words $ w _{1} $ and $ w _{2} $, the juxtaposition $ w _{1} w _{2} $ is called the catenation of $ w _{1} $ and $ w _{2} $. The empty word $ \lambda $ is an identity with respect to catenation. Catenation being associative, the notation $ w^{i} $, where $ i $ is a non-negative integer, is used in the customary sense, and $ w^{0} $ denotes the empty word. The length of a word $ w $, in symbols $ \mathop{\rm lg}\nolimits (w) $, also sometimes $ | w | $, means the number of letters in $ w $ when each letter is counted as many times as it occurs. A word $ w $ is a subword of a word $ u $ if and only if there are words $ w _{1} $ and $ w _{2} $ such that $ u = w _{1} ww _{2} $( cf. also Imbedded word). Subsets of $ V^{*} $ are referred to as (formal) languages over the alphabet $ V $.

Various unary and binary operations defined for languages will be considered in the sequel. Regarding languages as sets, one may immediately define the Boolean operations of union, intersection, complementation (the complementation of a language $ L $ is denoted by $ L^{c} $), difference, and symmetric difference in the usual manner. The catenation (or product) of two languages $ L _{1} $ and $ L _{2} $, in symbols $ L _{1} L _{2} $, is defined by

$$ L _{1} L _{2} \ = \ \{ {w _{1} w _ 2} : { w _{1} \in L _{1} \ \textrm{ and } \ w _{2} \in L _ 2} \} . $$

The notation $ L^{i} $ is extended to the catenation of languages. By definition $ L^{0} = \{ \lambda \} $. The catenation closure or Kleene star (respectively, $ \lambda $- free catenation closure) of a language $ L $, in symbols $ L^{*} $( respectively, $ L^{+} $), is defined to be the union of all non-negative powers of $ L $( respectively, the union of all positive powers of $ L $).

The operation of substitution. For each letter $ a $ of an alphabet $ V $, let $ \sigma (a) $ denote a language (possibly over a different alphabet). Define, furthermore,

$$ \sigma ( \lambda ) \ = \ \{ \lambda \} ,\ \ \sigma (w _{1} w _{2} ) \ = \ \sigma (w _{1} ) \sigma (w _{2} ),\ \ \textrm{ for } \ w _{1} ,\ w _{2} \in V^{*} . $$

For a language $ L $ over $ V $, one defines

$$ \sigma (L) \ = \ \{ {u} : {u \in \sigma (w) \ \textrm{ for } \ \textrm{ some } \ w \in L} \} . $$

Such a mapping $ \sigma $ is called a substitution. A substitution $ \sigma $ such that each $ \sigma (a) $ consists of a single word is called a homomorphism. (Algebraically, a homomorphism of languages is a homomorphism of monoids linearly extended to subsets of monoids.) In connection with homomorphisms (and also often elsewhere) one identifies a word $ w $ with the singleton set $ \{ w \} $, writing $ \sigma (a) = w $ rather than $ \sigma (a) = \{ w \} $.

The main objects of study in formal language theory are finitary specifications of infinite languages. Most of such specifications are obtained as special cases from the notion of a rewriting system. By definition, a rewriting system is an ordered pair $ (V,\ P) $, where $ V $ is an alphabet and $ P $ a finite set of ordered pairs of words over $ V $. The elements $ (w,\ u ) $ of $ P $ are referred to as rewriting rules or productions, and are denoted by $ w \rightarrow u $. Given a rewriting system, the yield relation $ \Rightarrow $ in the set $ V^{*} $ is defined as follows. For any two words $ w $ and $ u $, $ w \Rightarrow u $ holds if and only if there are words $ w^ \prime $, $ w _{1} $, $ w^{\prime\prime} $, $ u _{1} $ such that $ w = w^ \prime w _{1} w^{\prime\prime} $, $ u = w^ \prime u _{1} w^{\prime\prime} $ and $ w _{1} \rightarrow u _{1} $ is a production in the system. The reflexive transitive closure (respectively, transitive closure) of the relation $ \Rightarrow $ is denoted by $ \Rightarrow^{*} $( respectively $ \Rightarrow^{+} $).

A phrase-structure grammar, shortly a grammar, is an ordered quadruple $ G = (V _{N} ,\ V _{T} ,\ S,\ P) $, where $ V _{N} $ and $ V _{T} $ are disjoint alphabets (the alphabets of non-terminals and terminals), $ S \in V _{N} $( the initial letter) and $ P $ is a finite set of ordered pairs $ (w , u ) $ such that $ u $ is a word over the alphabet $ V = V _{N} \cup V _{T} $ and $ w $ is a word over $ V $ containing at least one letter of $ V _{N} $. Again, the elements of $ P $ are referred to as rewriting rules or productions, and are written as $ w \rightarrow u $. A phrase-structure grammar $ G $ as above defines a rewriting system $ (V,\ P) $. Let $ \Rightarrow $ and $ \Rightarrow^{*} $ be the relations determined by this rewriting system. Then the language $ L (G) $ generated by $ G $ is defined by

$$ L (G) \ = \ \{ {w \in V _{T} ^ *} : {S \Rightarrow^{*} w} \} . $$

Two grammars $ G $ and $ G _{1} $ are termed equivalent if and only if $ L (G) = L (G _{1} ) $. For $ i = 0,\ 1,\ 2,\ 3 $, a grammar $ G = (V _{N} ,\ V _{T} ,\ S,\ P) $ is of the type $ i $ if and only if the restrictions $ i $) on $ P $, as given below, are satisfied:

0) No restrictions.

1) Each production in $ P $ is of the form $ w _{1} Aw _{2} \rightarrow w _{1} ww _{2} $, where $ w _{1} $ and $ w _{2} $ are arbitrary words, $ A \in V _{N} $ and $ w $ is a non-empty word (with the possible exception of the production $ S \rightarrow \lambda $ whose occurrence in $ P $ implies, however, that $ S $ does not occur on the right-hand side of any production).

2) Each production in $ P $ is of the form $ A \rightarrow w $, where $ A \in V _{N} $.

3) Each production is of one of the two forms $ A \rightarrow Bw $ or $ A \rightarrow w $, where $ A,\ B \in V _{N} $ and $ w \in V _ T^{*} $.

A language is of type $ i $ if and only if it is generated by a grammar of type $ i $. Type-0 languages are also called recursively enumerable. Type-1 grammars and languages are also called context-sensitive. Type-2 grammars and languages are also called context-free. Type-3 grammars and languages are also referred to as finite-state or regular. (See also Grammar, context-sensitive; Grammar, context-free.)

Type- $ i $, $ i = 0,\ 1,\ 2,\ 3 $, languages form a strict hierarchy: the family of type- $ i $ languages is strictly included in the family of type- $ (i - 1) $ languages, for $ i = 1,\ 2,\ 3 $.

A specific context-free language over an alphabet with $ 2t $ letters, $ V _{t} = \{ a _{1} \dots a _{t} ,\ \overline{a}\; _{1} \dots \overline{a}\; _{t} \} $, $ t \geq 1 $, is the Dyck language generated by the grammar

$$ ( \{ S \} ,\ V _{t} ,\ S,\ \{ S \rightarrow SS,\ S \rightarrow \lambda ,\ S \rightarrow a _{1} S \overline{a}\; _{1} \dots S \rightarrow a _{t} S \overline{a}\; _{t} \} ). $$

The Dyck language consists of all words over $ V _{t} $ that can be reduced to $ \lambda $ using the relations $ a _{i} \overline{a}\; _{i} = \lambda $, $ i = 1 \dots t $. If the pairs $ (a _{i} ,\ \overline{a}\; _{i} ) $, $ i = 1 \dots t $, are viewed as parentheses of different types, then the Dyck language consists of all sequences of correctly nested parentheses.

The family of regular languages over an alphabet $ V $ equals the family of languages obtained from "atomic languages" $ \{ \lambda \} $ and $ \{ a \} $, where $ a \in V $, by a finite number of applications of "regular operations" : union, catenation and catenation closure. The formula expressing how a specific regular language is obtained from atomic languages by regular operations is termed a regular expression.

Every context-free language which is $ \lambda $- free (i.e., does not contain the empty word) is generated by a grammar in Chomsky normal form, as well as by a grammar in Greibach normal form. In the former, all productions are of the types $ A \rightarrow BC $, $ A \rightarrow a $, and in the latter — of the types $ A \rightarrow aBC $, $ A \rightarrow aB $, $ A \rightarrow a $, where capital letters denote non-terminals and $ a $ is a terminal letter. According to the theorem of Chomsky–Schützenberger, every context-free language $ L $ can be expressed as

$$ L \ = \ h (D \cap R), $$

for some regular language $ R $, Dyck language $ D $ and homomorphism $ h $. According to the Lemma of Bar-Hillel (also called the pumping lemma), every sufficiently long word $ w $ in a context-free language $ L $ can be written in the form

$$ w \ = \ u _{1} w _{1} u _{2} w _{2} u _{3} ,\ \ w _{1} w _{2} \ \neq \ \lambda , $$

where for every $ i \geq 0 $ the word $ u _{1} w _ 1^{i} u _{2} w _ 2^{i} u _{3} $ belongs to $ L $. For regular languages, the corresponding result reads as follows: Every sufficiently long word $ w $ in a regular language $ L $ can be written in the form $ w = u _{1} w _{1} u _{2} $, $ w _{1} \neq \lambda $, where for every $ i \geq 0 $ the word $ u _{1} w _ 1^{i} u _{2} $ belongs to $ L $.

Derivations according to a context-free grammar (i.e., finite sequences of words where every two consecutive words are in the relation $ \Rightarrow $) can in a natural way be visualized as labelled trees, the so-called derivation trees (cf. also Derivation tree). A context-free grammar $ G $ is termed ambiguous if and only if some word in $ L (G) $ has two derivation trees. Otherwise, $ G $ is termed unambiguous. A context-free language $ L $ is unambiguous if and only if $ L = L (G) $ for some unambiguous grammar $ G $. Otherwise, $ L $ is termed (inherently) ambiguous. One also speaks of degrees of ambiguity. A context-free grammar $ G $ is ambiguous of degree $ k $( a natural number or $ \infty $) if and only if every word in $ L (G) $ possesses at most $ k $ derivation trees and some word in $ L (G) $ possesses exactly $ k $ derivation trees. A language $ L $ is ambiguous of degree $ k $ if and only if $ L = L (G) $ for some $ G $ ambiguous of degree $ k $, and there is no $ G _{1} $ ambiguous of degree less than $ k $ such that $ L = L (G _{1} ) $.

The families of type- $ i $ languages, $ i = 0,\ 1,\ 2,\ 3 $, defined above using generative devices, can be obtained also by recognition devices. A recognition device defining a language $ L $ receives arbitrary words as inputs and "accepts" exactly the words belonging to $ L $. The recognition devices finite and pushdown automata, corresponding to type-3 and type-2 languages, will be defined below. The definitions of the recognition devices linear bounded automata and Turing machines, corresponding to type-1 and type-0 languages, are analogous. (Cf. Computable function; Complexity theory.)

A rewriting system $ (V,\ P) $ is called a finite deterministic automaton if and only if: a) $ V $ is divided into two disjoint alphabets $ V _{s} $ and $ V _{I} $( the state and the input alphabet); b) an element $ s _{0} \in V _{s} $ and a subset $ S _{1} \subseteq V _{s} $ are specified (initial state and final state set); and c) the productions in $ P $ are of the form

$$ s _{i} a _{k} \ \rightarrow \ s _{j} ,\ \ s _{i} ,\ s _{j} \in V _{s} ; \ \ a _{k} \in V _{I} , $$

and for each pair $ (s _{i} ,\ a _{k} ) $ there is exactly one such production in $ P $.

A finite deterministic automaton over an input alphabet $ V _{I} $ is usually defined by specifying an ordered quadruple $ (s _{0} ,\ V _{s} ,\ f,\ S _{1} ) $, where $ f $ is a mapping of $ V _{s} \times V _{I} $ into $ V _{s} $, the other items being as above. (Clearly, the values of $ f $ are obtained from the right-hand sides of the productions listed above.)

The language accepted or recognized by a finite deterministic automaton $ \mathop{\rm FDA}\nolimits $ is defined by

$$ L ( \mathop{\rm FDA}\nolimits ) \ = \ \{ {w \in V _{I} ^ *} : { s _{0} w \Rightarrow^{*} s _{1} \ \textrm{ for } \ \textrm{ some } \ s _{1} \in S _ 1} \} . $$

A finite non-deterministic automaton $ \mathop{\rm FNA}\nolimits $ is defined as a finite deterministic automaton with the following two exceptions. In b) $ s _{0} $ is replaced by a subset $ S _{0} \subseteq V _{s} $. In c) the second half of the sentence (beginning with "and" ) is omitted. The language accepted by an $ \mathop{\rm FNA}\nolimits $ is defined by

$$ L ( \mathop{\rm FNA}\nolimits ) \ = \ \{ {w \in V _{I} ^ *} : { s _{0} w \Rightarrow^{*} s _{1} \ \textrm{ for } \ \textrm{ some } \ s _{0} \in S _{0} \ \textrm{ and } \ s _{1} \in S _ 1} \} . $$

A language is of type 3 if and only if it is accepted by some finite deterministic automaton and if and only if it is accepted by some finite non-deterministic automaton.

A rewriting system $ (V,\ P) $ is called a pushdown automaton if and only if each of the following conditions 1)–3) is satisfied.

1) $ V $ is divided into two disjoint alphabets $ V _{s} $ and $ V _{I} \cup V _{z} $. The sets $ V _{s} $, $ V _{I} $ and $ V _{z} $ are called the state, input and pushdown alphabet, respectively. The sets $ V _{I} $ and $ V _{z} $ are non-empty but not necessarily disjoint.

2) Elements $ s _{0} \in V _{s} $, $ z _{0} \in V _{z} $ and a subset $ S _{1} \subseteq V _{s} $ are specified, namely, the so-called initial state, start letter and final state set.

3) The productions in $ P $ are of the two forms

$$ \tag{a1} zs _{i} \ \rightarrow \ ws _{j} ,\ \ z \in V _{z} ,\ \ w \in V _ z^{*} ,\ \ s _{i} ,\ s _{j} \in V _{s} ; $$

$$ \tag{a2} zs _{i} a \ \rightarrow \ ws _{j} ,\ \ z \in V _{z} ,\ \ w \in V _ z^{*} ,\ \ a \in V _{i} ,\ \ s _{i} ,\ s _{j} \in V _{s} . $$

The language accepted by a pushdown automaton $ \mathop{\rm PDA}\nolimits $ is defined by

$$ L ( \mathop{\rm PDA}\nolimits ) \ = \ \{ {w \in V _{I} ^ *} : { z _{0} s _{0} w \Rightarrow^{*} us _{1} \ \textrm{ for } \ \textrm{ some } \ u \in V _ z^{*} ,\ s _{1} \in S _ 1} \} . $$

A pushdown automaton is deterministic if and only if, for every pair $ (s _{i} ,\ z) $, $ P $ contains either exactly one production (a1) and no productions (a2), or no productions (a1) and exactly one production (a2) for every $ a \in V _{I} $.

The family of context-free languages equals the family of languages accepted by pushdown automata. Languages accepted by deterministic pushdown automata are referred to as deterministic (context-free) languages. The role of determinism is different in connection with pushdown and finite automata: the family of deterministic languages is a proper subfamily of the family of context-free languages.

The automata considered above have no other output facilities than being or not being in a final state, i.e., they are only capable of accepting or rejecting inputs. Occasionally devices (transducers) capable of having words as outputs, i.e., capable of translating words into words, are considered. A formal definition is given below only for the transducer corresponding to a finite automaton. A pushdown transducer is defined analogously.

A rewriting system $ (V,\ P) $ is called a sequential transducer if and only if each of the following conditions 1)–3) is satisfied.

1) $ V $ is divided into two disjoint alphabets $ V _{s} $ and $ V _{1} \cup V _{0} $. The sets $ V _{s} $, $ V _{1} $, $ V _{0} $ are called the state, input and output alphabet, respectively. (The two latter alphabets are non-empty but not necessarily disjoint.)

2) An element $ s _{0} \in V _{s} $ and a subset $ S _{1} \subseteq V _{s} $ are specified (initial state and final state set).

3) The productions in $ P $ are of the form

$$ s _{i} w \ \rightarrow \ us _{j} ,\ \ s _{i} ,\ s _{j} \in V _{s} ,\ \ w \in V _{s} ,\ \ w \in V _ 1^{*} ,\ \ u \in V _ 0^{*} . $$

If, in addition, $ w \neq \lambda $ in all productions, then the rewriting system is called a generalized sequential machine (gsm).

For a sequential transducer $ \mathop{\rm ST}\nolimits $, words $ w _{1} \in V _ 1^{*} $ and $ w _{2} \in V _ 0^{*} $, and languages $ L _{1} \in V _ 1^{*} $ and $ L _{2} \subseteq V _ 0^{*} $, one defines

$$ \mathop{\rm ST}\nolimits (w _{1} ) \ = \ \{ {w} : { s _{0} s _{1} \Rightarrow^{*} ws _{1} \ \textrm{ for } \ \textrm{ some } \ s _{1} \in S _ 1} \} , $$

$$ \mathop{\rm ST}\nolimits (L _{1} ) \ = \ \{ u: \ u \in \mathop{\rm ST}\nolimits (w) \ \textrm{ for } \ \textrm{ some } \ w \in L _{1} \} , $$

$$ \mathop{\rm ST}\nolimits^{-1} (w _{2} ) \ = \ \{ u: \ w _{2} \in \mathop{\rm ST}\nolimits (u) \} , $$

$$ \mathop{\rm ST}\nolimits^{-1} (L _{2} ) \ = \ \{ u: \ u \in \mathop{\rm ST}\nolimits^{-1} (w) \ \textrm{ for } \ \textrm{ some } \ w \in L _{2} \} . $$

Mappings of languages thus defined are referred to as (rational) transductions and inverse (rational) transductions. If $ \mathop{\rm ST}\nolimits $ is also a gsm, one speaks of gsm mappings and inverse gsm mappings.

Homomorphisms, inverse homomorphisms and the mappings $ \tau (L) = L \cap R $, where $ R $ is a fixed regular language, are all rational transductions, the first and the last being also gsm mappings. The composite of two rational transductions (respectively, gsm mappings) is again a rational transduction (respectively, a gsm mapping). Every rational transduction $ \tau $ can be expressed in the form

$$ \tau (L) \ = \ h _{1} (h _ 2^{-1} (L) \cap R), $$

where $ h _{1} $ and $ h _{2} $ are homomorphisms and $ R $ is a regular language. These results show that a language family is closed under rational transductions if and only if it is closed under homomorphisms, inverse homomorphisms and intersections with regular languages.

A finite probabilistic automaton (or stochastic automaton, cf. Automaton, probabilistic) is an ordered quintuple

$$ \mathop{\rm PA}\nolimits \ = \ (V _{1} ,\ V _{s} ,\ s _{0} ,\ S _{1} ,\ H), $$

where $ V _{1} $ and $ V _{s} = \{ s _{0} \dots s _{ {n - 1}} \} $ are disjoint alphabets (inputs and states), $ s _{0} \in V _{s} $ and $ S _{1} \subseteq V _{s} $( initial state and final state set), and $ H $ is a mapping of $ V _{1} $ into the set of $ n $- dimensional stochastic matrices. (A stochastic matrix is a square matrix with non-negative real entries and with row sums equal to 1.) The mapping $ H $ is extended to a homomorphism of $ V _ 1^{*} $ into the monoid of $ n $- dimensional stochastic matrices. Consider $ V _{s} $ to be an ordered set as indicated, let $ \pi $ be the $ n $- dimensional stochastic row vector whose first component equals 1, and let $ \eta $ be the $ n $- dimensional column vector consisting of 0's and 1's such that the $ i $- th component of $ \eta $ equals 1 if and only if the $ i $- th element of $ V _{s} $ belongs to $ S _{1} $. The language accepted by a $ \textrm{ PA } $ with cut-point $ \alpha $, where $ \alpha $ is a real number satisfying $ 0 \leq \alpha < 1 $, is defined by

$$ L ( \mathop{\rm PA}\nolimits ,\ \alpha ) \ = \ \{ {w \in V _{1} ^ *} : { \pi H (w) \eta > \alpha} \} . $$

Languages obtained in this fashion are referred to as stochastic languages.

Decision problems (cf. Decision problem) play an important role in language theory. The usual method of proving that a problem is undecidable is to reduce it to some problem whose undecidability is known. The most useful tool for problems in language theory is in this respect the Post correspondence problem. By definition, a Post correspondence problem is an ordered quadruple $ \mathop{\rm PCP}\nolimits = \{ \Sigma ,\ n,\ \alpha ,\ \beta \} $, where $ \Sigma $ is an alphabet, $ n \geq 1 $, and $ \alpha = ( \alpha _{1} \dots \alpha _{n} ) $, $ \beta = ( \beta _{1} \dots \beta _{n} ) $ are ordered $ n $- tuples of elements of $ \Sigma^{+} $. A solution to the $ \mathop{\rm PCP}\nolimits $ is a non-empty finite sequence of indices $ i _{1} \dots i _{k} $ such that

$$ \alpha _{ {i _ 1}} \dots \alpha _{ {i _ k}} \ = \ \beta _{ {i _ 1}} \dots \beta _{ {i _ k}} . $$

It is undecidable whether an arbitrary given $ \mathop{\rm PCP}\nolimits $( or an arbitrary given $ \mathop{\rm PCP}\nolimits $ over the alphabet $ \Sigma = \{ a _{1} ,\ a _{2} \} $) has a solution.

Also Hilbert's tenth problem is undecidable: Given a polynomial $ P (x _{1} \dots x _{k} ) $ with integer coefficients, one has to decide whether or not there are non-negative integers $ x _{i} $, $ i = 1 \dots k $, satisfying the equation

$$ P (x _{1} \dots x _{k} ) \ = \ 0. $$

The membership problem is decidable for context-sensitive languages but undecidable for type-0 languages. (More specifically, given a context-sensitive grammar $ G $ and a word $ w $, it is decidable whether or not $ w \in L (G) $.) It is decidable whether two given regular languages are equal and also whether one of them is contained in the other. Both of these problems (the equivalence problem and the inclusion problem) are undecidable for context-free languages. It is also undecidable whether a given regular and a given context-free language is empty or infinite, whereas both of these problems are undecidable for context-sensitive languages. It is undecidable whether the intersection of two context-free languages is empty. The intersection of a context-free and a regular language is always context-free; and, hence, its emptiness is decidable. It is undecidable whether a given context-free language is regular.

The following example is due to M. Soittola. Consider the grammar $ G $ determined by the productions

$$ S \ \rightarrow \ abc,\ \ S \ \rightarrow \ aAbc, $$

$$ Ab \ \rightarrow \ bA,\ \ Ac \ \rightarrow \ Bbcc, $$

$$ bB \ \rightarrow \ Bb,\ \ aB \ \rightarrow \ aaA,\ \ aB \ \rightarrow \ aa. $$

Then

$$ L (G) \ = \ \{ {a^{n} b^{n} c ^ n} : { n \geq 1} \} . $$

In fact, any derivation according to $ G $ begins with an application of the first or second production, the first production directly yielding the terminal word $ abc $. Consider any derivation $ D $ from a word $ a^{i} A b^{i} c^{i} $, where $ i \geq 1 $, leading to a word over the terminal alphabet. $ D $ must begin with $ i $ applications of the third production ( $ A $ travels to the right) and then continue with an application of the fourth production (one further occurrence of $ b $ and $ c $ is deposited). Now the word $ a^{i} b^{i} Bbc ^ {i + 1} $ has been derived. For this the only possibility is to apply the fifth production $ i $ times ( $ B $ travels to the left), after which one obtains the word $ a^{i} Bb ^ {i + 1} c ^ {i + 1} $. This word directly yields one of the two words

$$ a ^ {i + 1} Ab ^ {i + 1} c ^ {i + 1} \ \ \textrm{ or } \ \ a ^ {i + 1} b ^ {i + 1} c ^ {i + 1} $$

by the last two productions (one further $ a $ is deposited, and either a new cycle is entered or the derivation is terminated). This argument shows that $ G $ generates all words belonging to the language and nothing else; every step in a derivation is uniquely determined, the only exception being that there is a choice between termination and entrance into a new cycle.

References[edit]

[a1]	N. Chomsky, "Three models for the description of language" IRE Trans. Information Theory , IT-2 (1956) pp. 113–124
[a2]	N. Chomsky, "Syntactic structures" , Mouton (1957)
[a3]	M. Davis, "Computability and unsolvability" , McGraw-Hill (1958)
[a4]	S. Eilenberg, "Automata, languages and machines" , A , Acad. Press (1974)
[a5]	D. Hilbert, "Mathematische Probleme. Vortrag, gehalten auf dem internationalen Mathematiker-Kongress zu Paris, 1900" , Gesammelte Abhandlungen , III , Springer (1935)
[a6]	S.C. Kleene, "Representation of events in nerve nets and finite automata" , Automata studies , 34 , Princeton Univ. Press (1956) pp. 3–42
[a7]	A. Paz, "Introduction to probabilistic automata" , Acad. Press (1971)
[a8]	E.L. Post, "A variant of a recursively unsolvable problem" Bull. Amer. Math. Soc. , 52 (1946) pp. 264–268
[a9]	H. Rogers jr., "Theory of recursive functions and effective computability" , McGraw-Hill (1967) pp. 164–165
[a10]	G. Rozenberg, "Selective substitution grammars" Elektronische Informationsverarbeitung und Kybernetik (EIK) , 13 (1977) pp. 455–463
[a11]	A. Salomaa, "Theory of automata" , Pergamon (1969)
[a12]	A. Salomaa, "Formal languages" , Acad. Press (1973)
[a13]	A. Salomaa, "Jewels of formal language theory" , Computer Science Press (1981)
[a14]	A. Salomaa, M. Soittola, "Automata-theoretic aspects of formal power series" , Springer (1978)
[a15]	A. Thue, "Ueber unendliche Zeichenreihen" Skrifter utgit av Videnskapsselskapet i Kristiania , I (1906) pp. 1–22
[a16]	A. Thue, "Probleme über Veränderungen von Zeichenreihen nach gegebenen Regeln" Skrifter utgit av Videnskapsselskapet i Kristiania , I.10 (1914)
[a17]	A.M. Turing, "On computable numbers, with an application to the Entscheidungsproblem" Proc. London Math. Soc. , 42 (1936) pp. 230–265