An operator precedence grammar is a kind of grammar for formal languages. Technically, an operator precedence grammar is a context-free grammar that has the property (among others)[1] that no production has either an empty right-hand side or two adjacent nonterminals in its right-hand side. These properties allow precedence relations to be defined between the terminals of the grammar. A parser that exploits these relations is considerably simpler than more general-purpose parsers, such as LALR parsers. Operator-precedence parsers can be constructed for a large class of context-free grammars.
Operator precedence grammars rely on the following three precedence relations between the terminals:[2]
Relation | Meaning |
---|---|
[math]\displaystyle{ a \lessdot b }[/math] | a yields precedence to b |
[math]\displaystyle{ a \doteq b }[/math] | a has the same precedence as b |
[math]\displaystyle{ a \gtrdot b }[/math] | a takes precedence over b |
These operator precedence relations allow to delimit the handles in the right sentential forms: [math]\displaystyle{ \lessdot }[/math] marks the left end, [math]\displaystyle{ \doteq }[/math] appears in the interior of the handle, and [math]\displaystyle{ \gtrdot }[/math] marks the right end. Contrary to other shift-reduce parsers, all nonterminals are considered equal for the purpose of identifying handles.[3] The relations do not have the same properties as their un-dotted counterparts; e. g. [math]\displaystyle{ a \doteq b }[/math] does not generally imply [math]\displaystyle{ b \doteq a }[/math], and [math]\displaystyle{ b \gtrdot a }[/math] does not follow from [math]\displaystyle{ a \lessdot b }[/math]. Furthermore, [math]\displaystyle{ a \doteq a }[/math] does not generally hold, and [math]\displaystyle{ a \gtrdot a }[/math] is possible.
Let us assume that between the terminals ai and ai+1 there is always exactly one precedence relation. Suppose that $ is the end of the string. Then for all terminals b we define: [math]\displaystyle{ \$ \lessdot b }[/math] and [math]\displaystyle{ b \gtrdot \$ }[/math]. If we remove all nonterminals and place the correct precedence relation: [math]\displaystyle{ \lessdot }[/math], [math]\displaystyle{ \doteq }[/math], [math]\displaystyle{ \gtrdot }[/math] between the remaining terminals, there remain strings that can be analyzed by an easily developed bottom-up parser.
For example, the following operator precedence relations can be introduced for simple expressions:[4]
They follow from the following facts:[5]
The input string[4]
after adding end markers and inserting precedence relations becomes
Having precedence relations allows to identify handles as follows:[4]
It is generally not necessary to scan the entire sentential form to find the handle.
The algorithm below is from Aho et al.:[6]
If $ is on the top of the stack and ip points to $ then return else Let a be the top terminal on the stack, and b the symbol pointed to by ip if a [math]\displaystyle{ \lessdot }[/math] b or a [math]\displaystyle{ \doteq }[/math] b then push b onto the stack advance ip to the next input symbol else if a [math]\displaystyle{ \gtrdot }[/math] b then repeat pop the stack until the top stack terminal is related by [math]\displaystyle{ \lessdot }[/math] to the terminal most recently popped else error() end
An operator precedence parser usually does not store the precedence table with the relations, which can get rather large. Instead, precedence functions f and g are defined.[7] They map terminal symbols to integers, and so the precedence relations between the symbols are implemented by numerical comparison: [math]\displaystyle{ f(a) \lt g(b) }[/math] must hold if [math]\displaystyle{ a \lessdot b }[/math] holds, etc.
Not every table of precedence relations has precedence functions, but in practice for most grammars such functions can be designed.[8]
The below algorithm is from Aho et al.:[9]
Consider the following table (repeated from above):[10]
Using the algorithm leads to the following graph:
gid \ fid f* \ / g* / f+ | \ | g+ | | g$ f$
from which we extract the following precedence functions from the maximum heights in the directed acyclic graph:
id | + | * | $ | |
---|---|---|---|---|
f | 4 | 2 | 4 | 0 |
g | 5 | 1 | 3 | 0 |
The class of languages described by operator-precedence grammars, i.e., operator-precedence languages, is strictly contained in the class of deterministic context-free languages, and strictly contains visibly pushdown languages.[11]
Operator-precedence languages enjoy many closure properties: union, intersection, complementation,[12] concatenation,[11] and they are the largest known class closed under all these operations and for which the emptiness problem is decidable. Another peculiar feature of operator-precedence languages is their local parsability,[13] that enables efficient parallel parsing.
There are also characterizations based on an equivalent form of automata and monadic second-order logic.[14]
Original source: https://en.wikipedia.org/wiki/Operator-precedence grammar.
Read more |