Answered Jun 03, 2022 · 6 votes
In computer science, formal grammars are described using production rules. For example, a rule of the form:
says the symbol P may be replaced by the symbols xyz, and a rule of the form
says the symbol Q may be replaced by the symbol a and the symbol P or by the symbol b and the symbol P.
The formal grammar designates a start symbol, such as Q, and says which symbols are nonterminal symbols (placeholders that do not appear in the final string) and which are terminal symbols (that may appear in the final string). If Q is the start symbol for the grammar with the two rules above, then these rules describe a grammar that contains two strings, axyz and bxyz, because those are the only two strings that can be made with those rules. The set of strings a grammar can produce is called its language.
λ is commonly used the mean an empty string, one with no symbols.
This grammar with nonterminal symbol S, terminal symbols a and b, and start symbol S:
can produce the empty string by replacing S with the empty string, or it can produce abb by replacing S with aSbb and then the new S with the empty string, or it can produce aabbbb by replacing S with aSbb and then replacing the new S with aSbb, yielding aaSbbbb, and then replacing S with the empty string, yielding aabbbb. We can see the language of this grammar is the set of all strings containing some number of a characters (possibly zero) followed by twice as many b characters.
The C standard defines the C language using a formal grammar (with some qualifications and modifications in English). For example, it contains this production (where it uses “:” instead of the “→” I used above and uses separate lines instead of “|” to indicate a choice):
statement: labeled-statement compound-statement expression-statement selection-statement iteration-statement jump-statement
Regarding the other terminology you marked in bold:
undefined terminal symbols integer-constant,…
The C grammar given in The C Programming Language appears to be a partial grammar, leaving integer-constant as an undefined symbol, and hence acting as a terminal symbol. In the official ANSI C 1990 standard, it is a defined non-terminal in the official ANSI C 1990 standard.
… the typewriter style words and symbols are terminals given literally.
This is clearer in the original, where the fonts used make it evident the “typewriter” style is a fixed-width font:
… the typewriter style words and symbols are terminals given literally.
This, with its preceding text, means that, in the grammar, words in italic, like declaration, are non-terminal symbols of the grammar that “become something else” by productions of the grammar, while words in the “typewriter” font, like typedef, are terminal symbols and they are symbols for the literal characters in their names; typedef in the grammar means the characters t, y, p, e, d, e, and f in the source code.
…to duplicate each production with an opt symbol, once with the symbol and once without…
The standard uses a subscript “opt”, like this:
to mean that R is optional. Formally, this is actually two rules:
meaning there is a rule that replaces P with Q R and a rule that replaces P with Q, and either may be used. Hence, the R is optional. Using the “opt” subscript is just a different way of writing the rules.
… this grammar is acceptable to the YACC parser-generator. It has only one conflict, generated by the if-else ambiguity.
In computer science theory, a parser is software that checks whether a string is in a language. (In practice, it often does other useful things at the same time, like assigning “meaning” to the string by interpreting parts of it as numbers, parts as variables, and so on.) YACC is software that reads a specification of a grammar and produces source code for a parser for the language the grammar describes.
The type of parser that YACC generates requires deciding which production will be applied as each token is read. A conflict occurs when the grammar is inadequate determine which production to apply. In the C formal grammar, if (x) if (y) S1 else S2 can be produced in two ways, one that associates else S2 with the first if and one that associates else S2 with the second if. This has to be resolved by adding information to the grammar; YACC is told to associate the else S2 with the most recent if that does not currently have an else.