Newsgroups: comp.lang.misc Path: utzoo!utgpu!jarvis.csri.toronto.edu!csri.toronto.edu!norvell From: norvell@csri.toronto.edu (Theodore Stevens Norvell) Subject: Re: Language Design Message-ID: <8903250115.AA08564@yorkville.csri.toronto.edu> Organization: University of Toronto, CSRI References: <5200040@m.cs.uiuc.edu> <12443@watdragon.waterloo.edu> <9122@claris.com> <12606@watdragon.waterloo.edu> Date: Fri, 24 Mar 89 20:15:50 EST In article <12606@watdragon.waterloo.edu> gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) writes: >A context-free grammar G is a 4-tuple G = where N is a finite >set of nonterminal symbols, T (the alphabet) is a finite set of terminal >symbols, P is a finite set of production rules of the form A -> X1 X2 X3 ... >where A is in N, and X1, X2, X3 are in Union(N,T). S in N is the Start symbol. > >The K&R grammar does not have a finite alphabet. I'm sure that Gordon Cormack has something in mind here because I know he's a smart guy, but I'm darned if I can figure out what it is. In any case, I'm sure that the K&R grammar only uses a finite subset of its alphabet, because otherwise it wouldn't fit in my desk. So Gordon, if your still there after being insulted in such an uncalled for way, please enlighten us. As I recall, the grammar in K&R [1978] has a single terminal "identifier" which includes all identifiers, a nonterminal called "typedef_name" defined by the production typedef_name --> identifier This leads to an ambiguity because phrases like t * i ; can be understood to be either declarations or statements. But ambiguous grammars have been with us a long time. People even use parser generators that accept ambiguous grammars! In fact there are other ambiguities in the grammar (e.g. the dangling else). Ambiguities don't change the set of strings admitted by a grammar they merely admit some strings in more than one way. The real question is what do you want to call syntax and what do you want to call static semantics. In the absence of any definitive definition, this is a matter of taste and convenience to the implementor. Is the following (complete) compilation unit syntactically correct? f() { t * i ; int a; } The grammar says yes; most compliers say "syntax error". If we change the grammar to make typedef_name a terminal and remove its only production, then the grammar agrees with the compilers. The grammar is of course still context free, but the ambiguity is gone. Ric Holt raises the separate question of whether the C language is context free. The syntax of C certainly is. The problem with C is that it is not "feedback free" in the following sense. The parsing of C depends on the lexing of C (this is only natural), but the lexing depends on the parsing. Of course, with the original K&R grammar, C _is_ feedback free which is no doubt why Ritchie wrote the grammar that way. Summary: -The K&R [1978] grammar is context free, but describes the wrong language. -The C language syntax (abstracted from lexing and static semantics) is also context free. -The usual implementations and most useful definitions of C language are not "feedback free". -We're still in the dark about infinite alphabets. Theo Norvell (The term "feedback free" is used in a completely different sense by Holt. My apologies for stealing it. By the time I realized I was using it differently, I liked it too much to give it back. In Holt's sense, C is feedback free. See the Turing Programming Language Design and Definition for his definition. In fact, see it anyway, if you're interested in design or definition.)