Path: utzoo!utgpu!utstat!jarvis.csri.toronto.edu!mailrus!eecae!tank!mimsy!chris From: chris@mimsy.UUCP (Chris Torek) Newsgroups: comp.lang.misc Subject: Re: Language Design, or: is C's Grammar Context Free Keywords: BNF, Grammars, Silliness Message-ID: <16557@mimsy.UUCP> Date: 27 Mar 89 16:03:18 GMT References: <5200040@m.cs.uiuc.edu> <12443@watdragon.waterloo.edu> <9122@claris.com> <12606@watdragon.waterloo.edu> <9615@claris.com> <9861@megaron.arizona.edu> Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742 Lines: 103 Maybe this will work best a bit at a time. First, our definition: In article <12606@watdragon.waterloo.edu> gvcormack@watdragon.waterloo.edu (Gordon V. Cormack) writes: >A context-free grammar G is a 4-tuple G = where N is a finite >set of nonterminal symbols, T (the alphabet) is a finite set of terminal >symbols, P is a finite set of production rules of the form A -> X1 X2 X3 ... >where A is in N, and X1, X2, X3 are in Union(N,T). S in N is the Start symbol. In article <9165@claris.com> hearn@claris.com (Bob Hearn) writes: >... I think this whole argument is founded on misunderstandings. It is. >It all depends on what you mean by "context-free grammar." Most people mean the definition above. >If you follow the technical definition, then there is no >question that C's grammar (as specified in K&R) is context-free. Aha! But the grammar in K&R 1st ed. is incorrect. In article <9861@megaron.arizona.edu> gudeman@arizona.edu (David Gudeman) writes: >One inaccuracy is in the production > typedef-name : identifier >I don't know if there are others. (I recall a correction or two elsewhere, but my K&R is in my office and I am not.) >So both claims made in this argument are correct: >(1) the grammar of C is not context free >and >(2) the grammar given in K&R is context free Returning to <9165@claris.com> (Bob Hearn): >... As for Cormack's claim that C does not have a finite set of terminals, (this refers to the line `The K&R grammar does not have a finite alphabet.' in <12606@watdragon.waterloo.edu>) >I can't say that I really understand what he means. Perhaps he is >referring to the fact that there is an infinite number of identifiers? Rather, that there are an unbounded (not infinite, although it comes out the same) number of type declarators, via the `typedef' mechanism. >Well, if we restrict ourselves to K&R, then the terminal in question is >simply (identifier), and it does not decompose further. (He then suggests splitting it into letters, etc., a simple mechanical task.) Unfortunately, the terminal in question is the `identifier' in `typedef-name : identifier'. This is, as has been noted, inaccurate. The key is that only those identifiers which (a) have been previously defined via `typedef' and (b) have not been hidden by new local declarations are in fact typedef names. This information can only be found by context (oops, there is that word again), and if the size of the input string (C code) is unbounded, there is no way to capture this in a CFG (using our definition above). If we limit the length of the language to be recognised---to any fixed number of symbols, as long as it is fixed---we can construct a CFG for this subset of C. Fortunately, none of this really matters. As someone else pointed out (in an article I forgot to save), we generally do not write grammars that accept only strictly correct inputs. Instead, we take some shortcuts and patch things up via semantics. This approach works quite well for C, although a number of compilers do it wrong (including PCC) and patch the grammar by having the lexer return `type' instead of `identifier' when the identifier has been defined via typedef. In particular, this prevents typedef int temperature; f() { auto double temperature; ... } from compiling correctly. (PCC does handle f(temperature) double temp; { ... } correctly, by turning off typedef name recognition in the lexer during argument parsing. It could do the same after type keywords in local variable declarations, and I am not sure why it does not---it may have done a lookahead already, perhaps. But that is not the only place PCC has been seen to fall short....) >someone, a long time ago, made some comment on how screwed up C's grammar >was, and that it wasn't context-free. Well, as I don't think C's grammar >is particularly screwed up relative to other languages' grammars, I >responded. ... Strictly speaking (as long as you stick with `typedef produces a new type specifier'), C's grammar is not context free; but it is indeed not seriously `screwed up'. Perfect generators are not necessary, and a language `close enough to C' exists that can be parsed simply. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris