Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uunet!mcsun!ukc!dcl-cs!aber-cs!odin!pcg From: pcg@odin.cs.aber.ac.uk (Piercarlo Grandi) Newsgroups: comp.lang.c++ Subject: Re: zortech problem with lex Message-ID: Date: 23 Mar 90 12:00:03 GMT References: <6300008@ux1.cso.uiuc.edu> <24800002@sunb6> <25ECE752.29589@paris.ics.uci.edu> <10541@alice.UUCP> <6553@cadillac.CAD.MCC.COM> <25F8C92A.9349@paris.ics.uci.edu> <3742@tukki.jyu.fi> <26073795.10139@paris.ics.uci.edu> Sender: pcg@aber-cs.UUCP Organization: Coleg Prifysgol Cymru Lines: 66 In-reply-to: rfg@ics.uci.edu's message of 21 Mar 90 08:13:09 GMT In article <26073795.10139@paris.ics.uci.edu> rfg@ics.uci.edu (Ronald Guilmette) writes: If you are willing to do just a few more things with semantic analysis rather than syntax analysis (relative to what current C++ implementors seem to be doing) then you can in fact have a useful and practical LALR(1) grammar for C++. That was my point. I just stated it badly. Sorry Ron, but we really still disagree on a question of terminology here; a LALR(1) grammar belongs to the context free class. If you need semantic information to parse C++ (and by all means you need it) then your intelligent lexer is also a translator from a context sensitive C++ to another language that is indeed LALR(1), BUT IS NO LONGER C++. In particular this language has typedef names as tokens/keywords for the scopes to which they apply, as in a sense when you define a class name in C++ you are creating a new scoped reserved word, the class name, and certain phrases can only be parsed if you know that the given string is such a token. Your point is that you can 'have a useful and practical LALR(1) PARSER for A C++ COMPILER'; such a *parser* however does not parse C++ itself, but a C++ derivative, generated by the intelligent lexer. C++ itself does not admit a context free *grammar*, much less a LALR(1) one. I could say that C++ is a *family* of LALR(1) languages, each of which differs from the others by a different set of typedef names in different scopes, and an intelligent lexer filters out the particular language belonging to this family and passes it to the LALR(1) parser. Typedef names do not turn Classic C itself into a context sensitive language by the slimmest of margins. Incidentally, you can easily see that the requirement for a leading 'struct' or 'enum' or 'union' before a tag in a declaration was designed to help the parser be LL(1). Too bad that 'typedef' was later introduced as a *storage class* and not as new type of tag, e.g. not something like 'mode { } ....', which could be likened to a one field struct. In C the problem does not occur because you can distinguish declarations that include a typedef name from a statement in a context free way because you need a '=' in an initialization and you cannot intermix declarations with statements. The '=' for initialization is particularly crucial; it used to be that Classic C did not require it (as in 'int i 3;'), but it was introduced later on because, as Ritchie put it, certain things looked so much like procedure declarations that the parser got confused (as in 'int j 1; int k (j)'). Stroustrup has reintroduced the old bugaboo. Those that do not know history... :-). Note that this aspect of C++, that implies dynamic augmentation of the set of keywords for a given scope, is strongly reminiscent of Algol 68, and its (in)famous ambiguity between mode and operator denotations, and I think that Stroustrup was more familiar with the fine details of Algol 68 than of C at some point in his life. In particular, while C++ surely does not admit a context free LALR(1) grammar, it does admit description with a two level grammar (which is context sensitive), Algol 68 style, and the second level grammar can well be LALR(1) (those that do know two level grammars please excuse the poetic license here :->). -- Piercarlo "Peter" Grandi | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcvax!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk