Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!uunet!mcsun!ukc!dcl-cs!aber-cs!odin!pcg
From: pcg@odin.cs.aber.ac.uk (Piercarlo Grandi)
Newsgroups: comp.lang.c++
Subject: Re: zortech problem with lex
Message-ID: <PCG.90Mar23120003@odin.cs.aber.ac.uk>
Date: 23 Mar 90 12:00:03 GMT
References: <6300008@ux1.cso.uiuc.edu> <24800002@sunb6>
	<25ECE752.29589@paris.ics.uci.edu> <10541@alice.UUCP>
	<6553@cadillac.CAD.MCC.COM> <25F8C92A.9349@paris.ics.uci.edu>
	<3742@tukki.jyu.fi> <26073795.10139@paris.ics.uci.edu>
Sender: pcg@aber-cs.UUCP
Organization: Coleg Prifysgol Cymru
Lines: 66
In-reply-to: rfg@ics.uci.edu's message of 21 Mar 90 08:13:09 GMT


In article <26073795.10139@paris.ics.uci.edu> rfg@ics.uci.edu (Ronald
Guilmette) writes:

   If you are willing to do just a few more things with semantic analysis
   rather than syntax analysis (relative to what current C++ implementors
   seem to be doing) then you can in fact have a useful and practical LALR(1)
   grammar for C++.  That was my point.  I just stated it badly.

Sorry Ron, but we really still disagree on a question of terminology
here; a LALR(1) grammar belongs to the context free class. If you need
semantic information to parse C++ (and by all means you need it) then
your intelligent lexer is also a translator from a context sensitive
C++ to another language that is indeed LALR(1), BUT IS NO LONGER C++.

In particular this language has typedef names as tokens/keywords for the
scopes to which they apply, as in a sense when you define a class name
in C++ you are creating a new scoped reserved word, the class name, and
certain phrases can only be parsed if you know that the given string
is such a token.

Your point is that you can 'have a useful and practical LALR(1) PARSER
for A C++ COMPILER'; such a *parser* however does not parse C++ itself,
but a C++ derivative, generated by the intelligent lexer. C++ itself
does not admit a context free *grammar*, much less a LALR(1) one.

I could say that C++ is a *family* of LALR(1) languages, each of which
differs from the others by a different set of typedef names in different
scopes, and an intelligent lexer filters out the particular language
belonging to this family and passes it to the LALR(1) parser.

Typedef names do not turn Classic C itself into a context sensitive
language by the slimmest of margins. Incidentally, you can easily see
that the requirement for a leading 'struct' or 'enum' or 'union' before
a tag in a declaration was designed to help the parser be LL(1). Too bad
that 'typedef' was later introduced as a *storage class* and not as new
type of tag, e.g. not something like 'mode <name> { <abstract
declarator> } ....', which could be likened to a one field struct.

In C the problem does not occur because you can distinguish declarations
that include a typedef name from a statement in a context free way
because you need a '=' in an initialization and you cannot intermix
declarations with statements.

The '=' for initialization is particularly crucial; it used to be that
Classic C did not require it (as in 'int i 3;'), but it was introduced
later on because, as Ritchie put it, certain things looked so much like
procedure declarations that the parser got confused (as in 'int j 1; int
k (j)').

Stroustrup has reintroduced the old bugaboo. Those that do not know
history... :-). Note that this aspect of C++, that implies dynamic
augmentation of the set of keywords for a given scope, is strongly
reminiscent of Algol 68, and its (in)famous ambiguity between mode and
operator denotations, and I think that Stroustrup was more familiar with
the fine details of Algol 68 than of C at some point in his life.

In particular, while C++ surely does not admit a context free LALR(1)
grammar, it does admit description with a two level grammar (which is
context sensitive), Algol 68 style, and the second level grammar can
well be LALR(1) (those that do know two level grammars please excuse the
poetic license here :->).
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk