Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: Notesfiles $Revision: 1.6.2.16 $; site ada-uts.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!bellcore!decvax!ada-uts!richw
From: richw@ada-uts.UUCP
Newsgroups: net.lang.c
Subject: C not LALR(1) & compiler bugs
Message-ID: <10200035@ada-uts.UUCP>
Date: Fri, 17-Jan-86 16:20:00 EST
Article-I.D.: ada-uts.10200035
Posted: Fri Jan 17 16:20:00 1986
Date-Received: Thu, 23-Jan-86 21:06:02 EST
Lines: 77
Nf-ID: #N:ada-uts:10200035:000:3111
Nf-From: ada-uts!richw    Jan 17 16:20:00 1986

C's grammar is CONTEXT SENSITIVE !?  Can it be ?!

The following is quoted from page 121 of "C: A Reference Manual" by
Harbison & Steele (which, by the way, beats the pants off of Kernighan
& Ritchie as a reference manual).  After the quote, I've included a
small program which just may reveal a minor bug in your C compiler
(it did for mine).

    Allowing ordinary identifiers, as opposed to reserved words only,
    as type specifiers makes the C grammar context sensitive, and
    hence not LALR(1).  To see this, consider this program line

        A ( *B );

    If A has been defined as a typedef name, then the line is a
    declaration of a variable B to be of type "pointer to A."  (The
    parentheses surrounding "*B" are ignored.)  If A is not a type
    name, then this line is a call of the function A with the single
    parameter *B.  This ambiguity cannot be resolved grammatically.

    C compilers based on UNIX' YACC parser-generator -- such as the
    Portable C Compiler -- handle this problem by feeding information
    acquired during semantic analysis back to the lexer.  In fact,
    most C compilers do some typedef analysis during lexical analysis.

All I have to say, concerning the design of C's syntax, is "Oops".

I also realized that this, combined with that real spiffy feature of
C that identifiers are the same if the first 8 characters are the
same, could be combined to really confuse C compilers.
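The two readings of "A ( *B );" that H&S describe can be seen side by side in one file. What follows is only a minimal sketch, not anything taken from H&S or from a particular compiler. The names as_call, as_declaration and calls_to_A are made up for illustration. It assumes a compiler that lets a block-scope typedef hide a file-scope function of the same name, as the usual C scoping rules allow.

```c
int calls_to_A = 0;          /* hypothetical counter, for illustration only */

int A(int x)                 /* at file scope, A is an ordinary function */
{
    calls_to_A++;
    return x;
}

int as_call(void)
{
    int v = 7;
    int *B = &v;

    A ( *B );                /* A is a function here: this CALLS A with *B */
    return calls_to_A;
}

int as_declaration(void)
{
    typedef int A;           /* in this block, A is a typedef name */
    int v = 42;
    A ( *B );                /* same tokens: DECLARES B as "pointer to A" */

    B = &v;
    return *B;
}
```

The token sequence is identical in both functions. Only what the compiler already knows about A decides whether it parses as a statement or as a declaration, and that is the information a YACC-based compiler has to feed back to its lexer.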
I tried the following program on the compiler I use:

    typedef int long_type_name;

    f(a)
    int *a;
    {
        long_type_of_function_name (*a);
        printf("Bye");
    }

According to H&S, a correct C compiler should say that this is a
redeclaration of "a" (since "long_type_of_function_name" and
"long_type_name" are, uh, the same identifier).  However, the compiler
I use simply eats it up, thinking that the line in question is a call
to some external function (which, since it wasn't explicitly declared,
C graciously assumes returns an int -- isn't C just so helpful !).

My guess is that when the lexer checks to see if the function name is
really a typedef'd name, it checks ALL of the characters in both names
(i.e. strcmp) instead of checking just the first 8 (i.e. strncmp).

Of course, since the identifiers really ARE different, it SEEMS as if
the compiler's thinking it's a function call IS correct.  Technically,
it's a buggy compiler, though.  Isn't it strange that it seems better
for the compiler to be wrong?  Doesn't that make you wonder if
something is SERIOUSLY wrong with C?

Personally, I think that the real fault for my "buggy" compiler lies
not with the compiler writer, but in the shoddy language design that
haunts the deep-dark corners of C.  I mean, is there any excuse for
the grammar being context sensitive?  Or, for that matter, for
identifiers having only 8 significant characters?

-- Rich "Picky-Picky-Picky" Wagner

P.S.  Forgive me if this piece of C trivia has been already discussed
      (or flamed, as in this case) in net.lang.c -- I just found out
      about it and was amazed.
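To make the strcmp-versus-strncmp guess above concrete, here is a minimal sketch of the two comparisons a lexer might use when looking a name up among the typedef names. It is an assumption about how such a check could be written, not the code of any actual compiler. SIGNIFICANT, same_ident_full and same_ident_trunc are hypothetical names.

```c
#include <string.h>

#define SIGNIFICANT 8   /* assumed number of significant characters */

/* Hypothetical check comparing every character of both names. */
int same_ident_full(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Hypothetical check comparing only the first SIGNIFICANT characters. */
int same_ident_trunc(const char *a, const char *b)
{
    return strncmp(a, b, SIGNIFICANT) == 0;
}
```

With "long_type_name" and "long_type_of_function_name", the full comparison reports two different identifiers, while the truncated one reports a match, since both names begin with "long_typ". A lexer using the first check would see a function call. One using the second would see a typedef name and hence a declaration.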