Path: utzoo!utgpu!watserv1!watmath!att!pacbell!pacbell.com!ucsd!usc!snorkelwacker!bloom-beacon!eru!hagbard!sunic!mcsun!tuvie!vmars!alex From: alex@vmars.tuwien.ac.at (Alexander Vrchoticky) Newsgroups: comp.lang.c Subject: Re: Summary: 'C', is it's grammar context sensitive ? Message-ID: <1787@tuvie> Date: 31 Aug 90 14:29:58 GMT References: <1990Aug30.223440.7377@NCoast.ORG> Sender: news@tuvie Lines: 49 ramsey@NCoast.ORG (Cedric Ramsey) writes: >If you guys agree that 'C' is context sensitive then what languages >truely are context-free, if any. Uh, almost none, or almost every. It depends on your view. I won't go into the details of formal language theory here, but clarify a few points: - Efficient syntax analysis of programming languages dictates the use of context free grammars for syntax specification. - From a formal language point of view no practical programming language is context-free. (It can't be context free if it allows for an unlimited number of identifiers that can appear three or more times in a program.) The apparent contradiction is usually resolved as follows: - Build a parser based on a context free grammar that generates a `reasonably small' superset of the language and use other techniques (symbol tables) to `filter out' the cases that are illegal in the language but which slip through the parser based on the superset. The phrase `reasonably small' is not really very well-defined. However I think few people would call the following a reasonable grammar for C, even though it generates a superset of C: S --> A S | \epsilon A --> 'a' | 'b' | ... | 'z' (and the uppercase letters, and the digits, and ...) On the other hand most people are very happy with mapping all possible identifiers into one token with the use of a lexical analyzer and using table lookup to check for things like proper declarations. This is done routinely in every compiler. Fewer people are happy with the fact that two different tokens in the C language (identifier and typedef-name) have the same lexical representation and can therefore not be told apart without resorting to table lookup. But that's the way C is defined. And to answer the original question: Most (all?) newer languages have avoided the problem by a somewhat more generous use of keywords. -- Alexander Vrchoticky Technical University Vienna, Dept. for Real-Time Systems Voice: +43/222/58801-8168 Fax: +43/222/569149 e-mail: alex@vmars.tuwien.ac.at or vmars!alex@relay.eu.net (Don't use 'r'!)