Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!apple!julius.cs.uiuc.edu!zaphod.mps.ohio-state.edu!ub!uhura.cc.rochester.edu!rochester!pt.cs.cmu.edu!LAMBDA.ERGO.CS.CMU.EDU!dtarditi From: dtarditi@CS.CMU.EDU (David Tarditi) Newsgroups: comp.lang.functional Subject: Re: "Off-side rule" Message-ID: Date: 22 Jan 91 23:57:18 GMT References: <22307@rouge.usl.edu> <1991Jan10.111559.12440@odin.diku.dk> <293@smds.UUCP> <7415@vanuata.cs.glasgow.ac.uk> Sender: dtarditi@LAMBDA.ERGO.CS.CMU.EDU Organization: School of Computer Science, Carnegie Mellon University Lines: 68 In-reply-to: kh@cs.glasgow.ac.uk's message of 14 Jan 91 19:48:38 GMT I am astonished to find people defending the off-side rule. Parsing is a solved problem. Why do designers of semantically "clean" languages insist on botching syntax ? We could argue forever about whether making white-space significant is in good taste. Suffice it to say that most real programming languages in widespread use do not make white-space significant. Maybe they just have not seen the light of "conciseness and legibility", or perhaps from previous experience they wanted to get rid of endless, annoying bugs which are difficult to find. It is bad enough having to match up delimiters in long programs without having to worry about indentation. A significant argument against the off-side rule is that it makes languages difficult to parse. The modern approach to parsing is to define the concrete syntax of a language using BNF and regular expressions. The BNF for the language should be defined so that the language can parsed using an LALR(1) parser. This allows the compiler writer to use software tools which generate efficient parsers and lexers, and get on with the more important part of the work (generating correct, efficient code). The Lex and Yacc Unix(tm) tools exemplify the approach. The concrete syntax of languages using the off-side rule cannot be defined using context-free languages or regular expressions. The reason, simply put, is that these languages possess only a limited ability to "count". Consider the following language (where c^k denotes k occurrences of character c): { s | there exists k such that s= c^kxc^kyc^kz} This, I argue, is a simple example of a program using the off-side rule. Let c be a space character. Then we have k spaces, followed by some phrase, followed by k spaces, etc. I claim that this language is not context-free. I don't offer a proof of this here, but an application of the pumping lemma for context-free languages should suffice. Thus, it is impossible to parse this language using modern techniques unless we resort to hacks. Suggested hacks include adding a preprocessor specifically to deal with off-side rule or altering the lexer to count spaces. I am amazed that people ignore all the work that has been done on parsing. The formalization of syntax is considered by most people to be a significant accomplishment: the formalism is useful to programmers using the languages and compiler writers alike. IMHO, people designing concrete syntax for languages should check that the concrete syntax is LALR(1), which ensures that mechanical tools for parser generation can be used if they exist for whatever language you are writing your compiler in. This is quite simple to do: all you need to do is write a grammar which runs through then Unix(tm) Yacc tool without any ambiguities. David Tarditi tarditi@cs.cmu.edu -- David Tarditi School of Computer Science tarditi@cs.cmu.edu Carnegie Mellon University Pittsburgh, PA 15213