Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!mcsun!hp4nl!sci.kun.nl!cs.kun.nl!eerke From: eerke@cs.kun.nl (Eerke Boiten) Newsgroups: comp.lang.functional Subject: Re: "Off-side rule" Summary: it makes the language non-CF, but not unparsable! Message-ID: <2676@wn1.sci.kun.nl> Date: 23 Jan 91 10:53:00 GMT References: <22307@rouge.usl.edu> <1991Jan10.111559.12440@odin.diku.dk> <293@smds.UUCP> <7415@vanuata.cs.glasgow.ac.uk> Sender: root@sci.kun.nl Organization: University of Nijmegen, The Netherlands Lines: 56 In article dtarditi@CS.CMU.EDU (David Tarditi) writes: >A significant argument against the off-side rule is that it makes >languages difficult to parse. The modern approach to parsing is to define >the concrete syntax of a language using BNF and regular expressions. >The BNF for the language should be defined so that the language can >parsed using an LALR(1) parser. [..] This is only important for *prototyping* implementations of functional languages in environments where no other parser generators than YACC are available. Anyone who implemented a functional language, care to comment? >The concrete syntax of languages using the off-side rule cannot be defined >using context-free languages or regular expressions. The reason, simply put, >is that these languages possess only a limited ability to "count". >Consider the following language (where c^k denotes k occurrences of >character c): > { s | there exists k such that s= c^kxc^kyc^kz} >This, I argue, is a simple example of a program using the off-side >rule. [c=space] I claim that this language is not context-free. [..] >Thus, it is impossible to parse this language using modern techniques >unless we resort to hacks. Suggested hacks include adding a preprocessor >specifically to deal with off-side rule or altering the lexer to count >spaces. Two comments: 1. The language above *is* contextfree for limited k (let's say 80, for instance - >80 chars per line is bad programming style, certainly in nice succinct functional languages) 2. Now I should remark that most programming languages are not *context* free (identifiers!), maybe? The most obvious solution for off-side rule languages is already given (and is no less of a hack than symbol tables are): count the spaces in the lexer. The only real problem I can imagine with such languages is with programs that generate code in them. >IMHO, people designing concrete syntax for languages should check that >the concrete syntax is LALR(1) [..] This is quite simple to do: >all you need to do is write a grammar which runs through the Unix(tm) >Yacc tool without any ambiguities. Testing for LALR(1) is simple: either it works and you're finished, or your grammar is ambiguous and you have to rewrite it anyway, or it isn't LALR(1) and not ambiguous either, and you're in trouble. Even if you know "all" about LALR(1), you still have to have an enormous amount of specific intuition to be able to eliminate LALR(1) problems. (Quoted from earlier on:) >I am amazed that people ignore all the work that has been done on parsing. Me too. LALR(1) isn't exactly the hottest thing in parsing, is it? -- Eerke Boiten Department of Informatics (STOP Project), K.U.Nijmegen Toernooiveld, 6525 ED Nijmegen, The Netherlands Tel. +31-80-652236. Email: eerke@cs.kun.nl