Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!olivea!uunet!comp.vuw.ac.nz!brian From: brian@comp.vuw.ac.nz (Brian Boutel) Newsgroups: comp.lang.functional Subject: Re: "Off-side rule" Message-ID: <1991Jan23.224828.20359@comp.vuw.ac.nz> Date: 23 Jan 91 22:48:28 GMT References: <22307@rouge.usl.edu> <1991Jan10.111559.12440@odin.diku.dk> Sender: news@comp.vuw.ac.nz (News Admin) Reply-To: brian@comp.vuw.ac.nz (Brian Boutel) Organization: Computer Science Dept, Victoria Univ, Wellington, NEW ZEALAND Lines: 134 Nntp-Posting-Host: antrim-hse.comp.vuw.ac.nz Originator: brian@antrim-hse.comp.vuw.ac.nz A number of people have criticised layout rules. For example, In article , dtarditi@CS.CMU.EDU (David Tarditi) writes: |> |> I am astonished to find people defending the off-side rule. Parsing |> is a solved problem. Why do designers of semantically "clean" |> languages |> insist on botching syntax ? |> |> We could argue forever about whether making white-space significant |> is in good taste. Suffice it to say that most real programming |> languages |> in widespread use do not make white-space significant. Maybe they |> just |> have not seen the light of "conciseness and legibility", or perhaps |> from previous experience they wanted to get rid of endless, annoying |> bugs which are difficult to find. It is bad enough having to match |> up |> delimiters in long programs without having to worry about |> indentation. |> Just a reminder that Haskell is *defined* without reference to any offside rule. Grouping uses { and }, and ; is used for separation. It is entirely practicable and reasonable to write Haskell programs this way. But some people like the "mathematical" appearance of programs written in languages with layout conventions, and they like the uncluttered appearance of programs without lots of explicit delimiters. For this reason, The Haskell Report includes an algorithm for converting programs in this style to the Haskell of the definition. I personally think that the delimited style should always be used for program transfer, but the Haskell report provides for portable programs in layout style by defining the tab conventions that should be assumed when calculating indentations. |> A significant argument against the off-side rule is that it makes |> languages difficult to parse. The modern approach to parsing is to define |> the concrete syntax of a language using BNF and regular expressions. |> The BNF for the language should be defined so that the language can |> parsed using an LALR(1) parser. This allows the compiler writer to use |> software tools which generate efficient parsers and lexers, and get on with |> the more important part of the work (generating correct, efficient code). |> The Lex and Yacc Unix(tm) tools exemplify the approach. |> And these tools are, of course, used in implementing Haskell. |> The concrete syntax of languages using the off-side rule cannot be defined |> using context-free languages or regular expressions. The reason, simply put, |> is that these languages possess only a limited ability to "count". |> Consider the following language (where c^k denotes k occurrences of |> character c): |> |> { s | there exists k such that s= c^kxc^kyc^kz} |> |> This, I argue, is a simple example of a program using the off-side |> rule. Let c be a space character. Then we have k spaces, followed |> by some phrase, followed by k spaces, etc. I claim that this language |> is not context-free. I don't offer a proof of this here, but an application |> of the pumping lemma for context-free languages should suffice. |> |> Thus, it is impossible to parse this language using modern techniques |> unless we resort to hacks. Suggested hacks include adding a preprocessor |> specifically to deal with off-side rule or altering the lexer to count |> spaces. |> Even such common language features as requiring declarations to appear before uses make languages not context-free, and if dealing with this by means of symbol table processing is "resorting to hacks", then most language processors are in trouble. |> I am amazed that people ignore all the work that has been done on |> parsing. The formalization of syntax is considered by most people |> to be a significant accomplishment: the formalism is useful to programmers |> using the languages and compiler writers alike. |> |> IMHO, people designing concrete syntax for languages should check that |> the concrete syntax is LALR(1), which ensures that mechanical tools |> for parser |> generation can be used if they exist for whatever language you are |> writing your compiler in. This is quite simple to do: all you need |> to do |> is write a grammar which runs through then Unix(tm) Yacc tool without |> any |> ambiguities. |> There is a big difference between "ignoring all the work that has been done" and using the tools plus other methods when the tools do not go far enough. Let me give another example from Haskell. Most people give ambiguous grammars to Yacc and provide additional disambiguating information, for example, associativity and precedence info for infix operators. A problem with this is that you cannot in Yacc specify operators of different associativities at the same precedence, and yet this may be a necessary feature (e.g. when the operators are defined by different people in separate modules and imported into the same module). Haskell defines what is legal and how it is to be parsed by requiring that infix expressions be fully parenthesised except in a list of cases where normal precedence and associativity considerations make this unnecessary. Now this is still context free, but since the rules cannot be expressed using the Yacc %left and %right syntax, the grammar needed to describe it is very large. So what should be done? a) Change the language? b) Live with the inefficiencies of the big parser that is generated by the big grammar. c) Recognise that Yacc is not a good way to solve this particular problem and use a mixed strategy of LALR(1) grammar/Yacc for the rest of the language and a precedence grammar/parser for infix expressions? The art is to find the right tools for the task at hand, not to reject all tasks for which your favourite tools are not ideal. --brian -- Internet: brian@comp.vuw.ac.nz Postal: Brian Boutel, Computer Science Dept, Victoria University of Wellington, PO Box 600, Wellington, New Zealand Phone: +64 4 721000