Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!olivea!uunet!comp.vuw.ac.nz!brian
From: brian@comp.vuw.ac.nz (Brian Boutel)
Newsgroups: comp.lang.functional
Subject: Re: "Off-side rule"
Message-ID: <1991Jan23.224828.20359@comp.vuw.ac.nz>
Date: 23 Jan 91 22:48:28 GMT
References: <22307@rouge.usl.edu> <1991Jan10.111559.12440@odin.diku.dk><ACHA.91Jan11151418 <DTARDITI.91Jan22185718@LAMBDA.ERGO.CS.CMU.EDU>
Sender: news@comp.vuw.ac.nz (News Admin)
Reply-To: brian@comp.vuw.ac.nz (Brian Boutel)
Organization: Computer Science Dept, Victoria Univ, Wellington, NEW ZEALAND
Lines: 134
Nntp-Posting-Host: antrim-hse.comp.vuw.ac.nz
Originator: brian@antrim-hse.comp.vuw.ac.nz


A number of people have criticised layout rules. For example,

In article <DTARDITI.91Jan22185718@LAMBDA.ERGO.CS.CMU.EDU>,
dtarditi@CS.CMU.EDU (David Tarditi) writes:
|> 
|> I am astonished to find people defending the off-side rule.  Parsing
|> is a solved problem.  Why do designers of semantically "clean"
|> languages
|> insist on botching syntax ?
|> 
|> We could argue forever about whether making white-space significant 
|> is in good taste.  Suffice it to say that most real programming
|> languages
|> in widespread use do not make white-space significant.  Maybe they
|> just
|> have not seen the light of "conciseness and legibility", or perhaps
|> from previous experience they wanted to get rid of endless, annoying
|> bugs which are difficult to find.   It is bad enough having to match
|> up
|> delimiters in long programs without having to worry about
|> indentation.
|>

Just a reminder that Haskell is *defined* without reference to any
offside rule. Grouping uses { and }, and ; is used for separation. 

It is entirely practicable and reasonable to write Haskell programs this way.
But some people like the "mathematical" appearance of programs written
in languages with layout conventions, and they like the uncluttered
appearance of programs without lots of explicit delimiters. For this
reason, The Haskell Report includes an algorithm for converting programs
in this style to the Haskell of the definition. 

I personally think that the delimited style should always be used for
program transfer, but the Haskell report provides for portable programs
in layout style by defining the tab conventions that should be assumed
when calculating indentations.


|> A significant argument against the off-side rule is that it makes 
|> languages difficult to parse.  The modern approach to parsing is to define
|> the concrete syntax of a language using BNF and regular expressions.
|> The BNF for the language should be defined so that the language can
|> parsed using an LALR(1) parser.  This allows the compiler writer to use
|> software tools which generate efficient parsers and lexers, and get on with
|> the more important part of the work (generating correct, efficient code).
|> The Lex and Yacc Unix(tm) tools exemplify the approach.
|> 

And these tools are, of course, used in implementing Haskell.


|> The concrete syntax of languages using the off-side rule cannot be defined
|> using context-free languages or regular expressions. The reason, simply put,
|> is that these languages possess only a limited ability to "count".
|> Consider the following language (where c^k denotes k occurrences of
|> character c):
|> 
|>        { s | there exists k such that s= c^kxc^kyc^kz}
|> 
|> This, I argue, is a simple example of a program using the off-side
|> rule.  Let c be a space character.  Then we have k spaces, followed
|> by some phrase, followed by k spaces, etc.  I claim that this language
|> is not context-free. I don't offer a proof of this here, but an application
|> of the pumping lemma for context-free languages should suffice.
|> 
|> Thus, it is impossible to parse this language using modern techniques
|> unless we resort to hacks.  Suggested hacks include adding a preprocessor
|> specifically to deal with off-side rule or altering the lexer to count
|> spaces.
|> 

Even such common language features as requiring declarations to appear
before uses make languages not context-free, and if dealing with this by
means of symbol table processing is "resorting to hacks", then most
language processors are in trouble.


|> I am amazed that people ignore all the work that has been done on
|> parsing.  The formalization of syntax is considered by most people
|> to be a significant accomplishment: the formalism is useful to programmers
|> using the languages and compiler writers alike.
|> 
|> IMHO, people designing concrete syntax for languages should check that 
|> the concrete syntax is LALR(1), which ensures that mechanical tools
|> for parser
|> generation can be used if they exist for whatever language you are
|> writing your compiler in.  This is quite simple to do: all you need
|> to do
|> is write a grammar which runs through then Unix(tm) Yacc tool without
|> any
|> ambiguities.
|>    

There is a big difference between "ignoring all the work that has been
done"  and using the tools plus other methods when the tools do not go
far enough. 

Let me give another example from Haskell. Most people give ambiguous
grammars to Yacc and provide additional disambiguating information, for
example, associativity and precedence info for infix operators. A
problem with this is that you cannot in Yacc specify operators of
different associativities at the same precedence, and yet this may be a
necessary feature (e.g. when the operators are defined by different
people in separate modules and imported into the same module).

Haskell defines what is legal and how it is to be parsed by requiring
that infix expressions be fully parenthesised except in a list of cases
where normal precedence and associativity considerations make this
unnecessary. Now this is still context free, but since the rules cannot
be expressed using the Yacc %left and %right syntax, the grammar needed
to describe it is very large. 

So what should be done? 
a) Change the language? 
b) Live with the inefficiencies of the big parser that is generated by
the big grammar.
c) Recognise that Yacc is not a good way to solve this particular
problem and use a mixed strategy of LALR(1) grammar/Yacc for the rest of
the language and a precedence grammar/parser for infix expressions? 

The art is to find the right tools for the task at hand, not to reject
all tasks for which your favourite tools are not ideal.

--brian
-- 
Internet: brian@comp.vuw.ac.nz
Postal: Brian Boutel, Computer Science Dept, Victoria University of Wellington,
        PO Box 600, Wellington, New Zealand
Phone: +64 4 721000