Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!thunder.mcrcim.mcgill.edu!snorkelwacker.mit.edu!apple!julius.cs.uiuc.edu!zaphod.mps.ohio-state.edu!ub!uhura.cc.rochester.edu!rochester!pt.cs.cmu.edu!LAMBDA.ERGO.CS.CMU.EDU!dtarditi
From: dtarditi@CS.CMU.EDU (David Tarditi)
Newsgroups: comp.lang.functional
Subject: Re: "Off-side rule"
Message-ID: <DTARDITI.91Jan22185718@LAMBDA.ERGO.CS.CMU.EDU>
Date: 22 Jan 91 23:57:18 GMT
References: <22307@rouge.usl.edu> <1991Jan10.111559.12440@odin.diku.dk>
	<ACHA.91Jan11151418@DRAVIDO.CS.CMU.EDU> <293@smds.UUCP>
	<7415@vanuata.cs.glasgow.ac.uk>
Sender: dtarditi@LAMBDA.ERGO.CS.CMU.EDU
Organization: School of Computer Science, Carnegie Mellon University
Lines: 68
In-reply-to: kh@cs.glasgow.ac.uk's message of 14 Jan 91 19:48:38 GMT


I am astonished to find people defending the off-side rule.  Parsing
is a solved problem.  Why do designers of semantically "clean" languages
insist on botching syntax ?

We could argue forever about whether making white-space significant 
is in good taste.  Suffice it to say that most real programming languages
in widespread use do not make white-space significant.  Maybe they just
have not seen the light of "conciseness and legibility", or perhaps
from previous experience they wanted to get rid of endless, annoying
bugs which are difficult to find.   It is bad enough having to match up
delimiters in long programs without having to worry about indentation.

A significant argument against the off-side rule is that it makes 
languages difficult to parse.  The modern approach to parsing is to define
the concrete syntax of a language using BNF and regular expressions.
The BNF for the language should be defined so that the language can
parsed using an LALR(1) parser.  This allows the compiler writer to use
software tools which generate efficient parsers and lexers, and get on with
the more important part of the work (generating correct, efficient code).
The Lex and Yacc Unix(tm) tools exemplify the approach.

The concrete syntax of languages using the off-side rule cannot be defined
using context-free languages or regular expressions. The reason, simply put,
is that these languages possess only a limited ability to "count".
Consider the following language (where c^k denotes k occurrences of
character c):

       { s | there exists k such that s= c^kxc^kyc^kz}

This, I argue, is a simple example of a program using the off-side
rule.  Let c be a space character.  Then we have k spaces, followed
by some phrase, followed by k spaces, etc.  I claim that this language
is not context-free. I don't offer a proof of this here, but an application
of the pumping lemma for context-free languages should suffice.

Thus, it is impossible to parse this language using modern techniques
unless we resort to hacks.  Suggested hacks include adding a preprocessor
specifically to deal with off-side rule or altering the lexer to count
spaces.

I am amazed that people ignore all the work that has been done on
parsing.  The formalization of syntax is considered by most people
to be a significant accomplishment: the formalism is useful to programmers
using the languages and compiler writers alike.

IMHO, people designing concrete syntax for languages should check that 
the concrete syntax is LALR(1), which ensures that mechanical tools for parser
generation can be used if they exist for whatever language you are
writing your compiler in.  This is quite simple to do: all you need to do
is write a grammar which runs through then Unix(tm) Yacc tool without any
ambiguities.
   
   David Tarditi
   tarditi@cs.cmu.edu

	                              
--

   David Tarditi                School of Computer Science
   tarditi@cs.cmu.edu           Carnegie Mellon University
	                        Pittsburgh, PA 15213