Path: utzoo!attcan!uunet!spool2.mu.edu!mips!pacbell.com!ucsd!sdcc6!beowulf!djohnson
From: djohnson@beowulf.ucsd.edu (Darin Johnson)
Newsgroups: comp.lang.functional
Subject: Re: "Off-side rule"
Message-ID: <15576@sdcc6.ucsd.edu>
Date: 13 Jan 91 21:24:00 GMT
References: <1991Jan11.100048.3121@odin.diku.dk> <ACHA.91Jan11151418@DRAVIDO.CS.CMU.EDU> <27854.27905aa5@kuhub.cc.ukans.edu>
Sender: news@sdcc6.ucsd.edu
Organization: CSE Dept., UC San Diego
Lines: 35
Nntp-Posting-Host: beowulf.ucsd.edu

>In article <ACHA.91Jan11151418@DRAVIDO.CS.CMU.EDU>, acha@CS.CMU.EDU (Anurag Acharya) writes:
>: 
>: What is the justification for this "off-side" rule ? The idea of whitespace 
>: having semantics is a potential source of inscrutable bugs and, frankly 
>: speaking, seems to go against the grain of modern programming language
>: design. The concrete syntax of such a language would no longer be 
>: context-free,
>: let alone LR(1)/LL(1). In fact, I am hard pressed to conceptualize an
>: efficient tokenizing algorithm for such languages. 

When you look at how strict and unforgiving Occam is towards spacing,
the off-side rule is rather benign.  However, there is the major "gotcha"
in both languages - tabs count as one character.  And unfortunately, back
when I used vi, tabs would be inserted automatically and occam would give
some meaningless error message that took hours to figure out.
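
To make the tab gotcha concrete, here is a small Python sketch.  The helper
name indent_column and the tab_width parameter are made up for illustration
and are not from any Occam or functional-language tool.  It only shows that
two lines which look aligned in an editor can start at different columns
once a tab is counted as one character.

```python
def indent_column(line, tab_width=None):
    """Column of the first non-blank character in a line.

    tab_width=None counts each tab as one character (the behaviour that
    bit me); an integer expands tabs to that width first, which is what
    an editor shows on screen.  Both names are hypothetical.
    """
    if tab_width is not None:
        line = line.expandtabs(tab_width)
    return len(line) - len(line.lstrip(" \t"))


spaces = "        x"   # eight spaces
tabbed = "\tx"         # one tab, displayed at column 8 by most editors

# On screen the two lines line up, but a lexer that counts characters
# sees the tabbed line start at column 1, far to the left of column 8.
seen_by_lexer = (indent_column(spaces), indent_column(tabbed))
seen_on_screen = (indent_column(spaces, 8), indent_column(tabbed, 8))
```

So a tabbed continuation line is suddenly "off-side" relative to a
space-indented line above it, even though nothing looks wrong on screen.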

As far as justifying this, it adds to readability.  One of the "goals" of
functional programming languages is to be able to have programs look
like formal mathematical functions.  Block constructs detract from
the readability somewhat, especially if you must add begin/end because
{}, (), etc are already used.  Probably a big reason is that it looks
nice esthetically, without lots of filler tokens and symbols.

Parsing isn't that bad at all.  In fact, the lexical analyzer can handle it
all, and the parser need know nothing about spacing.  The method I used in a
project was to insert a "begin" token whenever the beginning of a block was
found (like just after =), and also push onto a stack the current position
in the line.  Then whenever reading a new line, if the column of its first
symbol was less than the position saved on the top of the stack, the stack
was popped and an "end" inserted into the token stream, repeating until the
line was no longer off-side.  It was very simple to add, and the parser always saw
begin/end markers and was kept happy.  [of course, you have to go to reading
line by line, but this was done anyway so error messages had line numbers]
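
Roughly, the scheme looks like the following Python sketch.  This is a
reconstruction for illustration, not the code from that project: the
function name layout_tokens, the whitespace-only tokenizing, and the choice
to take the column of the first token after "=" as the block position are
all assumptions.  Tabs count as one character here, as described above.

```python
import re


def layout_tokens(source):
    """Return the tokens of `source` with explicit 'begin'/'end' markers.

    Hypothetical sketch: tokens are just runs of non-blank characters,
    and a block opens at the first token following an '='.
    """
    tokens = []
    stack = []        # columns at which the currently open blocks started
    pending = False   # True when '=' was just seen and a block must open
    for line in source.splitlines():
        first = True
        for m in re.finditer(r"\S+", line):
            col, text = m.start(), m.group()
            if pending:
                # A block begins here: mark it and remember its column.
                tokens.append("begin")
                stack.append(col)
                pending = False
            elif first:
                # Off-side check: a line that starts to the left of the
                # innermost open block closes that block, possibly several.
                while stack and col < stack[-1]:
                    stack.pop()
                    tokens.append("end")
            first = False
            tokens.append(text)
            if text == "=":
                pending = True
    # Close whatever is still open at end of input.
    tokens.extend("end" for _ in stack)
    return tokens


example = "f x = a\n      b\ng = c\n"
result = layout_tokens(example)
# result is:
# ['f', 'x', '=', 'begin', 'a', 'b', 'end', 'g', '=', 'begin', 'c', 'end']
```

The parser then only ever sees ordinary begin/end tokens, so the grammar
itself can stay context-free.  A real lexer would also have to deal with
comments, string literals and other block-opening keywords, which this
sketch ignores.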
-- 
Darin Johnson
djohnson@ucsd.edu