Path: utzoo!mnetor!uunet!husc6!mailrus!ames!killer!elg
From: elg@killer.UUCP (Eric Green)
Newsgroups: comp.lang.misc
Subject: Re: Block Closure (was Re: FOR loops)
Message-ID: <3925@killer.UUCP>
Date: 25 Apr 88 06:46:43 GMT
References: <918@rlgvax.UUCP> <2400015@otter.hple.hp.com> <11532@shemp.CS.UCLA.EDU>
Reply-To: elg@killer.UUCP (Eric Green)
Organization: The Unix Connection, Dallas
Lines: 59

In article <11532@shemp.CS.UCLA.EDU> gast@lanai.UUCP (David Gast) writes:
>In article <2400015@otter.hple.hp.com> esh@otter.hple.hp.com (Sean Hayes) writes:
>>
>>>The problem with the idea mentioned in other postings about using indentation
>>>to represent the control structure is that it does not have a clean formalism
>>>to describe the parser.

So move it to the lexer. E.g. have your lexer generate a "begin" or "end"
block open/close based upon changes in indentation. Sheesh. The parser
doesn't have to do EVERYTHING, you know.... if the lexer can strip out
whitespace and comments, much simplifying the parser, then it can
certainly handle a simple thing like converting whitespaces into
opening/closing symbols.

Although I do admit that it would not be easily formalized as regular
expressions -- the question of whether this particular whitespace is an
open, close, or null, depends upon the state of the last whitespace. Big
deal. Single line in Lex, to match the whitespaces at the start of the
line (including none), count them, compare them to last count, and return
result accordingly. Most lexers are hand-built anyhow, so it doesn't
really matter. Except as a matter of principle to purists everywhere, who
shudder at actions that they cannot easily model with their current
formalisms.

>>>syntax of the language will be easy to parse. Problems include: handling
>>>of multi line expressions and defining default tab stops.

I have used a language that uses indentation as its block denotation
(Promal). It gets ugly, folks.
That, instead of the formalism objections, should be what scratches the
idea entirely. The enforced indentation style of Promal, for example,
makes procedure headers blend in with the declaration of local variables
-- a real nightmare for someone perusing a printout, especially since
comments are expected to start in the current column, too (meaning you
have to set up a line of dashes before a comment, in order to pick the
comment out of the surrounding program text). Not to mention that I might
want to indent some things differently from others, to make them stand
out, and other things of that sort -- which would be impossible even with
a better indentation scheme.

Then continuations... would need a special "trailer" character (maybe
"\"?) to indicate that this is not EOL (= end-of-statement in this sort
of language), then the lexer would have to shift gears and NOT count
spaces at the beginning of the next line. Again, a pain, both for
implementer, and for the poor programmer. But still quite possible. If it
was worthwhile. Which it isn't.

Free-form languages simply offer too many advantages to go back to
fixed-format languages (COBOL!!! GARGH!). And, since the parser is
handling BEGIN/END pairs instead of indentation, we have a dichotomy
between what is seen, and what is parsed. Might as well just handle
BEGIN/END pairs in the first place.

Ah well. Sorry to burst the bubble of all you poor earnest academicians
trying to cope with indentation in grammars. But, generally, questions of
white space etc. should be handled by the lexer, possibly generating
symbols to feed to the parser -- and possibly not. A lexical issue, not a
parsing issue. Now, devising a suitable formalism to handle lexical
issues like THAT may involve quite a bit of work. But all us poor grunts
out here will let the brains handle that, and relax with the knowledge
that it can all be handled with a few lines of LEX or a couple of
subroutines in "C".

--
Eric Lee Green  elg@usl.CSNET        Snail Mail P.O. Box 92191
ihnp4!killer!elg                     Lafayette, LA 70509
"Is a dream a lie that don't come true, or is it something worse?"
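
For the curious, here is a rough sketch in C of the lexer trick argued
for above: count the leading spaces on each line, compare with the
previous count, and hand the parser an open or close symbol. It also
handles the "\" trailer for continuation lines. Everything named here
(indent_lexer, lexer_line, lexer_finish, and the BEGIN/END/STMT/PART
token names) is made up for illustration. Two assumptions go beyond the
post: a small stack of widths is kept instead of only the last count, so
that one line can close several blocks at once, and only spaces are
counted, which sidesteps the tab-stop question entirely.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 32

/* Hypothetical lexer state: the indentation widths of all open blocks. */
struct indent_lexer {
    int widths[MAX_DEPTH];  /* widths[0] is always 0 (top level) */
    int depth;              /* index of the innermost open block */
    int continued;          /* nonzero if the previous line ended in '\' */
};

void lexer_init(struct indent_lexer *lx)
{
    lx->widths[0] = 0;
    lx->depth = 0;
    lx->continued = 0;
}

/* Append a token name and a trailing space to the output string. */
static void emit(char *out, size_t cap, const char *tok)
{
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%s ", tok);
}

/*
 * Feed one source line (without its newline). Appends BEGIN/END tokens
 * for indentation changes, then STMT for a complete line or PART for a
 * line that ends in the '\' continuation trailer.
 */
void lexer_line(struct indent_lexer *lx, const char *line,
                char *out, size_t cap)
{
    size_t len = strlen(line);
    int width = 0;

    while (line[width] == ' ')
        width++;
    if ((size_t)width == len)
        return;                 /* blank line: emit nothing */

    /* After a continuation, do NOT count the leading spaces. */
    if (!lx->continued) {
        if (width > lx->widths[lx->depth]) {
            assert(lx->depth + 1 < MAX_DEPTH);
            lx->widths[++lx->depth] = width;
            emit(out, cap, "BEGIN");
        } else {
            /* Close every block indented deeper than this line.
               A real lexer would also reject a width that matches
               no open block; this sketch silently accepts it. */
            while (width < lx->widths[lx->depth]) {
                lx->depth--;
                emit(out, cap, "END");
            }
        }
    }

    if (line[len - 1] == '\\') {
        lx->continued = 1;
        emit(out, cap, "PART");
    } else {
        lx->continued = 0;
        emit(out, cap, "STMT");
    }
}

/* At end of input, close whatever blocks are still open. */
void lexer_finish(struct indent_lexer *lx, char *out, size_t cap)
{
    while (lx->depth > 0) {
        lx->depth--;
        emit(out, cap, "END");
    }
}
```

Start with an empty output string, call lexer_init once, lexer_line for
each input line, and lexer_finish at end of file. The parser then sees
ordinary BEGIN/END pairs and never looks at whitespace, which is exactly
the dichotomy between what is seen and what is parsed complained about
above.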