Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!samsung!munnari.oz.au!goanna!ok
From: ok@goanna.oz.au (Richard O'keefe)
Newsgroups: comp.lang.c
Subject: Re: Why nested comments not allowed?
Summary: they don't work
Message-ID: <2897@goanna.oz.au>
Date: 21 Feb 90 04:00:15 GMT
References: <236100027@prism>
Organization: Comp Sci, RMIT, Melbourne, Australia
Lines: 54

In article <236100027@prism>, ly@prism.TMC.COM writes:
> I'm just curious to know why nested comments are not allowed in many
> languages.

To start with, some languages _do_ allow them.  For example, Common Lisp
has #|...comment...|#, which nests.

There is the obvious point that nested structures of any kind are not
definable with regular expressions (and LEX is not the _only_ r.e. tool
around, you know).  But the *real* reason is that they simply don't work.

Imagine a Pascal dialect which admits nested comments.  Comments are used
to include natural-language text in the program, so we have to allow
things like
    {This is a `quotation'}
But program text may legitimately contain
    fred := '}';
and when we comment it out by wrapping {..} around it we get
    { fred := '}'; }
In order to handle this code fragment, we mustn't take the "}" following
the "'" as a closing bracket, but in order to handle the text fragment we
*must* take the "}" following the "'" as a closing bracket.

We could easily arrange for comments to be viewed as unstructured except
for comment brackets being significant.  That's what Common Lisp does, and
it's what's usually done when nested comments are provided.  But that
means that wrapping a *valid* statement in comment brackets may produce
*invalid* text.

We could easily arrange for comments to be viewed as sequences of
programming-language tokens.  Pop-2 did that.  Commenting out code
fragments would work well that way, but you'd have trouble with text.
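A minimal C sketch of the "unstructured except for comment brackets" rule may make the failure concrete.  It assumes the hypothetical nested-comment Pascal dialect described above, with { and } as the only significant characters inside a comment; the function name skip_nested_comment is made up for illustration.  The depth counter is exactly the part that a plain regular expression cannot express.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the "only comment brackets are significant" rule for a
 * hypothetical Pascal dialect whose { ... } comments nest.
 * text[start] must be '{'.  Returns the index just past the matching '}',
 * or -1 if the comment is never closed.  The depth counter is needed
 * because arbitrarily deep nesting is not a regular language. */
static long skip_nested_comment(const char *text, size_t start)
{
    assert(text != NULL && text[start] == '{');
    long depth = 0;
    for (size_t i = start; text[i] != '\0'; i++) {
        if (text[i] == '{') {
            depth++;
        } else if (text[i] == '}') {
            depth--;
            if (depth == 0)
                return (long)(i + 1);   /* matching close bracket found */
        }
        /* Every other character, quotes included, is ignored. */
    }
    return -1;  /* ran off the end: unterminated comment */
}
```

On the text fragment {This is a `quotation'} it consumes all 23 characters, and { a {b} c } is matched as a single comment.  On { fred := '}'; } it returns 12: the comment closes at the "}" inside the quotes, leaving '; } stranded as program text, so wrapping a valid statement has produced invalid text.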
In fact Pop-2 programmers used to have to write
    comment `This is text written as a string so that it can'
            `be included in a comment without being parsed as'
            `Pop tokens';
Not good.

We conclude that there are two *different* things:
(a) marking a sequence of tokens so that the processor will behave
    as though those tokens were not present
(b) including text which does not follow the lexical rules of the
    programming language in question

In C, we use #if/#endif (which nest!) for (a) and /* ... */ for (b).

Another clue that (a) and (b) are different is that there is usually
some _reason_ why the sequence of tokens in (a) is not to be included,
but no reason is needed for (b), because non-token text could _never_
have been part of the program proper.  This also suggests that it might
be a good idea to explicitly label type (a) "comments" with the reason.
In C, for example, we would have
    #if DEBUGGING
    ....
    #endif
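For (a), a small compilable C sketch may help.  Only the DEBUGGING label comes from the text above; the function count_close_braces and the default `#define DEBUGGING 0` are assumptions made for the example.  The excluded region contains an ordinary comment, a '}' inside quotes, and a nested #if, and none of them confuse the preprocessor, because it is skipping tokens rather than hunting for a closing bracket.

```c
#include <assert.h>
#include <stdio.h>

/* DEBUGGING is a hypothetical project flag.  It defaults to 0 here, so the
 * tracing code below is excluded; build with -DDEBUGGING=1 to include it. */
#ifndef DEBUGGING
#define DEBUGGING 0
#endif

/* Counts the '}' characters in s. */
static int count_close_braces(const char *s)
{
    assert(s != NULL);
    int n = 0;
    for (; *s != '\0'; s++) {
        if (*s == '}')
            n++;
#if DEBUGGING   /* the reason these tokens are excluded: tracing output */
        /* An ordinary comment inside the excluded region is harmless. */
        if (*s == '}')
            fputs("saw '}'\n", stderr);   /* so is a '}' inside quotes */
#if DEBUGGING > 1   /* nested conditionals pair up correctly */
        fprintf(stderr, "n = %d, rest = \"%s\"\n", n, s);
#endif
#endif
    }
    return n;
}
```

The flip side is that an #if region must still consist of valid preprocessing tokens: ISO C leaves an unmatched ' or " undefined even in a skipped group.  That is one more sign that #if is the tool for (a) and not for (b).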