Xref: utzoo comp.software-eng:2595 comp.misc:7559
Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!usc!apple!snorkelwacker!mit-eddie!apollo!perry
From: perry@apollo.HP.COM (Jim Perry)
Newsgroups: comp.software-eng,comp.misc
Subject: Re: Coding standards (was Re: Programmer productivity)
Message-ID: <473ae701.20b6d@apollo.HP.COM>
Date: 4 Dec 89 20:21:00 GMT
References: <Nov.18.22.47.26.1989.9685@paul.rutgers.edu> <34796@regenmeister.uucp> <2226@jato.Jpl.Nasa.Gov> <128179@sun.Eng.Sun.COM> <546@sagpd1.UUCP> <4727@netcom.UUCP> <9157@hoptoad.uucp>
Sender: root@apollo.HP.COM
Reply-To: perry@apollo.HP.COM (Jim Perry)
Organization: Hewlett-Packard Apollo Division - Chelmsford, MA
Lines: 115

In article <9157@hoptoad.uucp> tim@hoptoad.UUCP (Tim Maroney) writes:
>In article <??> dopey@sun.UUCP (Can't ya tell by the name) writes:
>>Regarding (c), well documented to me doesn't necessarily mean lots of comments.
>
>Absolutely.  Clear code shouldn't *need* a lot of comments; a
>programmer should be able to read it and understand what's going on
>from the routine names, the variable names, and the flow of control,
>with just a few added comments if any.  A lot of extraneous comments
>about things that would be perfectly clear just from reading the code
>actually damages code readability; the control structures become much
>harder to follow.
>
>There are a lot of people who adhere to a rule that more comments are
>always better.  I worked with a piece of code like that this year.  I
>couldn't make heads or tails out of the commented version, which wound
>up a few hundreds of lines.  So I sat down and ruthlessly stripped out
>all the comments, and when the code was reduced to a few tens of lines,
>I then reduced the control structures to the simpler forms which
>emerged when you could actually start to see the forest for the trees.
>After that, it became comprehensible.
>
>In summary:  Clear code is far more important than extensive comments.

Clear code and clear comments are both important.  As you observe, it's
quite possible to obfuscate a program in any number of ways.  However,
this example doesn't say much, other than that you were presented with
a small program you didn't understand (presumably because it was badly
written/commented), and by extensively editing it, and substantially
rewriting a significant percentage of it, came to understand it.  Let's
assume that you've now rearranged the code to the optimal C language
(again, an assumption) description of the solution, but no comments.  I
submit that I can then pass over that file adding comments, and by so doing
produce an even better program.

My definition of "even better"?  I assign an arbitrary engineer, who's
never seen that piece of code (or who last worked on it six months ago,
effectively the same thing), to make some functional modification to the
program.  The sooner the correct new solution is reached, the better the
(original) program.  

>>  Much better than sitting down with 100K lines of code and going through
>>it with a new hire.  'Course, none of this ever gets written until the
>>release goes out...
>
>Again, I agree.  External documentation is very useful; far more so than
>most code comments.

Again, you're throwing out the baby with the bathwater.  External
documentation has a fundamental flaw alluded to by dopey (no offense): it's
generally not there, and when it is, it's out of date.  "Most code comments" are also
missing or out of date, but only because most code is poorly documented.
As Fred Brooks says in The Mythical Man-Month:

     "[external] Program documentation is notoriously poor, and its
    maintenance is worse.  Changes made in the program do not promptly,
    accurately, and invariably appear in the paper."
     "The solution, I think, is to merge the files, to incorporate the
    documentation into the source program.  This is at once a powerful
    incentive toward proper maintenance, and an insurance that the
    documentation will always be handy to the program user.  Such programs
    are called *self-documenting*".

The proper rule, of course, is not that more comments are always better,
but that sufficient comments are always better.  In your example there were
presumably too many comments, but then the code was apparently not clearly
written either.  It is true that what Knuth calls a literate programmer
must have both the skill of coding, and that of documenting.  All
programmers are in effect technical writers, documenting their work for
other programmers who will see it/work on it.  Not all current programmers
excel at both of these skills, but it is a goal to aspire to.  

>>  Much better than sitting down with 100K lines of code and going through
>>it with a new hire.

Well, of course this is the heart of the matter.  A few-hundred or few-ten
line program tells us very little about real life software engineering
situations.  Actually, if the code is properly self-documenting, then the
new hire *can* just sit down with the code and learn from the code itself. 
Documentation, like code, is hierarchical.  At the beginning of each
program, library, whatever, is a broad overview of that unit.  More
specific comments would be associated with modules, functions, algorithms,
etc.

For instance, let's say I've been asked to change the memory allocation
implementation of a moderately large program I've never seen before.  From
the documentation of the program I determine generally what it does and
what sort of data it deals with, and further that it's internally broken
down into twelve modules, one of which deals with storage allocation.  In
that module's primary .c file is a description of the general memory model,
a breakdown of the operations on that memory (functions in the module), and
perhaps a summary of what the cost and benefit of that model are compared
to likely alternatives.  At each subsidiary function the particular
algorithms used are described, along with potential pitfalls and potential
interactions with other functions.  Within a function the variables are
described, as are the high points of the algorithm, such as potential
trouble sites for concurrency.
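
As a rough sketch of the layering described above, here is what the primary
.c file of such a storage module might look like.  Everything in it is
hypothetical and chosen only for illustration: the file name arena.c, the
bump-pointer memory model, and the names arena_init, arena_alloc,
arena_reset, and ARENA_ALIGN.  The point is where each kind of comment
lives, not the allocator itself.

```c
/*
 * arena.c: storage allocation module (hypothetical example).
 *
 * MEMORY MODEL
 *   One arena sits on top of a caller-supplied buffer.  Blocks are
 *   carved off the front in order and are never freed one at a time.
 *
 * OPERATIONS
 *   arena_init   attach an arena to a buffer
 *   arena_alloc  hand out the next aligned block, or NULL if full
 *   arena_reset  discard every allocation at once
 *
 * TRADE-OFFS
 *   Compared with malloc/free: allocation is a pointer bump with no
 *   per-block header, but memory cannot be returned piecemeal.
 *
 * CONCURRENCY
 *   Nothing here locks.  Callers sharing an arena must serialize.
 */
#include <assert.h>
#include <stddef.h>

/* Assumed to be strict enough for any object the program stores. */
typedef union { long l; double d; void *p; } arena_align_t;
#define ARENA_ALIGN sizeof(arena_align_t)

struct arena {
    unsigned char *base;  /* start of the caller's buffer */
    size_t size;          /* total bytes in the buffer */
    size_t used;          /* bytes handed out; a multiple of ARENA_ALIGN */
};

/* arena_init: buf must itself be suitably aligned; we only align offsets. */
void arena_init(struct arena *a, void *buf, size_t size)
{
    assert(a != NULL && buf != NULL);
    a->base = buf;
    a->size = size;
    a->used = 0;
}

/*
 * arena_alloc: return nbytes of storage, or NULL if it will not fit.
 * Algorithm: round the request up to ARENA_ALIGN and bump `used`.
 * Pitfall: the request is checked against the remaining space before
 * rounding, so an oversized request is rejected first.
 */
void *arena_alloc(struct arena *a, size_t nbytes)
{
    size_t remaining, rounded;
    void *p;

    assert(a != NULL);
    remaining = a->size - a->used;
    if (nbytes > remaining)
        return NULL;
    rounded = (nbytes + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    if (rounded > remaining)
        return NULL;
    p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* arena_reset: interacts with every outstanding pointer; all become stale. */
void arena_reset(struct arena *a)
{
    assert(a != NULL);
    a->used = 0;
}
```

Someone asked to replace the allocator can read the header block to learn
what callers are allowed to assume before touching a line of code.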

There's not much time overhead in generating this documentation, assuming a
basic competence at technical writing to one's own level.  At design time
most of this information is probably either already written down or on the
forefront of the programmer's brain (I often design code by writing the 
documentation).  This sort of information *can't* easily be reconstructed
from reading C code.  ("now WHY was I cocky enough to code this loop
without explicitly guarding against interrupts?")  I experienced an
epiphany once when I realized that for the fourth time in two years I was
drawing little linked-list boxes-and-lines to prove to myself that a list
handling function was correct in all cases.  I put that diagram into the
code (and subsequently did in fact refer to it a few times on later
occasions, saving myself significant time).  I hope if subsequent
maintainers have had occasion to visit that code they benefit from it, but
it doesn't really matter; in this case, I've already benefited myself.
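
To make the boxes-and-lines idea concrete, here is a minimal sketch of a
list routine carrying its diagram in the comment.  It is not the original
code, which isn't shown in this post; the struct node layout and the
list_insert function are assumptions invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int key;
    struct node *next;
};

/*
 * list_insert: link n into the ascending list whose head is *headp.
 *
 * `link` always addresses the pointer that must end up pointing at n:
 * first the head pointer itself, then successive `next` fields.
 *
 * Inserting 5, before:
 *
 *    *headp
 *      |
 *      v
 *    +---+---+    +---+---+    +---+------+
 *    | 3 | o-+--->| 7 | o-+--->| 9 | NULL |
 *    +---+---+    +---+---+    +---+------+
 *          ^
 *          link stops here, because 7 is not < 5
 *
 * After:
 *
 *    +---+---+    +---+---+    +---+---+    +---+------+
 *    | 3 | o-+--->| 5 | o-+--->| 7 | o-+--->| 9 | NULL |
 *    +---+---+    +---+---+    +---+---+    +---+------+
 *
 * The same two assignments cover every case:
 *    empty list   link == headp,        *link == NULL
 *    new head     link == headp,        *link != NULL
 *    middle       link == &prev->next,  *link != NULL
 *    new tail     link == &prev->next,  *link == NULL
 */
void list_insert(struct node **headp, struct node *n)
{
    struct node **link = headp;

    assert(headp != NULL && n != NULL);
    while (*link != NULL && (*link)->key < n->key)
        link = &(*link)->next;
    n->next = *link;   /* n adopts whatever followed the insertion point */
    *link = n;         /* the pointer we stopped at now leads to n */
}
```

The case table is the part that would otherwise be redrawn on paper each
time; with it in the source, checking a change means checking four lines.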
-
Jim Perry   perry@apollo.com    HP/Apollo, Chelmsford MA
This particularly rapid unintelligible patter 
isn't generally heard and if it is it doesn't matter.

