Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!cs.utexas.edu!uwm.edu!linac!att!princeton!pucc!EGNILGES
From: EGNILGES@pucc.Princeton.EDU (Ed Nilges)
Newsgroups: comp.software-eng
Subject: Re: Counting semicolons (was: Re: WANTED: "C" code line counter program)
Message-ID: <12644@pucc.Princeton.EDU>
Date: 29 Mar 91 01:00:22 GMT
References: <RICHARD.91Mar12184440@dompap.iesd.auc.dk> <1991Mar15.132757.6883@comm.wang.com> <4196@zaphod.UUCP> <7547@idunno.Princeton.EDU> <1991Mar28.091725.17574@hollie.rdg.dec.com>
Reply-To: EGNILGES@pucc.Princeton.EDU
Organization: Princeton University, NJ
Lines: 79
Disclaimer: Author bears full responsibility for contents of this article

In article <1991Mar28.091725.17574@hollie.rdg.dec.com>, jch@hollie.rdg.dec.com (John Haxby) writes:
>
>Actually, both of these contain six "terms" (I'm
>not sure what C calls these primative units, that's
>why I've put it in quotes).  In both cases the terms
>are
>
>        index = 0
>        count = 0
>        index < limit
>        index++
>        condition
>        count++

"Terms" don't form a well-defined syntactic category.  C has
expressions and C has statements.  C doesn't have "terms".

You could try to define a "term" as a "simple" expression containing
no subexpressions, but then how many of these "terms" does "a+b*c"
have?  One?  Two?  One and a half?
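
To make the ambiguity concrete, here is a minimal sketch in C, not a
real C expression parser.  It assumes a toy grammar with only
identifiers, numbers, parentheses, "+" and "*", and the names
expr_counts and count_expr are invented for illustration.  It tallies
leaves and operators separately, so you can see that the number of
"terms" depends entirely on which of these you decide to count.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical tally of the things a "term" counter might count. */
struct expr_counts {
    int operands;   /* identifiers and numbers (leaves of the tree) */
    int operators;  /* binary + and * (interior nodes of the tree)  */
};

static const char *skip_ws(const char *p)
{
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

static const char *parse_sum(const char *p, struct expr_counts *c);

/* primary: identifier | number | '(' sum ')' */
static const char *parse_primary(const char *p, struct expr_counts *c)
{
    p = skip_ws(p);
    if (*p == '(') {
        p = skip_ws(parse_sum(p + 1, c));
        if (*p == ')')
            p++;
        return p;
    }
    if (isalnum((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        c->operands++;
    }
    return p;
}

/* product: primary { '*' primary } */
static const char *parse_product(const char *p, struct expr_counts *c)
{
    p = parse_primary(p, c);
    for (p = skip_ws(p); *p == '*'; p = skip_ws(p)) {
        c->operators++;
        p = parse_primary(p + 1, c);
    }
    return p;
}

/* sum: product { '+' product } */
static const char *parse_sum(const char *p, struct expr_counts *c)
{
    p = parse_product(p, c);
    for (p = skip_ws(p); *p == '+'; p = skip_ws(p)) {
        c->operators++;
        p = parse_product(p + 1, c);
    }
    return p;
}

/* Count the operands and operators in one toy expression. */
struct expr_counts count_expr(const char *src)
{
    struct expr_counts c = {0, 0};

    assert(src != NULL);
    parse_sum(src, &c);
    return c;
}
```

For "a+b*c" this gives three operands and two operators, hence five
subexpressions in all if you count every node of the tree.  None of
those numbers is obviously "the" term count.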

You could build a shaky metric on expression complexity, and work is
available in this area, but here the metric would be very, very
complex.  You'd have to take into account differences in the
psychological complexity of operators: is division more complicated
than addition?
>
>If you want a metric that doesn't depend (much) on
>coding style, then I suggest you count terms rather
>than statements. The metric has inaccuracies when
>you consider statements like
>
>        index = count = 0;
>
>which is still one term and

One and a half?

>
>        index++ < limit;
>
>which is also one term.
>
>There is no absolute metric for counting useful
>chunks of code (what does useful mean?), it's better
>to choose a metric and know what the limitations of
>that metric are than to spend weeks writing some tool
>(ie a parser) that counts chunks and then falls over
>in a heap because you have to run it through the C
>pre-processor first.

Huh?

Run what through the C preprocessor first?  I don't think you'd
want to run the measured code through the preprocessor first, because
your engineers will deal with the unpreprocessed code in nearly all
cases.

There is no absolute metric, but there are lousy metrics and good
metrics.  As to spending weeks: some compilers will produce counts
of statements in the BNF sense, and you could awk/sed/grep the
output of such a compiler to strip everything but the count, and
voila, there's your statement count.  There should be a readily
available skeleton C compiler in the public domain, consisting of
nothing more than a parser and a lexer, that people could modify
to build good, solid, syntax-driven tools.
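
As a rough idea of the lexical half of such a skeleton, here is a
sketch in C.  It is an assumption-laden illustration, not an existing
tool, and the name count_semicolons is invented.  It counts ";"
tokens while skipping /* */ comments, string literals, and character
literals.  It is lexical only: it knows nothing of the grammar, so
"for (;;)" contributes two semicolons and no statements.  A real
statement counter would put a parser on top of a layer like this.

```c
#include <assert.h>
#include <stddef.h>

/* Count ';' characters in C source text that lie outside comments,
   string literals, and character literals.  Lexical only. */
int count_semicolons(const char *s)
{
    enum { CODE, BLOCK_COMMENT, STRING, CHAR_LIT } state = CODE;
    int count = 0;

    assert(s != NULL);
    while (*s != '\0') {
        switch (state) {
        case CODE:
            if (s[0] == '/' && s[1] == '*') {
                state = BLOCK_COMMENT;
                s++;                /* consume the '*' as well */
            } else if (*s == '"') {
                state = STRING;
            } else if (*s == '\'') {
                state = CHAR_LIT;
            } else if (*s == ';') {
                count++;
            }
            break;
        case BLOCK_COMMENT:
            if (s[0] == '*' && s[1] == '/') {
                state = CODE;
                s++;                /* consume the '/' as well */
            }
            break;
        case STRING:
        case CHAR_LIT:
            if (*s == '\\' && s[1] != '\0')
                s++;                /* skip the escaped character */
            else if (*s == (state == STRING ? '"' : '\''))
                state = CODE;
            break;
        }
        s++;
    }
    return count;
}
```

Note that this runs on the unpreprocessed text, so semicolons hidden
inside macro bodies are counted once at the #define and not at each
use.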

Your example of "term" is a good engineer trying to do mathematics
without having the time to do mathematics.  Mathematicians and
language designers have already cooked up the syntax of C in
Backus-Naur Form.  Devising sed/awk tools that ignore this work may
seem efficient, but it actually wastes the work done by the person
who first formalized the syntax of C.
+--------------------------------+ Edward G. Nilges
| Child support, tax-deductible  | Princeton University
| to payer AND receiver: an idea | Information Center
| whose time has come.           | Bitnet: EGNILGES@PUCC
+--------------------------------+ (609) 258-2985