Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!usc!cs.utexas.edu!uwm.edu!linac!att!princeton!pucc!EGNILGES
From: EGNILGES@pucc.Princeton.EDU (Ed Nilges)
Newsgroups: comp.software-eng
Subject: Re: Counting semicolons (was: Re: WANTED: "C" code line counter program)
Message-ID: <12644@pucc.Princeton.EDU>
Date: 29 Mar 91 01:00:22 GMT
References: <1991Mar15.132757.6883@comm.wang.com> <4196@zaphod.UUCP> <7547@idunno.Princeton.EDU> <1991Mar28.091725.17574@hollie.rdg.dec.com>
Reply-To: EGNILGES@pucc.Princeton.EDU
Organization: Princeton University, NJ
Lines: 79
Disclaimer: Author bears full responsibility for contents of this article

In article <1991Mar28.091725.17574@hollie.rdg.dec.com>, jch@hollie.rdg.dec.com (John Haxby) writes:
>
>Actually, both of these contain six "terms" (I'm
>not sure what C calls these primitive units, that's
>why I've put it in quotes).  In both cases the terms
>are
>
>	index = 0
>	count = 0
>	index < limit
>	index++
>	condition
>	count++

"Terms" don't form a well-defined syntactic category.  C has
expressions and C has statements.  C doesn't have "terms".  You could
try to define a "term" as a "simple" expression consisting of no
subexpressions, but then how many of these "terms" does "a+b*c" have?
One?  Two?  One and a half?

You could build a shaky metric on expression complexity, and work is
available in this area, but here the metric would be very, very
complex.  You'd have to take into account differences in psychological
complexity of operators: is division more complicated than addition?

>
>If you want a metric that doesn't depend (much) on
>coding style, then I suggest you count terms rather
>than statements.  The metric has inaccuracies when
>you consider statements like
>
>	index = count = 0;
>
>which is still one term and

One and a half?

>
>	index++ < limit;
>
>which is also one term.
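
As a rough illustration of why "term" is slippery, here is a minimal
Python sketch that splits a simple C expression into operands and
operators.  The function name count_tokens and the token pattern are
assumptions made for this example; it handles only identifiers, integer
literals, and a handful of operators, and it is a tokenizer, not a
parser built from the C grammar.

```python
import re

# Hypothetical token classes covering a small subset of C expressions.
TOKEN = re.compile(r"""
    (?P<operand>[A-Za-z_]\w*|\d+)
  | (?P<operator>\+\+|--|<=|>=|==|!=|&&|\|\||[-+*/%<>=!])
  | (?P<space>\s+)
""", re.VERBOSE)


def count_tokens(expr):
    """Return (operands, operators) for a simple C expression."""
    operands = operators = 0
    pos = 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character {expr[pos]!r} at {pos}")
        if m.lastgroup == "operand":
            operands += 1
        elif m.lastgroup == "operator":
            operators += 1
        # Whitespace is skipped without being counted.
        pos = m.end()
    return operands, operators


if __name__ == "__main__":
    for expr in ("a+b*c", "index = count = 0", "index++ < limit"):
        print(expr, count_tokens(expr))
```

Under these assumptions "a+b*c" and "index = count = 0" each come out
as 3 operands and 2 operators, and "index++ < limit" as 2 operands and
2 operators.  None of them is "one term" unless you add further rules
of your own, which is the difficulty with an informal category.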
>
>There is no absolute metric for counting useful
>chunks of code (what does useful mean?), it's better
>to choose a metric and know what the limitations of
>that metric are than to spend weeks writing some tool
>(ie a parser) that counts chunks and then falls over
>in a heap because you have to run it through the C
>pre-processor first.

Huh?  Run what through the C preprocessor first?  I don't think you'd
want to run the measured code through the preprocessor first, because
your engineers will deal with the unpreprocessed code in nearly all
cases.

There is no absolute metric, but there are lousy metrics and good
metrics.  As to spending weeks, some compilers will produce counts of
BNF statements, and you could awk/sed/grep the output of such a
compiler to strip everything but the count, and voila, there's your
statement count.  There should be a readily available skeleton C
compiler in the public domain, consisting of nothing more than a parser
and a lexer, that people could modify to build good, solid,
syntax-driven tools.

Your example of "term" is a good engineer trying to do mathematics
without having the time to do mathematics.  Mathematicians and language
designers have already cooked up the syntax of C in Backus-Naur Form,
and devising sed/awk tools that ignore this work is labor that may seem
efficient but which actually wastes the work done by the person who
first formalized the syntax of C.

+--------------------------------+  Edward G. Nilges
| Child support, tax-deductible  |  Princeton University
| to payer AND receiver: an idea |  Information Center
| whose time has come.           |  Bitnet: EGNILGES@PUCC
+--------------------------------+  (609) 258-2985