Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.csd.uwm.edu!mrsvr.UUCP!shoreland.uucp!hallett
From: hallett@shoreland.uucp (Jeff Hallett x4-6328)
Newsgroups: comp.software-eng
Subject: Re: C source lines in file
Message-ID: <895@mrsvr.UUCP>
Date: 18 Aug 89 17:47:23 GMT
References: <35120@ccicpg.UUCP> <16018@vail.ICO.ISC.COM>
Sender: news@mrsvr.UUCP
Reply-To: hallett@shoreland.UUCP (Jeff Hallett x4-6328)
Organization: GE Medical Systems, Milwaukee,  WI
Lines: 69

In article <16018@vail.ICO.ISC.COM> rcd@ico.ISC.COM (Dick Dunn) writes:
>swonk@ccicpg.UUCP (Glen Swonk) writes:
>> Does anyone have a program or a method of determing
>> the number of C source lines in a source file?
>> My assumption is that comments don't count as source
>> lines unless the comment is on a line with code.

At my former job, we came up with a way of counting C lines that
suited us.  The basic approach was to:

	1. Remove all comments.
	2. Ensure that there is only one "statement" of code per textual
	   line (a statement here may be a curly brace or a null
	   statement, i.e. a solitary ;).
	3. Remove all blank lines, and all braces and semicolons that
	   have no other text on the line.
	4. Remove all 'do' keywords (they do no work).
	5. Pull all broken function calls together onto one line (i.e.
	   where a newline was inserted between parameters to make the
	   call prettier).
	6. Count the lines that are left.

Granted, this assumes some "sanity" on the part of the programmer, who
must not do really weird things (like putting the ; for a statement on
the line below the statement), but on the whole this procedure (done
mostly with sed scripts) produced what we would have done by hand.
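
For illustration only, the counting steps above might be sketched in
Python roughly as follows.  This is not the original sed scripts, which
are not reproduced in this post.  The names count_source_lines, _TOKEN,
_NO_TEXT and _replace are hypothetical, and step 2 (one statement per
line) is not enforced: the sketch assumes the source is already laid
out that way, and that only /* ... */ comments are used.

```python
import re

# One pass over the source: comments, string literals, character literals,
# and the 'do' keyword.  Literals are matched so that comment markers,
# parentheses, or the word "do" inside them are not misread.
_TOKEN = re.compile(
    r"/\*.*?\*/"               # block comment (step 1)
    r'|"(?:\\.|[^"\\])*"'      # string literal
    r"|'(?:\\.|[^'\\])*'"      # character literal
    r"|\bdo\b",                # 'do' keyword (step 4)
    re.DOTALL,
)

# A line holding nothing but braces, semicolons, and whitespace (step 3).
_NO_TEXT = re.compile(r"^[{};\s]*$")


def _replace(match):
    token = match.group(0)
    if token[0] in "\"'":
        return '""'   # placeholder, so a line holding a literal still has text
    return " "        # comments and 'do' keywords vanish


def count_source_lines(src):
    """Count C source lines under the rules sketched above.

    Assumption: one statement per line already holds (step 2 is skipped).
    """
    text = _TOKEN.sub(_replace, src)
    count = 0
    depth = 0  # unclosed '(' carried over from earlier lines
    for line in text.splitlines():
        continuation = depth > 0
        depth = max(depth + line.count("(") - line.count(")"), 0)
        if continuation:
            continue          # step 5: part of a call split across lines
        if _NO_TEXT.match(line):
            continue          # step 3: blank, or only braces / semicolons
        count += 1            # step 6
    return count


SAMPLE = r'''
/* header comment
   spanning lines */
#include <stdio.h>

int main(void)
{
    int i = 0;   /* counter */
    do {
        printf("%d (do not count)\n",
               i);
        i++;
    } while (i < 3);
    return 0;
}
'''

print(count_source_lines(SAMPLE))
```

Under these rules the sample counts the #include, the function header,
the declaration, the printf call (once, although it spans two lines),
the increment, the "} while" line and the return.  A programmer who
breaks the "sanity" assumption will of course get a different count.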

>it's clear you're off on the wrong foot.  A count of source lines is NOT a
>useful measure of program size or complexity.  Incidentally, be careful
>about the difference between size and complexity!
>

Excellent point about size vs. complexity.  However, "size" is a
nebulous term (more below).

>
>I offer two rules about measuring program size/complexity:
>
>1.  Any variant of "source line count" is useless as a measure of the
>program.
>	I've heard countless times the rationalization that "Well, it may
>	not be good, but it's the best we can do."  This is WRONG!  It's
>	worse than no measure at all.  It implies that you have information

I agree that LOC is a bad measure of productivity, but so are most of
the items Dick listed in his earlier posting.  A coder's productivity
is difficult to measure, and most methods I've heard of are inadequate,
since I think that writing code is still more an art than a science or
a manufacturing process.  However, LOC is still a good estimator of
cost.  I say this with the caveats that different s/w houses will have
different correlations and that it is still strongly linked to
complexity.  This is why I like methods like COCOMO, which relate lines
produced to various drivers, covering both the nature of the code and
the programmers involved, to produce estimates of cost and time.  Also,
most of these methods can be modified to reflect a particular
production site.
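
As a rough illustration of the kind of estimate meant here, the
simplest (Basic) form of COCOMO can be sketched in Python as below.
The coefficients are the published Basic COCOMO values from Boehm's
1981 book; the function name basic_cocomo and its table parameter are
hypothetical.  The driver-based variants of the model adjust the effort
further with multipliers for the code and the staff, which this sketch
leaves out.

```python
# Basic COCOMO coefficients (a, b, c, d) for the three project modes.
BASIC_COCOMO = {
    "organic":      (2.4, 1.05, 2.5, 0.38),
    "semidetached": (3.0, 1.12, 2.5, 0.35),
    "embedded":     (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kloc, mode="organic", table=BASIC_COCOMO):
    """Return (effort in person-months, schedule in calendar months).

    kloc  -- thousands of delivered source lines
    table -- coefficient table; a site could pass its own calibration
             (hypothetical hook, not part of the published model)
    """
    a, b, c, d = table[mode]
    effort = a * kloc ** b        # effort grows slightly faster than size
    schedule = c * effort ** d    # schedule grows much more slowly
    return effort, schedule


effort, schedule = basic_cocomo(32, "organic")
print(f"{effort:.0f} person-months over {schedule:.0f} months")
```

A site with its own history would replace the table with coefficients
fitted to its past projects, which is the sort of local modification
mentioned above.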

How one defines "size" is, I think, less important than how
consistently and accurately it can be measured and what it is used
for.  To judge the quality of ANY system on its size alone is
foolhardy, and systems that encourage programmers to bloat their code
are especially destructive (as Dick points out).  I encourage Glen to
check out not only various software economics books, but also
managerial evaluation and operations research texts, to determine
useful ways to use what is collected.

--
                Jeffrey A. Hallett, PET Software Engineering
                    GE Medical Systems, W641, PO Box 414
                            Milwaukee, WI  53201
           (414) 548-5173 : EMAIL -  hallett@postron.gemed.ge.com