Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!ncar!gatech!mcnc!duke!crm
From: crm@duke.cs.duke.edu (Charlie Martin)
Newsgroups: comp.software-eng
Subject: Re: COCOMO
Message-ID: <677360215@macbeth.cs.duke.edu>
Date: 19 Jun 91 19:36:57 GMT
References: <1991Jun18.033606.1362@netcom.COM> <677256043@macbeth.cs.duke.edu> <1991Jun18.214011.17765@netcom.COM>
Distribution: comp
Organization: Duke University Computer Science Dept.; Durham, N.C.
Lines: 134

Okay, I *think* Jim and I are *largely* in agreement, but I think there
are a couple of essential points to be clarified.  Here goes:

In article <1991Jun18.214011.17765@netcom.COM> jls@netcom.COM (Jim Showalter) writes:
>crm@duke.cs.duke.edu (Charlie Martin) writes:
>
>>>>(1) Empirically, in any organization, man-months per 1000 lines of
>>>>code (KSLOC) is roughly constant, no matter what language or
>>>>environment is used.  So, we can always assume that effort in
>>>>man-months is proportional to size in KSLOC.
>
>[I respond that this is absurd, to which Mr. Martin responds:]
>
>>I always hate it when someone says something is "prima facie absurd"
>>like this.  First of all, notice the claim is not *everyone*, but that
>>*within an organization* the productivity is roughly constant.
>
>I think I see the problem here.  I'm probably not parsing your statement
>the way you are writing it.  When I read the original post, it seems
>to read--to my parser at least--as follows: "It doesn't make any
>difference what language or environment your organization uses, because
>you'll always produce the same amount of code in the same amount of
>time, regardless".  This was the statement I was claiming was absurd,
>because, of course, it IS (is there anybody out there who thinks this
>statement is NOT absurd?).

Okay, here goes:

(1) Given the exact same environment -- language, operating system, etc.
-- the average productivity of a group of programmers is likely to
remain nearly constant.
(2) Across language environments, the number of source lines of code
produced per unit of effort tends to remain constant.  I know this seems
counter-intuitive, but it's well supported empirically: people average
around 10 source lines of code per total man-day worked on the project.
Your mileage may vary, but the basic relationship does not.  Also, the
constant does vary between organizations and within organizations --
differences of a factor of 2 or more in the average aren't uncommon --
but given the wide variation between individual programmers it isn't
clear whether this is the effect of differences in management and
environment, or just the effect of having randomly assigned several
high-speed programmers to one project.  Unfortunately, it's also the
sort of thing that's hard to control for, which makes experimentation
expensive.

This also looks like I am indeed insisting on the statement you say is
absurd.  I don't have any trouble with that -- the empirical evidence is
on my side.

(3) The last sentence says that effort in man-months is proportional to
size in KSLOC.  This statement carefully avoids any claim about the
constant of proportionality -- but it does say that no matter how fast
you code, the time you take is proportional to the size of the final
code.

>In your followup response, you rephrase things so that
>I get the idea you are actually saying something else entirely, in
>which case we probably aren't arguing, just failing to communicate.
>
>>But secondly, "claims of empiricism" cannot be just unceremoniously
>>dumped.  The fact is that this relationship held over something like
>>400 projects, in dozens of different environments, with languages from
>>assembler to the best HLLs of the time.  Where is your
>>counter-evidence?  How was it measured?
>
>I've witnessed organizations producing code at anywhere from 5 lines
>per week per programmer to about five hundred lines per week per
>programmer, which is a two-order-of-magnitude spread.
>But since I'm not sure what your
>original claim was, I'm not sure if this refutes it or is completely
>unrelated to it.

We need to control for a bunch of things before I'd know either -- for
example, the effects of size and problem domain.  That's why we need
more than anecdotal statements here.

If you are seeing people coding new products in an embedded system in
the area of 1 million SLOC and getting a productivity of around 12.5
SLOC per man-hour (= 500 SLOC/man-week), I'd like to know what the
environment is like.  On the other hand, if they are writing separate
COBOL programs that communicate only by accessing a well-defined
collection of files, and the programs average around 10,000 SLOC, but
the productivity is around 0.125 SLOC per man-hour (= 5 SLOC/man-week),
then I think something is wrong.

>>and we can see that at some point the effects of increasing size
>>dominate the effects of anything that affects just the constant (so
>>long as d > 1).  One supposition I've made is that the difference
>>between programming-in-the-small and programming-in-the-large is that
>>large-scale programming is when scale dominates in this equation.
>
>>[This is my Discovery Of the Week.  I can't decide if I think it's
>>significant or not.]
>
>I agree that this is a good Discovery of the Week,

Thanks!

>and it is the fact
>that we so strongly agree here that leads me to believe that we
>probably agree on the earlier stuff too, if we can just get our
>communications ungarbled.

I guess that's something for you to decide.

>>You didn't read down far enough.  The relation has not one but two
>>factors that can be set or chosen to suit differences in the
>>environment.  In the Intermediate and Advanced COCOMO models there are
>>a number of factors that model things like language chosen,
>>environment, use of a methodology, etc.  Basic COCOMO does not take
>>these into account, and as you say is inherently inaccurate.
>
>Ah, see, then we DO agree.
>So much for this thread...

Don't despair, there's still room for argument.  The part of what I said
that you deleted is what I think is the most interesting part: not that
basic COCOMO without all the weighting factors is inaccurate, but that
it is surprisingly *accurate* given its limitations.  That makes me
think the basic relationship is *very* strong, since it is good within a
factor of two 60% of the time even with very sloppy constants.  Given a
well-instrumented environment and some time working out one's own
regressions for the various constants, I think it can be made *quite*
accurate.  Even without this, it's not a bad approximation for
back-of-the-envelope calculations.  And the DOTW above suggests that no
matter the choice of language and environment, the effects of scale will
eventually dominate as the program gets bigger.
--
Charlie Martin (...!mcnc!duke!crm, crm@cs.duke.edu)
13 Gorham Place/Durham, NC 27705/919-383-2256
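For anyone who wants to try the back-of-the-envelope calculation
discussed above, here is a minimal sketch of the Basic COCOMO relation,
effort = a * KSLOC**b with b > 1.  The (a, b) pairs are Boehm's
published Basic-model coefficients; the function name and the sample
sizes are just illustration, and the output is a rough figure, not a
calibrated estimate:

```python
# Basic COCOMO (Boehm, 1981): effort = a * KSLOC ** b, in man-months.
# Because b > 1, man-months per KSLOC rise with size, so scale
# eventually dominates any constant factor -- the DOTW point above.
MODES = {
    # mode: (a, b)
    "organic":      (2.4, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded":     (3.6, 1.20),
}

def basic_cocomo_effort(ksloc, mode="organic"):
    """Rough effort estimate in man-months for `ksloc` thousand SLOC."""
    a, b = MODES[mode]
    return a * ksloc ** b

if __name__ == "__main__":
    for size in (10, 100, 1000):
        e = basic_cocomo_effort(size, "embedded")
        # Effort per KSLOC grows with size because b > 1:
        print(f"{size:5d} KSLOC embedded: {e:9.0f} MM, "
              f"{e / size:5.1f} MM/KSLOC")
```

On these coefficients a 1000-KSLOC embedded project costs roughly two
and a half times as many man-months per KSLOC as a 10-KSLOC one -- the
constant of proportionality matters less and less as the program grows.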