Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!ncar!gatech!mcnc!duke!crm
From: crm@duke.cs.duke.edu (Charlie Martin)
Newsgroups: comp.software-eng
Subject: Re: COCOMO
Message-ID: <677360215@macbeth.cs.duke.edu>
Date: 19 Jun 91 19:36:57 GMT
References: <1991Jun18.033606.1362@netcom.COM> <677256043@macbeth.cs.duke.edu> <1991Jun18.214011.17765@netcom.COM>
Distribution: comp
Organization: Duke University Computer Science Dept.; Durham, N.C.
Lines: 134

Okay, I *think* Jim and I are *largely* in agreement, but I think there
are a couple of essential points to be clarified.  Here goes:

In article <1991Jun18.214011.17765@netcom.COM> jls@netcom.COM (Jim Showalter) writes:
>crm@duke.cs.duke.edu (Charlie Martin) writes:
>
>>>>(1) Empirically, in any organization, man-months per 1000 lines of
>>>>code (KSLOC) is roughly constant, no matter what language or
>>>>environment is used.  So, we can always assume that effort in
>>>>man-months is proportional to size in KSLOC.
>
>[I respond that this is absurd, to which Mr. Martin responds:]
>
>>I always hate it when someone says something is "prima facie absurd"
>>like this.  First of all, notice the claim is not *everyone*, but that
>>*within an organization* the productivity is roughly constant.
>
>I think I see the problem here.  I'm probably not parsing your statement
>the way you are writing it.  When I read the original post, it seems
>to read--to my parser at least--as follows: "It doesn't make any
>difference what language or environment your organization uses, because
>you'll always produce the same amount of code in the same amount of
>time, regardless".  This was the statement I was claiming was absurd,
>because, of course, it IS (is there anybody out there who thinks this
>statement is NOT absurd?).

Okay, here goes:

(1) Given the exact same environment -- language, operating system, etc.
-- the average productivity of a group of programmers is likely to
remain nearly constant.
(2) Across language environments, the number of source lines of code
produced per unit of effort tends to remain constant.  I know this seems
counter-intuitive, but it's well supported empirically: people average
around 10 source lines of code per total man-day worked on the project.
Your mileage may vary, but the basic relationship does not.  Also, the
constant does vary between organizations and within organizations --
differences of a factor of 2 or more in the average aren't uncommon --
but given the wide variation between individual programmers it isn't
clear whether this is the effect of differences in management and
environment, or just the effect of having randomly assigned several
high-speed programmers to one project.  Unfortunately, it's also the
sort of thing that's hard to control for, which makes experimentation
expensive.

This also looks like I am indeed insisting on the statement you say is
absurd.  I don't have any trouble with that -- the empirical evidence is
on my side.

(3) The last sentence says that effort in man-months is proportional to
size in KSLOC.  This statement carefully avoids any claim about the
constant of proportionality -- but it does say that no matter how fast
you code, the time you take is proportional to the size of the final
code.

>In your followup response, you rephrase things so that
>I get the idea you are actually saying something else entirely, in
>which case we probably aren't arguing, just failing to communicate.
>
>>But secondly, "claims of empiricism" cannot be just unceremoniously
>>dumped.  The fact is that this relationship held over something like
>>400 projects, in dozens of different environments, with languages from
>>assembler to the best HLLs of the time.  Where is your
>>counter-evidence?  How was it measured?
>
>I've witnessed organizations producing code at anywhere from 5 lines
>per week per programmer to about five hundred lines per week per
>programmer, which is a two-order-of-magnitude spread.
>But since I'm not sure what your
>original claim was, I'm not sure if this refutes it or is completely
>unrelated to it.

We need to control for a bunch of things before I'd know either -- for
example, the effects of size and problem domain.  That's why we need
more than anecdotal statements here.

If you are seeing people coding new products in an embedded system in
the area of 1 million SLOC and getting a productivity of around 12.5
SLOC per man-hour (= 500 SLOC/man-week), I'd like to know what the
environment is like.  On the other hand, if they are writing separate
COBOL programs that communicate only by accessing a well-defined
collection of files, and the programs average around 10,000 SLOC, but
the productivity is around 0.125 SLOC per man-hour (= 5 SLOC/man-week),
then I think something is wrong.

>>and we can see that at some point the effects of increasing size
>>dominate the effects of anything that affects just the constant (so
>>long as d > 1).  One supposition I've made is that the difference
>>between programming-in-the-small and programming-in-the-large is that
>>large-scale programming is when scale dominates in this equation.
>
>>[This is my Discovery Of the Week.  I can't decide if I think it's
>>significant or not.]
>
>I agree that this is a good Discovery of the Week,

Thanks!

>and it is the fact
>that we so strongly agree here that leads me to believe that we
>probably agree on the earlier stuff too, if we can just get our
>communications ungarbled.

I guess that's something for you to decide.

>>You didn't read down far enough.  The relation has not one but two
>>factors that can be set or chosen to suit differences in the
>>environment.  In the Intermediate and Advanced COCOMO models there are
>>a number of factors that model things like language chosen,
>>environment, use of a methodology, etc.  Basic COCOMO does not take
>>these into account, and as you say is inherently inaccurate.
>
>Ah, see, then we DO agree.
>So much for this thread...

Don't despair, there's still room for argument.  The part of what I said
that you deleted is what I think is the most interesting part: not that
basic COCOMO without all the weighting factors is inaccurate, but that
it is surprisingly *accurate* given its limitations.  That makes me
think the basic relationship is *very* strong, since it is good within a
factor of two 60% of the time even with very sloppy constants.  Given a
well-instrumented environment and some time working out one's own
regressions for the various constants, I think it can be made *quite*
accurate.  Even without this, it's not a bad approximation for
back-of-the-envelope calculations.  And the DOTW above suggests that no
matter the choice of language and environment, the effects of scale will
eventually dominate as the program gets bigger.
--
Charlie Martin (...!mcnc!duke!crm, crm@cs.duke.edu)
13 Gorham Place/Durham, NC 27705/919-383-2256
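For anyone who wants to try the back-of-the-envelope calculation
discussed above, here is a minimal sketch of the Basic COCOMO relation,
effort = a * KSLOC**b with b > 1.  The (a, b) pairs are Boehm's
published Basic-model coefficients; the function name and the sample
sizes are just illustration, and the output is a rough figure, not a
calibrated estimate:

```python
# Basic COCOMO (Boehm, 1981): effort = a * KSLOC ** b, in man-months.
# Because b > 1, man-months per KSLOC rise with size, so scale
# eventually dominates any constant factor -- the DOTW point above.
MODES = {
    # mode: (a, b)
    "organic":      (2.4, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded":     (3.6, 1.20),
}

def basic_cocomo_effort(ksloc, mode="organic"):
    """Rough effort estimate in man-months for `ksloc` thousand SLOC."""
    a, b = MODES[mode]
    return a * ksloc ** b

if __name__ == "__main__":
    for size in (10, 100, 1000):
        e = basic_cocomo_effort(size, "embedded")
        # Effort per KSLOC grows with size because b > 1:
        print(f"{size:5d} KSLOC embedded: {e:9.0f} MM, "
              f"{e / size:5.1f} MM/KSLOC")
```

On these coefficients a 1000-KSLOC embedded project costs roughly two
and a half times as many man-months per KSLOC as a 10-KSLOC one -- the
constant of proportionality matters less and less as the program grows.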