From: bytor@grumpy.Berkeley.EDU (Ross Huitt)
Newsgroups: comp.software-eng
Subject: Re: OOP in the "real world"
Message-ID: <1991Jun27.194748.732@bellcore.bellcore.com>
Date: 27 Jun 91 19:47:48 GMT
References: <1991Jun27.165340.22545@den.mmc.com>
Sender: usenet@bellcore.bellcore.com (Poster of News)
Organization: Bellcore
Lines: 62
To: randy@tigercat.den.mmc.com (Randy Stafford)
Cc: bytor duncan paul

I decided to take this off line until we clear this up. If you think
things are clear, then post this back to the net.

I find calculating metrics on Smalltalk code a little troublesome.
In C and other procedural languages, and to a lesser extent C++,
counting statements does make some sense. But counting statements
in Smalltalk is dubious at best. You tend to see very large
cascades of expressions that contain a lot of functionality. So,
when it came time to count Smalltalk methods, I used the following
rules:
1) Don't count blank lines.
2) Don't count lines with just comments.
3) Count remaining newlines in the source of the method as LOCs.
I hate counting newlines for metrics for any reason, but right now
I'll live with these rules for Smalltalk. Please note, however,
that this is not your definition of SLOC.
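The three rules above can be sketched as a small Python function. The function name is hypothetical, and so is the test for a "comment-only" line: here it means the stripped line is exactly one complete double-quoted Smalltalk comment. Comments that span several lines are not recognized, so their lines get counted as code:

```python
def count_smalltalk_loc(method_source: str) -> int:
    """Count LOC in one Smalltalk method's source using the three rules.

    Assumption: a line is "comment-only" when its stripped text is a
    single complete double-quoted comment. Comments spanning several
    lines are not recognized and their lines are counted as code.
    """
    count = 0
    for line in method_source.splitlines():
        text = line.strip()
        if not text:
            continue  # rule 1: don't count blank lines
        if text.startswith('"') and text.find('"', 1) == len(text) - 1:
            continue  # rule 2: don't count lines holding only a comment
        count += 1  # rule 3: each remaining line counts as one LOC
    return count


# Hypothetical method source, used only to exercise the rules.
example = '''printOn: aStream
    "Print a short description of the receiver."

    super printOn: aStream.
    aStream nextPutAll: ' named '; nextPutAll: name'''

print(count_smalltalk_loc(example))  # 3
```

Note that the cascade on the last line counts as a single LOC however much it does, which is exactly why line counts understate Smalltalk functionality.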

Looking at the Smalltalk/V image and a couple of medium-sized (100+ class)
applications indicated averages of around 3 lines of code per method.
I didn't count bytes, but I would venture a guess that the lines
were around 30-40 bytes long, as you suggested.

Metrics for C++ are quite a bit easier. I count executable statements,
in particular all statements as defined in the ARM grammar except
labeled-statements and compound-statements. Metrics for the NIH libraries,
several publicly available systems, and a couple of production
systems had averages in the range of three to five statements per method.
Also, the more 'object-oriented' the system is, the lower the average will
be. If you triple these statements-per-method numbers for C++, you get
a fair approximation of the number of raw lines of source code.

The tripling I suggested was for the 42K LOC number. It may (or may not)
provide a rough indicator of the number of actual lines (newlines/SLOC)
in the source code of the method. Doubling may be more accurate, as
suggested by your 85K number. So maybe we just misunderstood each
other's definition of LOC.
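The arithmetic behind the tripling and doubling rules of thumb, as a small Python sketch. The function name is hypothetical, and reading the 42K figure as a count of executable statements is an assumption about how that number was produced:

```python
def estimate_sloc(statements: float, lines_per_statement: float) -> float:
    """Scale an executable-statement count to physical source lines.

    lines_per_statement is the multiplier under discussion:
    3 for the tripling rule of thumb, 2 for the doubling alternative.
    """
    return statements * lines_per_statement


# Per method: 3-5 statements, tripled, gives roughly 9-15 raw lines.
print([estimate_sloc(s, 3) for s in (3, 5)])  # [9, 15]

# Whole system: the 42K figure, read as a statement count (assumption).
for factor in (2, 3):
    print(factor, estimate_sloc(42_000, factor))
# 2 84000
# 3 126000
```

Doubling lands at 84K, close to the 85K figure; tripling gives 126K.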

I like the idea of trying to estimate the LOC per method based on
the image size and method count, but I don't do metrics full-time
so checking this out will have to wait. I don't know if it will
work, but I don't think anybody else does either.
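A sketch of the image-size idea, under clearly stated assumptions: the function and its parameters are hypothetical; source_bytes is taken to be the size of the method source text only (a full image also holds compiled methods and other objects, which would inflate the estimate); and bytes_per_line defaults to 35, the midpoint of the 30-40 byte guess above. The inputs in the usage lines are invented for illustration, not measurements:

```python
def estimate_loc_per_method(source_bytes: int, method_count: int,
                            bytes_per_line: float = 35.0) -> float:
    """Estimate average lines per method from aggregate size figures.

    Assumes source_bytes counts method source text only, not the
    whole image.
    """
    if method_count <= 0:
        raise ValueError("method_count must be positive")
    bytes_per_method = source_bytes / method_count
    return bytes_per_method / bytes_per_line


# Invented inputs: about 1 MB of method source over 10,000 methods.
for bpl in (30.0, 35.0, 40.0):
    print(bpl, estimate_loc_per_method(1_050_000, 10_000, bpl))
# 30.0 3.5
# 35.0 3.0
# 40.0 2.625
```

Sweeping bytes_per_line over the 30-40 range shows how sensitive the estimate is to that one guess.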

My main point is that the number of executable statements per method
in an object-oriented C++ system will be very low, especially if the
Law of Demeter is adhered to. My assertion is that 'very low' will be
fewer than 4 statements for most systems. In the C systems I have
looked at, functions typically contain more than 10 statements. I think
this difference in statements per function/method is significant and
will have a very great impact on maintenance.
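To make the Law of Demeter point concrete, here is a small sketch with hypothetical classes, written in Python for illustration rather than C++. Each object talks only to its immediate collaborators and does its own small piece of the work, so no method needs more than a few statements:

```python
class Wallet:
    def __init__(self, balance: float) -> None:
        self._balance = balance

    def withdraw(self, amount: float) -> float:
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount
        return amount


class Customer:
    def __init__(self, wallet: Wallet) -> None:
        self._wallet = wallet

    def pay(self, amount: float) -> float:
        # Delegates to its own part; callers never reach into the wallet.
        return self._wallet.withdraw(amount)


class Paperboy:
    def __init__(self) -> None:
        self.collected = 0.0

    def collect(self, customer: Customer, amount: float) -> None:
        # Talks only to its direct collaborator (customer). A version that
        # ignored Demeter would reach through customer._wallet and do the
        # balance check and update here, making this method longer.
        self.collected += customer.pay(amount)


customer = Customer(Wallet(10.0))
boy = Paperboy()
boy.collect(customer, 2.5)
print(boy.collected)  # 2.5
```

The work that a single C function might do in one place is spread across three short methods, which is the effect on statements-per-method described above.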

So, it appears that a better estimate for the SLOC (where SLOC is the
number of actual physical lines of raw source code) of the Analyst is
around 85-100KSLOC. This assumes, of course, that you, Dr. Love and I
are defining SLOC in the same manner. (I still assert that there is no
definition of SLOC that would yield 350KSLOC for that system, which
is the reason I posted in the first place.)
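Working backwards makes the same point. Still treating 42K as a statement count (an assumption), this hypothetical helper computes how many physical lines per statement each candidate SLOC total would require:

```python
def implied_lines_per_statement(sloc: float, statements: float) -> float:
    """Physical lines per executable statement implied by an SLOC total."""
    return sloc / statements


for sloc in (85_000, 100_000, 350_000):
    ratio = implied_lines_per_statement(sloc, 42_000)
    print(f"{sloc:>7,} SLOC -> {ratio:.2f} lines per statement")
```

The 85-100K range needs roughly two to two and a half lines per statement; 350K would need more than eight.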

I hope this clears things up for now.

Ross Huitt
bytor@ctt.bellcore.com