Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!apple!voder!pyramid!athertn!hemlock!mcgregor
From: mcgregor@hemlock.Atherton.COM (Scott McGregor)
Newsgroups: comp.software-eng
Subject: Re: Personal growth and software engineering!
Message-ID: <35017@athertn.Atherton.COM>
Date: 12 Apr 91 03:01:46 GMT
References: <JGAUTIER.91Apr8184934@vangogh.ads.com>
Sender: news@athertn.Atherton.COM
Reply-To: mcgregor@hemlock.Atherton.COM (Scott McGregor)
Organization: Atherton Technology -- Sunnyvale, CA
Lines: 142

In article <JGAUTIER.91Apr8184934@vangogh.ads.com>,
jgautier@vangogh.ads.com (Jorge Gautier) writes:

In article <549@tivoli.UUCP> alan@tivoli.UUCP (Alan R. Weiss) writes:

> Contrast this with metrics like "we wrote a 1000 line program in three
> months and found 50 defects."  What does this tell you?  Is it good
> or bad?  Should you do anything about your process because of these
> metrics?  Lines of code, hours of programming and number of defects
> are so variable and dependent on so many informal factors that their
> effectiveness is limited. 

My answers to the above: a) it tells you to inspect your process for flaws,
b) it is bad (defects were known to be present), c) yes, you should try
to understand what is causing the flaws. For more examples of what specifically
you should do to understand your process better, and to understand
how this analysis is arrived at, read on (otherwise skip ahead; the following
is long--sorry!)

Perhaps simple metrics like 1000lines/3 months, or 50defects/1000 lines
don't tell you much without comparison to similar metrics from other
people.  That may still be an argument for comparative metric usage:
If one process/person generates substantially different results from
the rest of the people, it might indicate a fruitful place to do some
study just to understand what is different.  A much lower than average
number of defects/month might mean a poorer defect detection process or
a better defect avoidance process.  A little study might help you determine
which. And knowing that might help you repeat/avoid that cause in the future.
Or it might be that it is unavoidable/uncontrollable in the future, but
at least it would be more predictable.
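
One way to spot the "substantially different" case is sketched below in
Python. Everything in it is an assumption for illustration: the function
name, the factor-of-two threshold, and the four team figures are invented,
not measurements from anyone's project.

```python
from statistics import median

def flag_unusual(rates, factor=2.0):
    """Return names whose rate is at least `factor` times above or below the median."""
    mid = median(rates.values())
    return sorted(
        name for name, rate in rates.items()
        if rate >= mid * factor or rate <= mid / factor
    )

# Invented defects-per-month figures for four hypothetical teams.
rates = {"team_a": 16.7, "team_b": 14.0, "team_c": 18.5, "team_d": 4.0}
print(flag_unusual(rates))
```

A flag like this only says where to look. Whether the low number means
poorer detection or better avoidance still takes the study described above.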

That said, I think that there is also non comparative information 
in the metrics.  There are some absolute values to compare against.
Fewer defects (detected + undetected) is better than more defects;
hence any defect found is an opportunity for improvement.  The total
number of defects being nonzero does not provide an absolute answer
as to what is wrong, but it suggests some useful experiments or
questions to answer (see below) in order to reduce defects in the future.
Similarly, fewer months is better--time is money, and more time also
introduces more likelihood of mis-estimation (not on a percentage basis,
but definitely on an absolute basis).  Fewer lines is also better.
Again on an absolute level, fewer lines means fewer possible errors and less
time to test or inspect in general.

So what are some questions the above metrics suggest we explore?
They tell me a number of things.  First of all, you had 50 defects.
You might want to see if there is anything you can do to your process
to find those 50 defects sooner.  Some defects might have been
side-effects of other defects. Early recognition might have avoided
these.  You might want to see if there is anything you can do to prevent
some of the other defects.  Are some of them defects that lint would
catch?  Can you make lint run before things are checked in each time?
Would a language-sensitive editor catch some of those things? (maybe force
all switch statements to have a specified default that catches all
unexpected values?)  Would inspections find them sooner? If you ran branch
coverage tests would you have discovered some defects sooner?

There was a rate of nearly 17 defects a month.  On average there are 22
working days a month. This is about three defects every four working days.
Is there something about your working area that causes people to become
distracted and more error prone? Or is there something in your environment
that perhaps keeps distractions *down* to such a level that you are not
getting even more defects?
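
A quick check of the defect-rate arithmetic, as a minimal Python sketch.
The variable names are mine; the inputs (50 defects, 3 months, 22 working
days a month) come from the figures in this thread.

```python
# Inputs taken from the discussion; the names are illustrative only.
defects = 50
months = 3
working_days_per_month = 22

defects_per_month = defects / months
defects_per_day = defects_per_month / working_days_per_month

print(f"{defects_per_month:.1f} defects/month")
print(f"{defects_per_day:.2f} defects/working day")
```

That works out to roughly 16.7 defects a month, or about three defects
every four working days.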

There were 1000 lines. At 66 lines/printed page that is at least 16
pages. At 24 lines per window that is at least 42 window-fulls.
Is there a place that the engineer can put 16 pages of print out so 
all are visible at one time?  If not, does the fact that the engineer
has to "swap" between pages make it difficult to understand the code
and lead to errors (in this respect this IS a valid measure of complexity).
Are things isolated enough that all the relevant pieces can be in a
single window-full, or does swapping here lead to problems too?  Even
if there is a wall or table where you can put 16 printed pages, can an
engineer actually see and understand all of it at once?  Is there a more
compact representation that would have made errors more obvious? Taken less
time to write and debug?
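
The page and window counts above are just a round-up division. A small
Python sketch, with a helper name (units_needed) of my own choosing:

```python
import math

def units_needed(total_lines: int, lines_per_unit: int) -> int:
    """Smallest number of pages or windows that can hold total_lines."""
    return math.ceil(total_lines / lines_per_unit)

pages = units_needed(1000, 66)    # 66 lines per printed page
windows = units_needed(1000, 24)  # 24 lines per window
print(pages, windows)
```

Rounding up matters here: 1000/66 is a little over 15, so the last few
lines still need a 16th page.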

1000 lines / 3 months / 22 days /month = 15.15 lines / day. 15.15 lines / day /
8 work hours/ day =  1.89 lines /hour.   Clearly typing a couple of lines
does not take an hour.  There must be something else going on.  But what?
Are meetings taking up a lot of time?  Is training taking up a lot of time?
Does that mean we are asking people to do things that they don't know
how to do? Are people spending a lot of time looking things up in
manuals?  Do they have the manuals or do they have to walk somewhere to
get them?  If they are on line, is access fast?  Are they easy to find
things in? easy to read? easy to understand?  Or is a lot of this "thought"
time?  Does this indicate that the task is complex?  Or just poorly
understood?  Do people need to spend time contacting other people and
querying them in order to better understand things?  Are multiple people
developing simultaneously?  Are some of the delays due to coordination
problems?  Might there be a better way to coordinate and speed things up?
Or maybe a different design that isolates interactions more (and so
reduces coordination delays)?
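
The lines-per-hour derivation above can be written as a short Python
sketch. The function name and default arguments are my own; the 22-day
month and 8-hour day are the assumptions already used in the text.

```python
def lines_per_hour(lines, months, days_per_month=22, hours_per_day=8):
    """Return (lines per working day, lines per working hour)."""
    lines_per_day = lines / months / days_per_month
    return lines_per_day, lines_per_day / hours_per_day

per_day, per_hour = lines_per_hour(1000, 3)
print(f"{per_day:.2f} lines/day, {per_hour:.2f} lines/hour")
```

Changing the defaults (say, 6 productive hours a day) is an easy way to
see how much the result depends on those assumptions.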

In conclusion, I think that the  problem with metrics such as the ones
Jorge gives is not in their derivation, but in ourselves.  Mostly when
people see numbers like this they are annoyed because they don't give
specific answers about what to do to improve your process.  That is
correct.  But they do provide benefit in that they can suggest specific
questions.  Unfortunately, this is where one problem in ourselves lies.
Mostly we don't really want more questions.  Mostly we don't enjoy the
painstaking observation and study necessary to answer these questions.
We might well prefer to get on with fun activities like coding.
We want answers, not more questions.  Since metrics like these raise more
questions, we don't want them.  In fact, we dislike them so strongly
that we ignore useful information in them that would have to be derived.
It is hard to understand what 1000 lines in 3 months means; 2 lines / hour
is more understandable, but it takes some work to get from the former to
the latter.  We don't like these metrics puzzlers, so we don't bother
to do the work to really understand the number. Instead we allow
ourselves to get more numb about these metrics (number and number about
more numbers? :-)
Then we get all hung up on whether it was 1030 statements by counting
semicolons, or 988 counting each "for" expression as one statement (either one
comes to "about 2 lines /hour" which means this difference doesn't matter).
When metrics become hard to ground in reality (like the size of the US national
debt!) we call them meaningless. And of course this is self-fulfilling
because if we don't investigate they ARE meaningless. 
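
To see how little the counting method matters, here is a small Python
sketch. The 528-hour total assumes the same 3 months, 22 working days a
month and 8 hours a day used earlier; the constant name is mine.

```python
HOURS = 3 * 22 * 8  # working hours in three months under those assumptions

for count in (988, 1000, 1030):
    rate = count / HOURS
    # Show the exact rate and the rate rounded to a whole number of lines.
    print(count, f"{rate:.2f}", round(rate))
```

All three counts land between 1.8 and 2.0 lines an hour, so each rounds to
"about 2".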

To make progress from metrics like these is like trying to take the collected
wisdom of the alchemists and derive the periodic table. A lot of people
don't want to do that work.  They want to do the scientific predictive
work that is only possible once the periodic table is derived.  But
we aren't there yet. So computer "scientists" will be frustrated by
such metrics until computer "alchemists" slog through the awful questions,
observations and compilation chores, and stumble on significant parts of
the periodic table of computer programming elements that make prediction
possible.


Scott McGregor
Atherton Technology
mcgregor@atherton.com