Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!zaphod.mps.ohio-state.edu!wuarchive!decwrl!infopiz!athertn!hemlock!mcgregor
From: mcgregor@hemlock.Atherton.COM (Scott McGregor)
Newsgroups: comp.software-eng
Subject: Re: recap so far
Message-ID: <27245@athertn.Atherton.COM>
Date: 17 Jul 90 23:22:03 GMT
References: <27199@athertn.Atherton.COM> <1990Jul10.134226.22459@iti.org> <268462df> <39400111@m.cs.uiuc.edu> <31558@cup.portal.com>
Sender: news@athertn.Atherton.COM
Reply-To: mcgregor@hemlock.Atherton.COM (Scott McGregor)
Organization: Atherton Technology -- Sunnyvale, CA
Lines: 118

Some further comments concerning Cliff's recap.

In general I agree with the facts that Cliff records.  I have some
differing interpretations of what should be deduced from them.

Cliff writes that software engineering prediction models

>... attempt to create an accurate measure of something that is not
> measurable in the first place.

Software development time is surely measurable.  We measure it after
the fact every time we ship a product.  At best what Cliff
is saying is that they 

  attempt to give an accurate PREDICTION of something that is not 
  PREDICTABLE in the first place.  

This is more correct, but truthfully everyone who builds tools for predictions
of this sort is usually quite upfront about the fact that they create
predictions of MEANS with some VARIANCE.  Often, the tools will either show
you the variance, or if you consult the original research data on which
the tool was built you can get the underlying measures of variance.  As
Cliff points out these levels of variance are extremely high.  In that
sense you won't get a very accurate prediction.  However, to say that
the tools attempt to give an accurate prediction is overstating things.
The tools merely attempt to give the MOST accurate prediction possible,
admitting that this prediction is not very dependable.  It may not be
accurate, but it is a better prediction than one might arrive at with
no data.  

Clearly, software schedules are predicted by people, and so ipso facto are
predictable; it is merely that the predictions may not be very
accurate.

> Such estimates are bound to be useless because numerous coding difficulty
> assumptions will be wrong without support of actual time measurements.
> These errors propogate in an exponential manner throughout layers of code.

Cliff correctly identifies a chief source of the lack of accuracy. 
Software, like other chaotic dynamics processes, seems sensitively dependent
on such specific initial starting conditions that it is inherently
impossible to predict future states with certainty.  However, his 
characterization of such estimates as useless is an overstatement.
In the game of blackjack (21), the odds can be calculated concerning
successful draws if your hand contains 13 points, 15 points, 19 points, etc.
When you know the odds, you do not know accurately what the outcome of
the next draw will be.  But knowing (and  playing) the odds can be better
than ignoring them in the long run.  The odds aren't useless merely because
they don't offer certainty for each draw.  Card counters can do even better
because they have more accurate odds estimates, but they don't have better
certainty over the next particular draw.  So there is even value to more
accurate estimates, even when you still have lots of room for error.
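
To make the blackjack point concrete, here is a minimal sketch of the odds
of going bust on the next card.  It rests on simplifying assumptions that
are not in the discussion above: a single 52-card deck, a "hard" hand (any
ace drawn counts as 1), and the cards already in your hand are ignored.
The names FULL_DECK and bust_probability are made up for illustration.

```python
from fractions import Fraction

# Cards per value in one deck: ace counted as 1, 2-9 at face value,
# and the sixteen ten-valued cards (10, J, Q, K) lumped together.
FULL_DECK = {**{value: 4 for value in range(1, 10)}, 10: 16}

def bust_probability(hard_total, remaining=FULL_DECK):
    """Chance that one more card pushes a hard hand over 21."""
    total_cards = sum(remaining.values())
    busting = sum(n for value, n in remaining.items()
                  if hard_total + value > 21)
    return Fraction(busting, total_cards)

# Odds for a player who knows nothing about the cards already played.
for total in (13, 15, 19):
    p = bust_probability(total)
    print(f"hard {total}: bust with probability {p} (about {float(p):.2f})")

# A card counter who has seen eight ten-valued cards go by works from
# a different deck composition, and so from different odds.
counted = dict(FULL_DECK)
counted[10] -= 8
print("hard 15, counted deck:", bust_probability(15, counted))
```

Under these assumptions the uninformed odds of busting on 15 are 7/13,
while the counter's odds for the same hand are 5/11.  Neither player knows
what the next card will be; the counter simply has the better estimate.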

> It is NEVER right to design a software system ONLY on paper for the purpose
> of devising an estimate, unless you can be happy with an estimate that
> may be way off.

The point is that some people can be MORE happy with an estimate that is
LESS way off, even if it still is far from perfect.  The alternative,
NO estimate, is often psychologically unacceptable to risk-averse persons.
They might not like the amount of variance in the current estimate;
they might want more certainty, but some estimate is better than none.

> Lets discuss hardware estimates.
> Do such estimates allot a fixed amount of time for each chip to 
> to estimate a board design (eg 80386 = 2 weeks, etc.)

I have created and maintained support systems for HW designers.  The answer
is that, in a manner of speaking, they DO make estimates based on numbers of
elements.  IC designers frequently composed their designs by putting together
numbers of pre-defined elements:  so many gate-arrays of a certain sort,
some I/O buffers, so many registers, an arithmetic unit that does such and
such.  Board level designers would then say so many RAMs, a such and such
CPU, this sort of I/O processor, a memory manager, floating point chip
etc.  System designers would design with thus and so bus, a motherboard,
various I/O and memory boards, etc.

First estimates for how long these projects would take were often derived
merely from the total  numbers of elements to be used  at any given level.
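
As a rough illustration of a count-based first estimate, the sketch below
sums a per-element effort over the number of elements of each kind.  Every
number in it is hypothetical, as are the names ELEMENT_EFFORT and
estimate_weeks; it also assumes the per-element efforts are independent,
so that their variances simply add.

```python
import math

# Hypothetical design effort per element, in weeks: (mean, std deviation).
ELEMENT_EFFORT = {
    "cpu": (2.0, 1.0),
    "ram": (0.25, 0.1),
    "io_processor": (1.5, 0.75),
}

def estimate_weeks(counts, effort=ELEMENT_EFFORT):
    """Return (mean, standard deviation) of total effort for a design.

    counts maps an element kind to how many of that element are used.
    Independence between elements is assumed, so variances are summed.
    """
    mean = sum(n * effort[kind][0] for kind, n in counts.items())
    variance = sum(n * effort[kind][1] ** 2 for kind, n in counts.items())
    return mean, math.sqrt(variance)

# A made-up board: one CPU, eight RAMs, two I/O processors.
board = {"cpu": 1, "ram": 8, "io_processor": 2}
mean, spread = estimate_weeks(board)
print(f"first estimate: {mean:.1f} weeks, give or take about {spread:.1f}")
```

The estimate comes out as a mean with a spread attached, which is all a
count-based first estimate can honestly claim.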

Hardware design estimates, especially early ones, were often inaccurate.
But, just like software estimates, they improved as more of the project
was completed.   However, I believe that the hardware designers had an
advantage, and that this advantage yielded some important reduction in
variances.  In general, the amount of complexity on a chip is roughly
the same as the amount of complexity on another chip of the same
type in the same era.  Similarly with boards and systems.  Also
interestingly, a board is typically not much more complex than the
underlying chip if you treat each component on
the board as a black box the way you might treat a gate on a chip
as a black box.  And it is not easy to visually confuse a chip,
a board and a system.

Software components on the other hand tend to  vary greatly in internal
complexity, in ways that are not at all apparent from their external
interfaces.  Thus if you are making estimates about complexity from
the external interfaces (or requirements specs) you don't have the
same level of strong relationship with the internals that you do
with a chip or board or system.   So you are unable to reduce
variance as much.   For this reason, many estimation tools such as
COCOMO allow you to add additional data about "expected complexity" of
the module to be designed.  But these are less precise relationships 
than the physical constraint relationships of chips and board dimensions.
I believe that it is the power of constraint relationships to predict
complexity that accounts for why, despite everything, lines of code
usually has the strongest relationship to development time of any typically
measured predictor variable.
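
For readers who have not seen a COCOMO-style model, the sketch below shows
the general shape: effort grows as a power of program size in thousands of
lines, and is then scaled by a judgment-based multiplier such as expected
complexity.  The defaults a=3.2 and b=1.05 are the values commonly cited
for the "organic" mode of Boehm's intermediate model; treat them as an
assumption and check the published tables before relying on them.  The
multiplier of 1.3 is purely illustrative, and the function name
cocomo_effort is made up here.

```python
def cocomo_effort(kloc, a=3.2, b=1.05, complexity_multiplier=1.0):
    """Estimated effort in person-months for kloc thousand lines of code.

    a and b are assumed calibration constants; complexity_multiplier
    stands in for an "expected complexity" adjustment (1.0 = nominal).
    """
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    return a * kloc ** b * complexity_multiplier

for kloc in (10, 50):
    nominal = cocomo_effort(kloc)
    harder = cocomo_effort(kloc, complexity_multiplier=1.3)
    print(f"{kloc} KLOC: {nominal:.0f} person-months nominal, "
          f"{harder:.0f} if judged more complex")
```

Note that size enters through a fitted power law while complexity enters
only as a coarse multiplier chosen by the estimator, which is one way of
seeing why the size term carries most of the predictive weight.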

Scott McGregor
mcgregor@atherton.com