Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!mips!apple!voder!pyramid!athertn!hemlock!mcgregor
From: mcgregor@hemlock.Atherton.COM (Scott McGregor)
Newsgroups: comp.software-eng
Subject: Re: Personal growth and software engineering!
Message-ID: <35017@athertn.Atherton.COM>
Date: 12 Apr 91 03:01:46 GMT
References:
Sender: news@athertn.Atherton.COM
Reply-To: mcgregor@hemlock.Atherton.COM (Scott McGregor)
Organization: Atherton Technology -- Sunnyvale, CA
Lines: 142

In article , jgautier@vangogh.ads.com (Jorge Gautier) writes:
960-7300
References: <9233@castle.ed.ac.uk> <1991Mar25.164133.29674@unislc.uucp> <549@tivoli.UUCP>
Distribution: comp.software-eng
Date: 8 Apr 91 18:49:34
Lines: 77

In article <549@tivoli.UUCP> alan@tivoli.UUCP (Alan R. Weiss) writes:

> Contrast this with metrics like "we wrote a 1000 line program in three
> months and found 50 defects." What does this tell you? Is it good
> or bad? Should you do anything about your process because of these
> metrics? Lines of code, hours of programming and number of defects
> are so variable and dependent on so many informal factors that their
> effectiveness is limited.

My answers to the above: a) it tells you to inspect your process for
flaws, b) it is bad (defects were known to be present), c) yes, you
should try to understand what is causing the flaws. For more examples
of what specifically you should do to understand your process better,
and to see how this analysis is arrived at, read on (otherwise skip
ahead; the following is long--sorry!).

Perhaps simple metrics like 1000 lines / 3 months, or 50 defects /
1000 lines, don't tell you much without comparison to similar metrics
from other people. That may still be an argument for comparative
metric usage: if one process/person generates substantially different
results from everyone else, it might indicate a fruitful place to do
some study just to understand what is different.
A much lower than average number of defects/month might mean a poorer
defect detection process or a better defect avoidance process. A
little study might help you determine which. And knowing that might
help you repeat/avoid that cause in the future. Or it might be that it
is unavoidable/uncontrollable in the future, but at least it would be
more predictable.

That said, I think that there is also non-comparative information in
the metrics. There are some absolute values to compare against. Fewer
defects (detected + undetected) is better than more defects; hence any
defect found is an opportunity for improvement. The total number of
defects being nonzero does not provide an absolute answer as to what
is wrong, but it suggests some useful experiments or questions to
answer (see below) in order to reduce defects in the future.
Similarly, fewer months is better--time is money, and more time also
introduces more likelihood of mis-estimation (not on a percentage
basis, but definitely on an absolute basis). Fewer lines is also
better. Again, on an absolute level, fewer lines means fewer possible
errors, and in general less time to test or inspect.

So what are some questions the above metrics suggest we explore? They
tell me a number of things.

First of all, you had 50 defects. You might want to see if there is
anything you can do to your process to find those 50 defects sooner.
Some defects might have been side-effects of other defects. Early
recognition might have avoided these. You might want to see if there
is anything you can do to prevent some of the other defects. Are some
of them defects that lint would catch? Can you make lint run before
things are checked in each time? Would a language-sensitive editor
catch some of those things? (Maybe force all switch statements to have
a specified default that catches all unexpected values?) Would
inspections find them sooner? If you ran branch coverage tests, would
you have discovered some defects sooner?

There was a rate of nearly 17 defects a month (50 defects / 3 months).
On average there are 22 working days a month. This is almost a defect
a day. Is there something about your working area that causes people
to become distracted and more error prone? Or is there something in
your environment that perhaps keeps distractions *down* to such a
level that you are not getting even more defects?

There were 1000 lines. At 66 lines per printed page that is at least
16 pages. At 24 lines per window that is at least 42 window-fulls. Is
there a place where the engineer can put 16 pages of printout so all
are visible at one time? If not, does the fact that the engineer has
to "swap" between pages make it difficult to understand the code and
lead to errors? (In this respect this IS a valid measure of
complexity.) Are things isolated enough that all the relevant pieces
can be in a single window-full, or does swapping here lead to problems
too? Even if there is a wall or table where you can put 16 printed
pages, can an engineer actually see and understand all of it at once?
Is there a more compact representation that would have made errors
more obvious? Taken less time to write and debug?

   1000 lines / 3 months / 22 days/month = 15.15 lines/day
   15.15 lines/day / 8 work hours/day    =  1.89 lines/hour

Clearly typing a couple of lines does not take an hour. There must be
something else going on. But what? Are meetings taking up a lot of
time? Is training taking up a lot of time? Does that mean we are
asking people to do things that they don't know how to do? Are people
spending a lot of time looking things up in manuals? Do they have the
manuals, or do they have to walk somewhere to get them? If the manuals
are on line, is access fast? Are they easy to find things in? Easy to
read? Easy to understand? Or is a lot of this "thought" time? Does
this indicate that the task is complex? Or just poorly understood?

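As a rough sketch, the arithmetic above can be reproduced in a few
lines of Python. The 22 working days a month, 8 work hours a day, 66
lines per page and 24 lines per window are the assumptions stated in
the text; the function and variable names are made up for
illustration and are not part of any tool.

```python
import math

# Raw figures from the quoted example.
LINES = 1000
MONTHS = 3
DEFECTS = 50

# Assumptions stated in the post, not measured values.
WORKING_DAYS_PER_MONTH = 22
HOURS_PER_DAY = 8
LINES_PER_PAGE = 66
LINES_PER_WINDOW = 24


def derived_metrics(lines, months, defects):
    """Turn the three raw numbers into rates that are easier to picture."""
    days = months * WORKING_DAYS_PER_MONTH
    return {
        "defects_per_month": defects / months,
        "defects_per_day": defects / days,
        # Round up: a partly filled page or window still has to be looked at.
        "pages": math.ceil(lines / LINES_PER_PAGE),
        "windows": math.ceil(lines / LINES_PER_WINDOW),
        "lines_per_day": lines / days,
        "lines_per_hour": lines / days / HOURS_PER_DAY,
    }


metrics = derived_metrics(LINES, MONTHS, DEFECTS)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Changing any one of the assumed constants (say, 6 productive hours a
day instead of 8) shows quickly how sensitive each derived figure is
to it.
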
Do people need to spend time contacting other people and querying them
in order to better understand things? Are multiple people developing
simultaneously? Are some of the delays due to coordination problems?
Might there be a better way to coordinate and speed things up? Or
maybe a different design that isolates interactions more (and so
reduces coordination delays)?

In conclusion, I think that the problem with metrics such as the ones
Jorge gives is not in their derivation, but in ourselves. Mostly when
people see numbers like this they are annoyed because the numbers
don't give specific answers about what to do to improve the process.
That is correct. But they do provide benefit in that they can suggest
specific questions.

Unfortunately, this is where one problem in ourselves lies. Mostly we
don't really want more questions. Mostly we don't enjoy the
painstaking observation and study necessary to answer these questions.
We might well prefer to get on with fun activities like coding. We
want answers, not more questions. Since metrics like these raise more
questions, we don't want them. In fact, we dislike them so strongly
that we ignore useful information in them that would have to be
derived. It is hard to understand what "1000 lines in 3 months" means;
"2 lines / hour" is more understandable, but it takes some work to get
from the former to the latter. We don't like these metrics puzzlers,
so we don't bother to do the work to really understand the number.
Instead we allow ourselves to get more numb about these metrics
(number and number about more numbers? :-)

Then we get all hung up on whether it was 1030 statements by counting
semicolons, or 988 counting each for expression as one statement
(either one comes to "about 2 lines / hour", which means this
difference doesn't matter). When metrics become hard to ground in
reality (like the size of the US national debt!) we call them
meaningless. And of course this is self-fulfilling, because if we
don't investigate they ARE meaningless.

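To illustrate the point about counting rules, here is a minimal
sketch, assuming the same 3 months, 22 working days a month and 8 work
hours a day used earlier in this post. The counts 1030 and 988 are the
ones mentioned above; the dictionary labels and function name are
hypothetical.

```python
# Assumptions carried over from earlier in the post.
MONTHS = 3
WORKING_DAYS_PER_MONTH = 22
HOURS_PER_DAY = 8


def per_hour(count):
    """Convert a size count for the whole project into a per-hour rate."""
    total_hours = MONTHS * WORKING_DAYS_PER_MONTH * HOURS_PER_DAY
    return count / total_hours


# Three ways of measuring the same program.
counts = {
    "raw lines": 1000,
    "statements by semicolons": 1030,
    "statements, for counted once": 988,
}

rates = {label: per_hour(n) for label, n in counts.items()}
for label, rate in rates.items():
    print(f"{label}: {rate:.2f} per hour (about {round(rate)})")
```

All three counts round to 2 per hour, so the choice of counting rule
does not change any question the metric raises.
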
To make progress from metrics like these is like trying to take the
collected wisdom of the alchemists and derive the periodic table. A
lot of people don't want to do that work. They want to do the
scientific predictive work that is only possible once the periodic
table is derived. But we aren't there yet. So computer "scientists"
will be frustrated by such metrics until computer "alchemists" slog
through the awful questions, observations and compilation chores, and
stumble on significant parts of the periodic table of computer
programming elements that make prediction possible.

Scott McGregor
Atherton Technology
mcgregor@atherton.com