Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!usc!apple!mips!bridge2!3comvax!tymix!hobbes!pnelson
From: pnelson@hobbes.uucp (Phil Nelson)
Newsgroups: comp.sys.amiga.tech
Subject: Re: Parity Checking / ECC RAM on the A3000
Summary: parity is not useless, again (long)
Message-ID: <3658@tymix.UUCP>
Date: 13 Jun 90 03:00:06 GMT
References: <1710@lpami.wimsey.bc.ca>
Sender: usenet@tymix.UUCP
Reply-To: pnelson@hobbes.UUCP (Phil Nelson)
Organization: BT Tymnet, Inc. / San Jose, CA
Lines: 187

Messages from this account are the responsibility of the sender only, and do
not represent the opinion or policy of BT Tymnet, except by coincidence, or
when explicitly so stated.

In article <1710@lpami.wimsey.bc.ca> lphillips@lpami.wimsey.bc.ca (Larry Phillips) writes:
>In <3649@tymix.UUCP>, pnelson@hobbes.uucp (Phil Nelson) writes:
>>
>> You may want to consider that what you have been reading is the opinion of some
>>people that memory chips are so reliable that "parity is useless". The facts
>>(if we had any) may be otherwise. If the Amiga had parity, it would be easy to
>>get good data on the reliability of the memory IN THE BOX (not in some chip
>>test lab) and IN THE FIELD (not some clean, quiet final test area).
>
>Since I am the one that used the words "parity is useless", I think I will say
>that you should refrain from placing words in my mouth that were never there. I
>did _not_ make that statement because I think that memory is too reliable.
>I said it because I see no real use for adding extra memory, at extra cost,
>thereby statistically reducing reliability, for the sole purpose of either (a)
>informing the user that a parity error has occurred, or (b) crashing the
>program or system.

If I have misrepresented what you have said, I apologize. That was not my
intent. There have been several comments on this matter, and many people
(including, I thought, you, Mr. Phillips) have said that modern memory is so
reliable that parity is not required.
Your comment stuck in my mind. In fact, I had intended to respond to your
last post, in which you repeated this assertion. Unfortunately my Amiga has
not been well for some time; she crashed while I was writing a reply. A few
days later, when I got some free time again, your article had expired here.

I was going to write that you have not answered my post describing a
situation (a badly designed expansion memory box) where parity might have
been useful, but instead kept repeating the gross overgeneralization "parity
is useless". Possibly I was not polite enough in my original post. If not,
please consider that statements like "parity is useless" invite intemperate
responses, especially from people like me who have been repeatedly burned by
the poor quality control, poor design, and insufficient diagnostic capability
of many different kinds of personal computers.

>>These are good points. I think it very likely that the memory system is not
>>the greatest cause of unreliability on the Amiga. Certainly not if you
>>include software bugs. This does not prove that parity checking is useless,
>>but that other measures are needed too. The order in which to take measures
>>to improve reliability is not determined exclusively by which is the worst
>>problem, it may be reasonable to start with a problem that is not the worst,
>>if a solution is easily implemented (memory parity checking, for example).
>
>In what way do you see parity checking as 'measures to improve reliability'?
>I think you are confusing reliability with some other parameter. Parity
>checking, if it only informs you of a parity error, does not change the
>reliability of a system at all. If it is used to halt a task or a system, it
>does, in fact, reduce reliability.

No, I AM NOT CONFUSED! I am irritated, frustrated, discouraged, etc. that
practically the whole personal computer industry does not seem to grasp the
usefulness of discovering problems, both design and process, as early as is
practical.
I understand perfectly that adding parity reduces the MTTF of a product. How
much depends on a lot of things, including what you do with the parity error
information. If all a parity error does is light an LED on the front of the
box, the MTTF should not be reduced much. I am not a fan of crashing the
machine on a parity error, unless I can turn it off.

I see parity as "measures to improve reliability" in just the same way as a
DVM, a scope, a final test procedure, or any number of other diagnostic tools.
Unlike many of them, it has the virtue of staying with the machine through its
life, providing (for those few manufacturers who are interested) feedback on
how the design, parts, etc. REALLY perform in the field. It improves
reliability for any user who has a memory problem that is not obviously
detectable in other ways, by allowing earlier detection and repair. It
improves confidence, which is really the same thing for many people, by
reducing the probability of undetected corruption of data.

>
>You might want to ask yourself what the benefits of parity checking are, vs.
>the cost of it.
>
>Benefits:
>
> Information. You know you had a memory error, and have the option of
>rerunning anything that might possibly have been affected by it.
>
> Information. You know that after running any particular program, if you were
>not informed of a parity error, that any errors you may have, were caused by
>something else. Note that the lack of a parity error says nothing about the
>accuracy of your results, and that the presence of a parity error likewise says
>nothing about the accuracy of your results.
>

Your 2nd statement is untrue. The presence of parity error detection in memory
will certainly increase the confidence in any data contained in that memory.
Not as much as ECC, of course, but significantly. Confidence is not absolute,
of course.
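As a rough illustration of why a single parity bit raises confidence without
making it absolute, here is a minimal sketch in Python. It assumes even parity
over one 8-bit byte with a ninth stored bit; the function names are invented
for this example and do not describe the memory hardware of the Amiga or of
any other machine.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 if the count of 1 bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def store(byte: int) -> int:
    """Model a 9-bit memory word: 8 data bits plus the parity bit in bit 8."""
    return (parity_bit(byte) << 8) | (byte & 0xFF)


def check(word: int) -> bool:
    """Return True if the 9-bit word still has an even number of 1 bits."""
    return bin(word & 0x1FF).count("1") % 2 == 0


word = store(0x5A)
single_flip = word ^ 0b000000100  # one data bit corrupted
double_flip = word ^ 0b000010100  # two data bits corrupted
parity_flip = word ^ 0b100000000  # only the parity bit corrupted

print(check(word))         # True: clean word passes
print(check(single_flip))  # False: single-bit error is detected
print(check(double_flip))  # True: double-bit error goes unnoticed
print(check(parity_flip))  # False: error flagged although the data is intact
```

Under these assumptions any odd number of flipped bits is caught and any even
number is missed, so a clean parity check lowers the probability of an
undetected memory error without reducing it to zero.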
Obviously the fact that I did not have a memory parity error does not
guarantee the data; there are many other places where it might get garbled.
It does increase confidence, though, by reducing the probability of an
undetected memory error. And that most definitely does say something about the
accuracy of my results. You have not included confidence in the hardware (in
this case the memory), which is my whole point.

What you need to understand is that I don't care about maximum confidence in
the data. If I wanted more confidence in the data I would be looking for bugs
in the software first. I know when the data cannot be trusted: it cannot be
trusted when my machine is crashing every few days. Even if the crash itself
did not damage data directly, the disorganisation brought on by constantly
having to recover from crashes would. I can deal with a little randomization
of my data; if I couldn't, I certainly would not be using an Amiga. I bet most
other Amiga users can, too. What I and a lot of other actual and potential
Amiga users cannot deal with easily is a flaky machine which cannot be easily
fixed.

What I propose is that we all forget about trying to make each machine
perfect (we are obviously not close to that) and concentrate on attaining a
reasonable level of reliability. I propose the following test: Every Amiga
should be able to run at least one month under normal usage without crashing.
If it can't, the cause of the problem (hardware or software) must be findable
and correctable by a reasonably competent diagnostician within 1 week.

My estimate of the hardware/software division of effort that would most
efficiently approach the goal is 10/90. For hardware, I would start with
memory parity checking, because it is obvious, easy, and quick. For software,
I would start testing programs, to accumulate a database of interaction
problems.

>Costs:
> Parts.
>
> Wasted time/resources.
>If a parity error occurred in a non-important part of
>memory (including the parity bit memory itself), you have no way of knowing
>that you didn't need to rerun a program. The mere presence of a parity error
>indication tells you nothing but that there was a parity error, but encourages
>users to rerun things, and lulls them when the little light doesn't come on.

I really doubt that most users think like this. I think most users are going
to keep running in spite of the error indication, unless the computer starts
crashing. When the machine has crashed for the 5th time in one day, and they
are really starting to get frustrated, hopefully they are going to start
thinking about what that little blinking red "error" light means.

Remembering that most users and many computer dealers have only a vague idea
of how to troubleshoot, consider the difference between Joe User calling the
computer store saying "my computer is crashing" and "what does it mean when
the PERR light keeps blinking?". The latter case is an obvious trip to the
shop; the former can be a months-long odyssey in software swapping. I can tell
you from personal experience that such an odyssey can be extremely irritating,
time consuming, and generally likely to cause people to make intemperate
overgeneralizations about "flakiness".

>> I think the cost of ECC cannot be justified on the Amiga, unless for special
>>applications. The added cost of simple parity checking (not very great) might
>>easily be justified because it would help by allowing the early detection
>>and repair of machines with memory problems. It would be especially useful
>>for machines with flaky, intermittent memory.
>
>The most useful thing for machines with flaky, intermittent memory is a trip to
>the repair shop. Flaky, intermittent memory will show up in other ways, without
>having to add more flaky, intermittent memory.
Possibly you missed my earlier article, where I described the many weeks it
took to convince Pacific Cypress that they did in fact have a hardware
problem. They built the box, they tested the box, yet they assumed (no,
insisted) that the problem was software. To me, it was pretty obvious after
playing with my machine for a while that the problem was hardware; to them, it
was not. I suppose you could say that they should have known, but we should
not be designing machines to work with people as they should be, but as they
are. Consider also that I had a serial number around 50, so there were quite a
few other people out there having similar problems, yet "no one else has this
problem".

I certainly do not intend to claim that parity is a panacea. What I do claim
is that there is an obvious reliability problem in this whole PC industry, and
that the Amiga is no better than average. Because of the obvious problems, I,
for one, will not be convinced to abandon my advocacy of measures to improve
the reliability of the Amiga, in particular by adding parity error detection,
either by the fact that parity cannot guarantee my data or by statements like
"parity is useless".

>| // Larry Phillips |
>| \X/ lphillips@lpami.wimsey.bc.ca -or- uunet!van-bc!lpami!lphillips |
>| COMPUSERVE: 76703,4322 -or- 76703.4322@compuserve.com |

--
Phil Nelson . uunet!pyramid!oliveb!tymix!hobbes!pnelson . Voice:408-922-7508

He who walks with wise men becomes wise, but the companion of fools will
suffer harm. -Proverbs 13:20