Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!zaphod.mps.ohio-state.edu!usc!apple!oliveb!amiga!cbmvax!kevin
From: kevin@cbmvax.commodore.com (Kevin Klop)
Newsgroups: comp.sys.amiga.tech
Subject: Re: Parity Checking / ECC RAM on the A3000
Keywords: parity error detection and correction, marketability
Message-ID: <11952@cbmvax.commodore.com>
Date: 30 May 90 03:59:49 GMT
References: <756@bilver.UUCP> <1990May27.101258.24470@zorch.SF-Bay.ORG> <321@tlvx.UUCP> <1990May29.204550.27961@zorch.SF-Bay.ORG>
Reply-To: kevin@cbmvax (Kevin Klop)
Organization: Commodore, West Chester, PA
Lines: 81

Please excuse this disagreement. I'm not a hardware designer, but I am
making what seem to me to be logical inferences and deductions. If I err,
please correct me gently 8^).

In article <1990May29.204550.27961@zorch.SF-Bay.ORG> xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) writes:

[ Explanation of how alpha particles affect chips omitted in the interests
  of brevity ]

>The problem comes when you accumulate megabytes of these bits together;
>the chances of all of them avoiding errors tail off rapidly as their
>number increases, in math similar to that the birthday paradox employs.
>
>I'm a bit shakey on the numbers here, since I was last a hardware
>practicioner in 1972 and things have changed a trifle, but to the
>best of my understanding, with today's component sizes, speeds, and
>numbers of megabytes, you can expect to get in trouble somewhere
>between 1 and 100 megabytes. I defer to today's hardware practitioners
>for better data.
>
>As to why you don't see problems in your 3Meg AT, well, for one thing, as
>you mentioned, you don't have parity checking

Actually, ATs _do_ have parity-checked RAM. I used to run one system with
8 megs of RAM, all of it parity checked (that's why there were 9 chips per
bank rather than 8).

>, so they could get by.
>Next, most of the software you run (or at least what I ran when using a
>5 Meg '386 box) is unused by most applications, still stuck at the 640K
>limit.

True, but that's immaterial to parity-checked memory such as what's in an
AT - if a bit in a memory chip flips, then the parity check on that row
will reveal a problem, regardless of whether your current application is
using that memory or not. And, once you DO start using that memory, any
bit flips prior to your usage are immaterial, as the first thing that a
program should do is initialize the memory that it is using. The program
thus won't know that a bit got flipped, assuming that there's no parity
check to have discovered this first.

[ stuff about unused memory and/or machines not showing errors ]

>But like the birthday paradox, you don't have too far to go in terms of
>bigger applications exercising more of the machine, full time unattended
>operation (e.g. raytracing, doing accounts), more memory, more critical
>applications, and so on, before you run into Seymore Cray's problem.
>Parity checking is a necessity in large machines, just to be able to
>rely on the results the machine gives you. Error correcting circuitry
>is a necessity in large machines, to get the kind of uptime and
>throughput the machine's raw speed and memory size seem to promise.

I ran an 8 meg AT as a XENIX system that was using all of its memory
constantly. In 4 years of operation, I never once got a memory parity
error (although a second AT with a lot less memory seemed to get them
regularly - but once I got one, they would show up in droves until I
replaced the memory card or chip that was causing me problems).

Now, I admit that arguing from a statistical sampling of two machines can
hardly be thought of as a valid sampled universe; however, it does make me
wonder whether the chances of such errors happening are all that great.
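
For anyone unfamiliar with the scheme being argued about (eight data chips
plus a ninth parity chip per bank), here is a minimal sketch of what a
per-byte parity check does. It is an illustration only: the function names
are made up, even parity is assumed purely for simplicity (the convention
in real hardware may differ), and the real thing is done in logic gates,
not software.

```python
# Sketch only: one parity bit per byte, loosely modelling 8 data chips
# plus 1 parity chip per bank. All names here are hypothetical, and even
# parity is an assumption made for the example.

def parity_bit(byte: int) -> int:
    """Return 1 if the byte has an odd number of 1 bits, else 0."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return bin(byte).count("1") % 2

def store(byte: int) -> int:
    """Pack the byte and its parity bit into a 9-bit word (parity in bit 8)."""
    return (parity_bit(byte) << 8) | byte

def check(word: int) -> bool:
    """True if the 9-bit word still has an even number of 1 bits."""
    return bin(word & 0x1FF).count("1") % 2 == 0

word = store(0b10110010)
single_flip = word ^ (1 << 3)             # one bit flipped after storing
double_flip = word ^ (1 << 3) ^ (1 << 5)  # two bits flipped after storing

print(check(word))         # intact word passes the check
print(check(single_flip))  # a single flipped bit fails the check
print(check(double_flip))  # two flipped bits pass the check unnoticed
```

Note what the sketch does and doesn't show: parity flags a single flipped
bit in a byte but cannot say which bit it was, and an even number of flips
slips through. Locating and fixing the bad bit is the job of the error
correcting circuitry discussed in this thread.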

Yes, ERCC circuitry would make things more reliable, but I wonder if all
that many applications truly require this, and if they do, then whether
there's a market for add-on memory that does its own ERCC.

>Kent, the man from xanth.
>(xanthian@zorch.sf-bay.org)

				-- Kevin --

Kevin Klop		{uunet|rutgers|amiga}!cbmvax!kevin
Commodore-Amiga, Inc.

The number, 111-111-1111 has been changed. The new number is:
134-253-2452-243556-678893-3567875645434-4456789432576-385972

Disclaimer: _I_ don't know what I said, much less my employer.