Path: utzoo!utgpu!watserv1!watmath!att!pacbell!pacbell.com!ucsd!ucbvax!hplabs!hpfcso!hpfcdj!r_carlso
From: r_carlso@hpfcdj.HP.COM (Richard_Carlson)
Newsgroups: comp.arch
Subject: Re: MIPS R[236]000 interrupts (was Workstation Data Integrity)
Message-ID: <16870001@hpfcdj.HP.COM>
Date: 29 Aug 90 21:23:52 GMT
References: <26200@mimsy.umd.edu>
Organization: Hewlett Packard -- Fort Collins, CO
Lines: 37

> The point is that ignoring a parity error is a pretty safe thing to do; there's
> very little chance of getting a misleading answer. Much better than crashing
> the computer, which is guaranteed to lose you whatever you had in memory.
>
> Russell Wallace, Trinity College, Dublin
> rwallace@vax1.tcd.ie

I used to feel this way until I had an interesting experience with some
Apple ][s.

I was developing and testing 6502 assembly code on one machine with an
emulator; then programming EPROMs on another, remotely- and
inconveniently-located, machine. I got the code working on the emulator,
burned some EPROMs, and then the program would crash and die. I burned
new sets of EPROMs and they had the same problem. The EPROMs verified
when programmed; and the programmed EPROMs verified against the data on
my disk. Considering some of the hardware differences between emulation
and actually running from ROM, such as being able to map different ROM
pages into the same memory addresses while the processor was executing
out of those addresses, I spent a lot of time looking for software
problems in my code.

It turns out that the programmer Apple had some stuck-at faults in its
RAM. I never suspected that when I was verifying my EPROMs, the data in
*system RAM* was corrupt.

Although I'm not convinced that crashing on a parity error is the right
thing to do, simply ignoring them (or not having parity at all) really
can lead to a lot of headaches and hassles.
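
For anyone who wants to see why the verify step could not catch this, here is a toy sketch in Python (obviously not what ran on the Apples). Everything in it is an assumption made up for illustration: the StuckAtRam class, the helper names, the six-byte 6502 fragment, and the choice of a single bit stuck at 1. It also assumes the parity bit itself lives in a healthy cell. The point is only that a verify which reads back through the same faulty RAM compares bad data with bad data, while a per-byte parity check flags the damaged location.

```python
class StuckAtRam:
    """Toy RAM in which chosen bits at chosen addresses are stuck at 0 or 1."""

    def __init__(self, size, stuck_at_one=None, stuck_at_zero=None):
        self.cells = [0] * size
        self.parity = [0] * size
        self.stuck_at_one = stuck_at_one or {}    # address -> mask of bits forced to 1
        self.stuck_at_zero = stuck_at_zero or {}  # address -> mask of bits forced to 0

    def write(self, addr, value):
        # Parity is generated from the value the writer intended to store.
        self.parity[addr] = bin(value).count("1") & 1
        stored = value | self.stuck_at_one.get(addr, 0)
        stored &= ~self.stuck_at_zero.get(addr, 0) & 0xFF
        self.cells[addr] = stored

    def read(self, addr):
        # Returns the stored byte and whether it still matches its parity bit.
        value = self.cells[addr]
        parity_ok = (bin(value).count("1") & 1) == self.parity[addr]
        return value, parity_ok


def load(ram, image):
    """Copy the disk image into RAM, as a programmer utility would."""
    for addr, byte in enumerate(image):
        ram.write(addr, byte)


def burn_from_ram(ram, length):
    """'Program' an EPROM with whatever RAM actually holds."""
    return bytes(ram.read(addr)[0] for addr in range(length))


def verify_against_ram(ram, eprom):
    """The misleading check: compare the EPROM with the same faulty buffer."""
    return all(ram.read(addr)[0] == byte for addr, byte in enumerate(eprom))


def parity_errors(ram, length):
    """Addresses whose contents no longer match the stored parity bit."""
    return [addr for addr in range(length) if not ram.read(addr)[1]]


# Hypothetical image: LDA #$00 ; STA $0200 ; RTS
image = bytes([0xA9, 0x00, 0x8D, 0x00, 0x02, 0x60])

# Assumed fault: bit 0 of address 1 is stuck at 1.
ram = StuckAtRam(16, stuck_at_one={1: 0x01})
load(ram, image)
eprom = burn_from_ram(ram, len(image))

print("verify against RAM passes:", verify_against_ram(ram, eprom))
print("EPROM matches intended image:", eprom == image)
print("addresses with parity errors:", parity_errors(ram, len(image)))
```

In this sketch the verify passes even though the EPROM holds LDA #$01 instead of LDA #$00, and only the parity check points at address 1. Single-bit parity would of course miss a location with an even number of flipped bits.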

On a related note (that probably doesn't belong on comp.arch, oh well):
why in the world does UNIX (Sun's 4.3, in particular) sync the disks
after a parity error panic? If you're halting all processing because you
can't assume RAM is OK, it seems foolish to write out some of this
questionable data to your disk. Otherwise (if you know only one
particular RAM location got trashed), why not just send a signal to the
process(es) that care about that location and not panic at all?

--Richard
...!hplabs!hpfcmb!carlson