Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!cs.utexas.edu!sun-barr!newstop!texsun!cobblers.UK.Sun.COM!adrianc
From: adrianc@cobblers.UK.Sun.COM
Newsgroups: comp.arch
Subject: Re: IBM PC prehistory (parity)
Message-ID: <1962@texsun.Central.Sun.COM>
Date: 3 Jan 90 15:35:18 GMT
Sender: news@texsun.Central.Sun.COM
Reply-To: adrianc@cobblers.UK.Sun.COM ()
Organization: Sun Microsystems UK
Lines: 49
References:<1957@crdos1.crd.ge.COM> <73@zds-ux.UUCP>

In article , gnb@bby.oz.au (Gregory N. Bond) writes:
> In article <121.filbo@gorn.santa-cruz.ca.us> filbo@gorn.santa-cruz.ca.us (Bela Lubkin) writes:
>
> How is parity handled in larger systems?  I know about ECC; are there
> any larger systems that use just parity, but attempt to handle it more
> reasonably?  How do larger systems handle ECC correction failures?
>
> Well, on Sun 3/50s, it panics and reboots.  Hardly a _large_ system,
> but get much larger and they tend to have ECC.

The latest range of Sun machines (Sun3/80, SPARCstation-1,
SPARCsystem-300 series) have synchronous parity reporting and
correction. The 3/470 and SPARCserver-490 have ECC. The Sun386i has
standard parity. That's all we have on the price list nowadays.

Synchronous parity means that a parity error is reported to the CPU at
the same time as the memory reference that caused it, rather than as a
high-priority interrupt an unknown number of cycles later (as was done
in older Suns).

The kernel treats the parity error rather like a page fault. It checks
whether that page exists, unmodified, on disk. If it does, the kernel
reads the page into a new physical page, remaps the virtual address
space and restarts the instruction that caused the parity error. It
also writes test patterns to the failing location to see whether it is
a hard error or a soft error. If it is hard, the page is removed from
use until the next reboot. An error message warns that a recovery was
made.

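
As a rough illustration of the hard/soft test just described, here is
a small Python model. Everything in it is an assumption made for the
sketch: the Word class, the stuck-bit fault model and the four test
patterns are hypothetical and are not taken from the Sun kernel.

```python
# Hypothetical model of one memory word that may have stuck-at faults.
# This is an illustration only, not the Sun kernel's actual test.
class Word:
    def __init__(self, stuck_low=0, stuck_high=0, width=32):
        self.mask = (1 << width) - 1
        self.stuck_low = stuck_low    # bits that always read back as 0
        self.stuck_high = stuck_high  # bits that always read back as 1
        self.value = 0

    def write(self, value):
        self.value = value & self.mask

    def read(self):
        # Apply the stuck-at faults on the way out.
        return ((self.value & ~self.stuck_low) | self.stuck_high) & self.mask


# Assumed patterns: all zeros, all ones and two alternating-bit words.
TEST_PATTERNS = (0x00000000, 0xFFFFFFFF, 0xAAAAAAAA, 0x55555555)


def classify_error(word):
    """Return "hard" if any pattern fails to read back, else "soft"."""
    for pattern in TEST_PATTERNS:
        word.write(pattern)
        if word.read() != pattern:
            return "hard"
    return "soft"
```

A word that passes every pattern is treated as having had a transient
(soft) error; a word that fails any pattern is treated as hard, and in
the scheme above its page would be retired until the next reboot.
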
If the page was a modified one then the process that referenced it is
killed and the page is checked as before; a message warns that a
process has been killed. If the page is part of the kernel then there
isn't much you can do, so the machine panics after attempting to print
an error message.

The end result is that synchronous parity is much better than normal
parity but not as good (or as expensive) as ECC. There is only about
1 Mbyte of memory (where the kernel sits) that will panic the machine
if it gets an error, and this is independent of whether the machine has
4 Mbytes (entry 3/80) or 224 Mbytes (SPARCserver-390 loaded with 4 Mbit
DRAMs one day..). It's a neat trick but there is a slight performance
cost in that the 3/80 has an extra wait state.

By the way, the ECL SPARC chip has parity on its cache, with parity
checking on all system data paths and register files in both the
integer and floating point units. It uses synchronous parity-error
traps to recover from parity errors in unmodified cache locations.

Regards Adrian

Adrian Cockcroft - adrian.cockcroft@uk.sun.com or adrian.cockcroft@sun.co.uk
Sun Microsystems, Merlin Place, Milton Road, Cambridge CB4 4DP, UK
Phone +44 223 420421 - Fax +44 223 420257
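
The three outcomes described in this post (recover, kill the process,
panic) and the "about 1 Mbyte" exposure figure can be sketched in
Python as below. The function and enum names are hypothetical, and the
exposure calculation assumes that errors are equally likely anywhere in
memory, which the post does not claim.

```python
from enum import Enum


class Outcome(Enum):
    RECOVERED = "page reloaded from disk, instruction restarted"
    PROCESS_KILLED = "referencing process killed"
    PANIC = "machine panics"


def handle_parity_error(is_kernel_page, is_modified):
    """Pick the outcome for a page that took a parity error (sketch)."""
    if is_kernel_page:
        return Outcome.PANIC           # nothing to fall back on
    if is_modified:
        return Outcome.PROCESS_KILLED  # the only copy of the data is bad
    return Outcome.RECOVERED           # a clean copy still exists on disk


def panic_exposure(total_mbytes, kernel_mbytes=1.0):
    """Fraction of memory where an error panics the machine.

    Assumes errors are uniformly distributed over memory and that the
    kernel occupies about 1 Mbyte, as stated in the post.
    """
    return kernel_mbytes / total_mbytes
```

Under those assumptions, panic_exposure(4) gives 0.25 for the 4 Mbyte
entry 3/80 and panic_exposure(224) gives a little under 0.005 for a
224 Mbyte machine, so the panic region shrinks as a share of memory as
machines grow.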