Path: utzoo!attcan!uunet!husc6!rice!sun-spots-request
From: eap@bu-it.bu.edu (Eric A. Pearce)
Newsgroups: comp.sys.sun
Subject: Re: disk sequencer error
Message-ID: <8901162027.AA12705@bu-it.BU.EDU>
Date: 25 Jan 89 00:51:10 GMT
Sender: usenet@rice.edu
Organization: Sun-Spots
Lines: 78
Approved: Sun-Spots@rice.edu
Original-Date: Mon, 16 Jan 89 15:27:10 EST
X-Sun-Spots-Digest: Volume 7, Issue 118, message 3 of 11

tomc@dftsrv.gsfc.nasa.gov (Tom Corsetti):
>Recently, our Sun 3/260 crashed because of a power outage....
>Well, today, almost a
>week later, I shutdown and rebooted, and got the message:
>	xy0a: read retry (disk sequencer error) -- blk #495, abs blk #495
>Is this a serious disk problem that I should worry about?...

dinah@shell.UUCP (Dinah Anderson):
>...
>I would like to know what the errors mean and under what circumstances
>they occur. I would also like to know what we should do about them.

I looked up the error in my Xylogics 451 manual:

  "Disk Sequencer Error - The disk sequencer did not finish its
  operation within the allowed time. Several factors may cause this
  problem.
  - The 451 did not receive the servo clock signal from the selected
    disk drive. Check the B cable; if the connection is good, try a
    different B cable port on the 451.
  - The 451 is not receiving any read data from the selected drive.
    Check the B cable.
  - The Multibus may be preventing the 451 from gaining proper access."

The manual entry quoted above suggests the problem could lie with the
cabling or the controller itself, but this has not been the case for
us.

A bad controller usually spews out large numbers of errors with random
block numbers across more than one disk. A bad cable will produce
random block errors on one drive (since it's unlikely that more than
one cable would crap out at a time). We had drive cable problems on
some rack-mounted systems (3/180's and 3/280's).
I believe they were caused by repeated flexing of the drive cables by
the doors on the back of the cabinets. The older rack setups have
several feet of cable that dangles out of the back of the cabinet and
moves every time you open the door. (The doors have since been removed;
I have not seen any cooling problems so far.)

A bad disk will usually give errors with sequential block numbers, or
at least repeat the same block numbers numerous times. If you only get
an occasional disk error, such as one a week, you might be safe just
mapping or slipping the bad spots, but in my experience any errors that
occur with regularity indicate future trouble. If you have a Sun
hardware contract, I would have them replace the drive as soon as
possible. If they balk at replacing a drive with only a few errors,
push them a bit. It *is* possible for systems to run for long periods
without disk problems.

I would do a full level 0 dump of the disk as soon as possible. If you
act before a crisis, you can schedule downtime for the drive
replacement: you do a level 0 dump and Sun comes in and replaces the
drive. This makes the restore much easier, since you don't have to
worry about multi-level backups, not to mention the time you save.

I have seen this error on Fujitsu 2351's ("single" Eagle) and 2361's
("double" or "super" Eagle). It was always accompanied by a massive
number of disk errors. Our local Sun field service replaces single
Eagles as a whole, but replaces only parts of double Eagles (in this
case, the HDA and the servo board). The "Eagle" series of drives seems
to be rather sensitive to power fluctuations. The newer Hitachi
DK815-10 and NEC D2363 seem to be more tolerant.

-e

Eric Pearce                                  ARPANET eap@bu-it.bu.edu
Boston University Information Technology     CSNET   eap%bu-it@bu-cs
111 Cummington Street                        JNET    jnet%"ep@buenga"
Boston MA 02215                              UUCP    !harvard!bu-cs!bu-it!eap
617-353-2780 voice  617-353-6260 fax         BITNET  ep@buenga
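The rules of thumb in this post for telling a bad controller, cable, or disk apart from the logged block numbers can be sketched as a small log classifier. This is a hypothetical illustration, not anything from Sun or Xylogics: the function names (`parse`, `looks_like_bad_disk`, `diagnose`), the `repeat_min` threshold, and the assumption that every error message looks like the xy0a line quoted above are all made up for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern, assuming messages shaped like the one quoted above:
#   "xy0a: read retry (disk sequencer error) -- blk #495, abs blk #495"
# Group "drive" drops the partition letter (xy0a -> xy0); "blk" is the
# absolute block number. Other drivers may log differently.
ERR_RE = re.compile(r"^(?P<drive>[a-z]+\d+)[a-h]?:.*abs blk #(?P<blk>\d+)")


def parse(lines):
    """Collect {drive: [absolute block numbers, in log order]}."""
    errors = defaultdict(list)
    for line in lines:
        m = ERR_RE.match(line.strip())
        if m:
            errors[m.group("drive")].append(int(m.group("blk")))
    return dict(errors)


def looks_like_bad_disk(blocks, repeat_min=3):
    """True if one block recurs repeat_min times, or repeat_min distinct
    blocks are consecutive (the 'bad disk' pattern from the post)."""
    counts = Counter(blocks)
    if max(counts.values()) >= repeat_min:
        return True
    run = 1
    ordered = sorted(counts)
    for prev, cur in zip(ordered, ordered[1:]):
        run = run + 1 if cur == prev + 1 else 1
        if run >= repeat_min:
            return True
    return False


def diagnose(lines, repeat_min=3):
    """Apply the rules of thumb; repeat_min is an arbitrary threshold."""
    errors = parse(lines)
    if not errors:
        return "no disk errors logged"
    bad = sorted(d for d, b in errors.items() if looks_like_bad_disk(b, repeat_min))
    if bad:
        return "suspect disk: " + ", ".join(bad)
    if len(errors) > 1:
        # Scattered block numbers on more than one drive: suspect the controller.
        return "suspect controller (scattered errors on several drives)"
    drive, blocks = next(iter(errors.items()))
    if len(blocks) < repeat_min:
        return f"occasional errors on {drive}: mapping/slipping may be enough"
    # Scattered block numbers on a single drive: suspect that drive's cable.
    return f"suspect cable to {drive} (scattered errors on one drive)"
```

For example, feeding `diagnose` three copies of the xy0a message returns "suspect disk: xy0", because the same block repeats. The disk check runs first since repeated or sequential blocks are the clearest signal in the post; the controller and cable verdicts only apply when the block numbers look random.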