Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!cornell!uw-beaver!rice!sun-spots-request
From: knutson@perseus.sw.MCC.COM (Jim Knutson)
Newsgroups: comp.sys.sun
Subject: Sparcstation time keeping woes
Keywords: SunOS
Message-ID: <1916@brazos.Rice.edu>
Date: 2 Oct 89 15:49:30 GMT
Sender: root@rice.edu
Organization: Sun-Spots
Lines: 80
Approved: Sun-Spots@rice.edu
X-Sun-Spots-Digest: Volume 8, Issue 151, message 13 of 15

Included Message:
X-Date: Sat, 30 Sep 89 17:13:32 EDT
X-From: Dennis Ferguson

I have done battle trying to run NTP on a number of Sparcstation I's. This has turned out not to be possible to do in a reasonable manner. The related observations are:

(1) The clocks on Sparcstation I's run obscenely fast, somewhere between 300 and 350 ppm (i.e. a drift value of somewhere between -1.2 and -1.4). This is too far off for NTP to capture well, since even if NTP steps the clock to the correct time it will have drifted 200 ms or so before sys.hold expires, which is outside the loop filter's aperture, and the clock will have to be stepped again. This can be repaired, however, by estimating the drift by hand and initializing the drift file to this value.

(2) With an error this large the clock should be gaining 25 or 30 seconds per day if left to itself. The clock doesn't gain anywhere near this much, however. In fact, the ones we have run a second or two per day off, some fast and some slow. This would indicate that something must be setting the clock back from time to time.

It is indeed the case that something in the kernel is making the clock jump around. The NTP daemon will hold the time just fine for a while, and then all of a sudden the time will change underneath it by increments which are usually less than a second, but sometimes more than two seconds.

This is what I see. Much of the rest is utter speculation since we have no source for this operating system.
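
For reference, the figures in observations (1) and (2) can be checked with a few lines of Python. This is only a sketch: the function names are invented here for illustration, and it assumes a clock that runs uniformly fast with nothing correcting it.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gain_per_day(ppm):
    """Seconds gained per day by a clock running `ppm` parts per million fast."""
    return ppm * 1e-6 * SECONDS_PER_DAY


def seconds_to_drift(offset_s, ppm):
    """Elapsed seconds for an uncorrected clock to accumulate `offset_s` of error."""
    return offset_s / (ppm * 1e-6)


if __name__ == "__main__":
    for ppm in (300, 350):
        print(f"{ppm} ppm: {gain_per_day(ppm):.2f} s/day, "
              f"200 ms of error after {seconds_to_drift(0.2, ppm):.0f} s")
```

At 300 to 350 ppm this works out to roughly 26 to 30 seconds per day, and a freshly stepped clock is 200 ms off again after about ten minutes.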

(1) Sun has produced a machine with a clock interrupt timer which is incapable of producing interrupts which are an exact integral fraction of a second given the frequency of the oscillator driving it. This is unfortunate since the value of hz is an integer. On the Sparcstation hz is defined as 100, but should probably be something more like 100.03.

(2) The value of tick, the number of microseconds added to the time on each clock interrupt, is computed as 1000000/hz. This makes the clock run fast.

(3) There is a battery backed up time-of-year clock in the Sparcstation which has a precision of about a second or so. The time-of-year clock is reset when you call settimeofday(), but nothing is done to it when you call adjtime() since the precision is too crude. (I have no idea whether this is true or not. It is a guess at how one might produce the symptoms that are seen.)

(4) Note that the clock was made to run fast in (2), and if you sell machines with clocks which gain half a minute a day some people will probably complain (??). What should have been done to repair this is to fix (2), by setting tick to a value which reflects the actual interrupt interval of the timer (my guess is about 9997). This would have allowed them to get the clock speed to within about 50 ppm without further complication, and this is adequately respectable by Sun standards. There is no reason that tick has to be 1000000/hz that I can see.

(5) Unfortunately, (4) was too easy. Instead I suspect that some bright light discovered that, while the interrupt timer might gain half a minute a day, the time-of-year clock was good for a couple of seconds a day, so all you had to do was keep the system time in line with the time-of-year clock and everything would be fine. Of course the time-of-year clock is crude, but who worries about the stuff below a second anyway?
Just keep comparing the time until the truncated system clock value exceeds the time-of-year clock by more than a second or so, then step it back. Note there is a truncation involved in this comparison. This may explain why the steps backwards are sort of randomly sized.

This corresponds to what I see. Something in the kernel on Sparcstations keeps stepping the clock backwards. This insistence that the time-of-year clock is more accurate than time keeping software (if that's what it is) makes trying to synchronize these things futile. This, of course, would also break timed, which may or may not be why Sun doesn't ship it any more. And it makes the adjtime() call nearly useless.

I can fix the value of tick myself so that the system clock keeps better time. What I haven't been able to do is figure out how to turn off whatever it is in the kernel which keeps bumping the clock. I would be very grateful if someone could tell me how to do this on a binary system.

Dennis