Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!olivea!mintaka!bloom-beacon!eru!hagbard!sunic!mcsun!inria!ircam!mf
From: mf@ircam.ircam.fr (Michel Fingerhut)
Newsgroups: comp.unix.ultrix
Subject: Re: Error Logging Requirements
Message-ID: <1990Nov12.082526.26969@ircam.ircam.fr>
Date: 12 Nov 90 08:25:26 GMT
References: <9011061336.AA19738@decpa.pa.dec.com>
Organization: IRCAM, Paris (France)
Lines: 57

By order of importance (to me):

Compatibility with other systems?

syslog is a *de facto* standard, especially in a heterogeneous environment. For me, that means 4.3 syslog. The fact that Ultrix 4.0 currently supports only 4.2 syslog is a major pain for us. Any other system should be compatible with 4.3 syslog (and extend it at the same time, if at all possible).

A one-line report is the best place to start... (but it should not be the only available info).

As for LANs, centralizing reports from different machines in a common log may help resolve problems that affect several machines and would otherwise go unnoticed (e.g., security, network, or electrical problems).

How is the error log used?

An error logging mechanism should (a) alert and (b) log information that helps diagnose and repair a problem. It should present both a low-level, nitty-gritty view of the problem and a "high-level monitoring" view of the system's health. Ideally, it should accumulate statistics and make them available to monitoring tools that can show changes in performance over time and raise an alert in such cases (e.g., effective disk throughput going down over time; load average constantly too high or rising; gettys on terminal lines eating too much CPU time; ECC errors increasing).

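To make the "accumulate statistics and alert on changes over time" idea concrete, here is a minimal sketch in Python. It is only an illustration, not part of any existing logging system: the function names, the sample readings, and the threshold are all hypothetical, and a least-squares slope is just one simple way to detect a downward trend.

```python
def slope(samples):
    """Least-squares slope of evenly spaced samples (units per interval)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / float(n)
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def check_trend(name, samples, max_drop_per_interval):
    """Return an alert string if the metric falls faster than allowed, else None."""
    s = slope(samples)
    if s < -max_drop_per_interval:
        return "%s: falling by %.1f per interval" % (name, -s)
    return None


# Hypothetical daily readings of effective disk throughput, in KB/s.
disk_kbps = [900, 880, 850, 800, 760, 700]

# Assumed site policy: complain if throughput drops by more than 20 KB/s per day.
alert = check_trend("disk throughput", disk_kbps, 20.0)
if alert is not None:
    print(alert)
```

The same check could be run against load average or ECC error counts by flipping the sign of the comparison for metrics where a rise, not a fall, is the bad direction.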
How it is reported (alerts)

Alerts should be, somewhat as with syslog, either real-time alarm messages to the console and/or specific users, or else a "trigger" for a user-selectable program (e.g., mail, if so selected, to send mail to the system manager; but also other programs that could take site-specific action). The extent of the "report" should be configurable, so that the same event can be reported differently to different audiences (= classes of users). It should also be possible to issue LAN-wide alerts. With the standardisation of X11, popup alert windows would be nice too.

What information should the error log contain

See above. As for "specific bit meaning": I think this *should* be included. That is, an error report of the type IEREG (say) = 0x1234 is useless unless you happen to know the meaning of all bits in all registers of all your devices. Since the device driver presumably knows it, it might as well spell it out. The error log could combine a short plain ASCII text report with a detailed (binary) snapshot "somewhere else", with tools to decode that snapshot.

Michael Fingerhut
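
As an illustration of the "specific bit meaning" point, here is a minimal sketch in Python of turning a raw register value into a readable one-line report. The bit layout is entirely made up for an imaginary IEREG register; a real driver would use the definitions for its own device.

```python
# Hypothetical bit layout for an imaginary IEREG register; real devices differ.
IEREG_BITS = {
    0x0004: "parity error",
    0x0010: "receiver overrun",
    0x0020: "carrier lost",
    0x0200: "DMA timeout",
    0x1000: "device not ready",
}


def decode_register(name, value, bit_names):
    """Return a one-line report naming every bit set in value."""
    known_mask = 0
    parts = []
    for mask in sorted(bit_names):
        known_mask |= mask
        if value & mask:
            parts.append(bit_names[mask])
    # Report bits the table does not describe instead of dropping them silently.
    unknown = value & ~known_mask
    if unknown:
        parts.append("unknown bits 0x%04x" % unknown)
    if not parts:
        parts.append("no bits set")
    return "%s = 0x%04x: %s" % (name, value, ", ".join(parts))


print(decode_register("IEREG", 0x1234, IEREG_BITS))
```

The raw hex value stays in the line, so the short text report and a detailed binary snapshot can still be matched up later.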