Path: utzoo!attcan!uunet!samsung!munnari.oz.au!metro!news
From: jimr@metro (Jim Richardson)
Newsgroups: comp.sys.apollo
Subject: Re: Problems with email
Message-ID: <1990Jul9.085515.25351@metro.ucc.su.OZ.AU>
Date: 9 Jul 90 08:55:15 GMT
References: <1664@tuvie>
Sender: news@metro.ucc.su.OZ.AU (news)
Organization: Dept of Pure Mathematics, University of Sydney
Lines: 90

In article <1664@tuvie>, mike@tuvie (Inst.f.Techn.Informatik) writes:
> Our mail works OK as long as the registry is available, but
> when the registry is down (we do not have slave registries), then
> /bsd4.3/bin/mail will not deliver mail to the recipients. Now the
> problem seems to be that the mailer cannot acquire the gid of mail,
> but about this I'm not too sure. The mailer does not seem to return
> an error code (or does /usr/lib/sendmail ignore it ?), whenever
> this happens. The log file contains the status report Stat=Sent, but
> the mail is nowhere to be found (except on /dev/null).
> Has anybody had a similar problem and if so, how was it solved
> (without resorting to slave registries)?

This does happen to us when the registry dies completely or gets that
kind of registry disease where the password file appears to be empty.
Another cause is when /usr/spool/mail is unavailable: for historical
reasons involving backup, /usr/spool/mail on our mail gateway machine
is a soft link to a directory on another node (avoid this if you can!).

We run a script like the following continuously on the gateway node.
It detects either of these problems and kills the sendmail daemon if
they occur. If you call it "netcheck" you can start it as root via
"/etc/server -p netcheck &". It chews up some CPU time but it's worth
it for the peace of mind!

#! /bin/ksh
#
# Check Apollo system is fit to receive incoming mail messages

exec >> /sys/node_data/system_logs/netcheck.log 2>&1
print "netcheck starting at $( /bin/date )"

SLEEP_TIME=60
REPORT_INTERVAL=60
STOPPED_FLAG="/usr/spool/mail/STOPPED_FLAG"

shutsm() {
    /bin/ps aux
    print "$( /bin/date ) stopping sendmail daemon: $*"
    pids="$( /bin/ps ax |
        /bin/awk '/\/usr\/lib\/sendmail -bd -q[0-9]*m$/ {print $1}' )"
    print "Killing $pids"
    /bin/kill $pids
    /usr/ucb/logger -t netcheck "sendmail daemon(s) $pids stopped: $*"
    /usr/bin/touch ${STOPPED_FLAG}
    /bin/ps aux
}

typeset -i count=$REPORT_INTERVAL

while :
do
    if [ ! -f ${STOPPED_FLAG} ]
    then
        if [ ! -d /usr/spool/mail ]
        then
            shutsm "/usr/spool/mail unavailable"
        fi
        if [ ! -s /etc/passwd ]
        then
            shutsm "/etc/passwd missing or empty"
            /bin/ls -l /etc/passwd
        fi
    fi
    count=count-1
    if [ count -le 0 ]
    then
        # log the date periodically
        /bin/date
        # force new test next time even if stop flag exists
        /bin/rm -f ${STOPPED_FLAG}
        count=$REPORT_INTERVAL
    fi
    sleep $SLEEP_TIME
done

This will only work for you if /etc/passwd appears to have size zero
whenever the registry is down. Furthermore, your sendmail daemon needs
to look like "/usr/lib/sendmail -bd -q[0-9]*m" to a "ps ax" command:
if you use something else like "-q1h -bd", adjust the script
accordingly. Finally, the actual script we run does some other local
stuff and I've hacked it down to give the above, which may therefore
have a flaw or two. But you should get the idea.
--
Jim Richardson
Department of Pure Mathematics, University of Sydney, NSW 2006, Australia
Internet: jimr@maths.su.oz.au  ACSNET: jimr@maths.su.oz  FAX: +61 2 692 4534
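
A footnote on the "ps ax" pattern in the script above, offered as a
sketch and not as part of the script we actually run: instead of
anchoring on one exact argument order, awk can be asked for any
/usr/lib/sendmail line that carries both -bd and a -q interval. The
function name sendmail_pids is made up for illustration, it assumes
the PID is the first column of the "ps ax" output, and it calls plain
"awk" where the script above uses /bin/awk.

```shell
# sendmail_pids: hypothetical helper, not part of the original script.
# Reads "ps ax"-style lines on stdin and prints the PID (assumed to be
# column 1) of every /usr/lib/sendmail process that was started with
# both -bd and a -q<interval> flag, in either order.
sendmail_pids() {
    awk '/\/usr\/lib\/sendmail / && / -bd/ && / -q[0-9]/ { print $1 }'
}
```

Inside shutsm the assignment could then read
pids="$( /bin/ps ax | sendmail_pids )". The match is looser than the
original anchored pattern, so check what it picks up on your own node
before letting it feed /bin/kill.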