Path: utzoo!utgpu!cunews!bnrgate!brchh104!brchs1!bnr.ca!rice.edu!sun-spots-request
From: tpm@eng.cam.ac.uk (tim marsland)
Newsgroups: comp.sys.sun
Subject: Re: rpc registration and how to tell of a cnode death ?
Keywords: No Digest Subjects in Unmoderated Mode
Message-ID: <3144@brchh104.bnr.ca>
Date: 4 Jun 91 18:40:00 GMT
Sender: news@brchh104.bnr.ca
Organization: Sunspots, Psuedo-Unmoderated
Lines: 174
Approved: sun-spots@rice.edu
X-Original-Date:    Thu, 17 May 1990 04:53:10 GMT

[long, boring, and contains flames.  hit n now..]

>In article <...> Greg Sylvain writes:
>> Q-1) I'm trying to build amd/amq (an automounter daemon). It builds OK and
>> seems to run OK (I can remount fs's with it without any problems).

Greg,
        For those who missed the original context, `amd' is a value-added
replacement for the automount(8) program (currently shipped with SunOS)
that automatically mounts and unmounts NFS (and other filesystem types) on
demand. `amq' is a program which queries and reports the state of the
`amd' daemon using the same SunRPC mechanism as NFS does.

Disclaimer: I have the source of amd (which was recently posted to
comp.sources.unix by its author, Jan-Simon Pendry <jsp@doc.ic.ac.uk>), but
I have not installed it on HP hardware, so please excuse any vagueness.
Hopefully my two cents' worth will help Greg track down the problem.

First off, the SunRPC model has - surprise - clients and servers, and a
set of procedures implemented by the server and called by clients via
Remote Procedure Call stubs.  In this context, `Amd' acts both as an
NFS/mount server and as an AMQ server. `Amq' is an AMQ client program
that allows you to query the state of `amd'.

>> But amq talks to amd via rpc port number 300019.

300019 is not a port number; it is the RPC program number used for
amq/amd communication. `Amd' announces that it is prepared to listen to
`amq' by giving the /etc/portmap process the tuple [prog, vers, prot,
port]. In other words, `amd' tells the portmapper `I can service remote
procedure calls for program number prog, version vers, using the given
protocol (tcp or udp) and the given internet port number.' Clients who
wish to use a given service then ask the portmapper for the appropriate
protocol and port number. See the portmap(1M) entry in the fine manual.
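To make the portmapper's role concrete, here is a minimal Python sketch of the question a client has to ask before it can talk to a server like amd: a portmapper (program 100000, version 2) GETPORT call sent over UDP to port 111, asking which port [prog, vers, prot] lives on. The reply carries the port, or 0 if nothing is registered, which is the situation Greg describes. This is an illustration that hand-encodes the SunRPC version 2 message format with `struct`; in C you would normally let the RPC library (e.g. pmap_getport()) do it. The function names are made up for this example, and any AMQ version number you pass in is an assumption (check amd's sources).

```python
import socket
import struct

PMAP_PORT = 111          # well-known portmapper port
PMAP_PROG = 100000       # portmapper program number
PMAP_VERS = 2            # portmapper protocol version 2
PMAPPROC_GETPORT = 3     # "which port is [prog, vers, prot] on?"
IPPROTO_TCP = 6
IPPROTO_UDP = 17

CALL, REPLY = 0, 1       # msg_type values
RPC_VERS = 2             # SunRPC message protocol version
AUTH_NONE = 0            # null authentication flavor


def encode_getport_call(xid, prog, vers, prot):
    """Build a GETPORT call; every field is a 4-byte big-endian XDR uint."""
    header = struct.pack(">6I", xid, CALL, RPC_VERS,
                         PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT)
    cred = struct.pack(">2I", AUTH_NONE, 0)   # flavor, zero-length body
    verf = struct.pack(">2I", AUTH_NONE, 0)
    # The argument is the mapping [prog, vers, prot, port]; port is ignored here.
    args = struct.pack(">4I", prog, vers, prot, 0)
    return header + cred + verf + args


def decode_getport_reply(data, xid):
    """Return the port from a successful reply; 0 means 'not registered'."""
    if len(data) < 28:
        raise ValueError("reply too short")
    rxid, mtype, reply_stat, _flavor, vlen = struct.unpack_from(">5I", data, 0)
    if rxid != xid or mtype != REPLY or reply_stat != 0:  # 0 = MSG_ACCEPTED
        raise ValueError("not an accepted reply to our call")
    offset = 20 + ((vlen + 3) & ~3)           # skip verifier body (4-byte padded)
    accept_stat, port = struct.unpack_from(">2I", data, offset)
    if accept_stat != 0:                      # 0 = SUCCESS
        raise ValueError("call failed, accept_stat=%d" % accept_stat)
    return port


def getport(host, prog, vers, prot=IPPROTO_UDP, timeout=5.0, xid=0x1234):
    """Ask host's portmapper where [prog, vers, prot] is registered."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(encode_getport_call(xid, prog, vers, prot), (host, PMAP_PORT))
        data, _ = s.recvfrom(8192)
    return decode_getport_reply(data, xid)
```

For example, `getport("server", 300019, 1)` would return 0 if amd never registered AMQ version 1 over UDP, no matter what /etc/rpc says.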

>> The port is supposedly registered, it's
>> in the /etc/rpc file correctly.

That's not quite `registering' amq; the /etc/rpc entry simply allows
routines like getrpcent(3C) to map a name to a program number. The
portmapper process holds the real registration while `amd' runs: it maps
[prog, vers, prot] to an internet port number so that the client program
(amq) can actually open a socket to the right `amd' server port.
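By contrast, /etc/rpc is just a local name table. A minimal Python sketch of parsing its format (name, program number, optional aliases, `#` comments) shows that there is nothing in it a remote client could use to find a running server: no version, no protocol, no port. The function name and sample contents, including the amq line, are illustrative and not copied from any real system.

```python
def parse_rpc_db(text):
    """Parse /etc/rpc-style lines: name, program number, optional aliases.

    Returns {name_or_alias: program_number}. Note what is missing: there is
    no version, protocol or port anywhere in this file.
    """
    by_name = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        fields = line.split()
        if len(fields) < 2:                   # skip blank or malformed lines
            continue
        name, number, aliases = fields[0], int(fields[1]), fields[2:]
        for n in [name] + aliases:
            by_name.setdefault(n, number)
    return by_name


# Illustrative contents; the amq line is the kind of entry Greg added.
SAMPLE_ETC_RPC = """\
# name        number   aliases
portmapper    100000   portmap sunrpc
status        100024
amq           300019
"""

if __name__ == "__main__":
    db = parse_rpc_db(SAMPLE_ETC_RPC)
    print(db["amq"])  # a program number, not a port
```

Adding a line here changes what names tools like rpcinfo print, but it cannot make a program appear in the portmapper's table.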

>> But whenever I invoke amq, it comes back with
>> rpc not registered (or something to that effect).  And sure enough, when I 
>> run /usr/etc/rpcinfo -p the port isn't registered. I thought all I had to do was
>> put a line in /etc/rpc to register the port.  Does anyone have an idea why
>> rpc isn't seeing the new entry in the file? (I've tried rebooting and nothing
>> changed)

If
	client% /usr/etc/rpcinfo -p server

doesn't show `amq' (program 300019), the problem is with `amd'. I think
`amd' isn't registering its AMQ service properly with your portmapper.
Note that /etc/rpc is only consulted on the machine where you run
rpcinfo, to print program names; it plays no part in registration.

It would be useful if you gave OS version numbers and said exactly where
you are running the amd and amq programs (e.g. both on the cluster server,
or what?). Have you tried mailing jsp@doc.ic.ac.uk? I'd be grateful if you
could mail me or post the real answer, whatever it may be.

In article <1720006@hpbbi4.HP.COM> markl@hpbbi4.HP.COM (#Mark Lufkin) writes:

Mark,

** Flame on **

>	I am going to be completely useless at answering your question and
>       (hopefully) make a suggestion that may help you.  First, sorry I don't
>	really know enough about SUN RPC to be able to answer a technical
>       question on it. What I would like to suggest is the use of NCS
>       (Network Computing System).

What!!?  Are you suggesting that Greg rewrites `amd' to use NCS??

>       This is available on HP platforms ..

But so is SunRPC.  We use it on the large number of HP machines here for a
variety of purposes.

>       and is fully supported.

Are you saying that you _don't_ support SunRPC?  We have HP manuals that
describe how to use it, so we went ahead and used it.  It is the basis of
your NFS implementation, and surely HP are going to continue to support
NFS for a year or two yet? [You can also get the source of SunRPC for free
from various archive sites, even if you're not an academic institution.]

>       It has also been chosen as the RPC for use
>	in the OSF Distributed Computing Environment (this was announced
>       yesterday and includes a lot more than RPC). Note that SUN RPC was
>	also a contender for use but was not picked for a variety of technical
>       reasons.

Yawn.  Look, I've got nothing against HP/Apollo NCS, and I've heard that
it has some technical improvements over SunRPC.  Great - I have some
tentative thoughts of my own about SunRPC failings.  However, I've never
heard the particular technical arguments in favour of NCS (despite asking
HP in January), and *would* be grateful if someone would post a brief
account of the differences, or point me at an HP document.  Enquiring
minds want to know.

>       .. Enough of my little speech (I guess I am entitled to it as I
>	support this stuff).

To be brutal, I get the feeling that your posting is simply OSF posturing.
Fine, there are certainly people out there who like this sort of tosh
-- but please don't post it in the guise of a non-answer to some guy's
question about a vaguely related problem.  If you want to advertise
NCS vs. SunRPC in this forum, *please* tell us why it's better.
``Because OSF says it is'' is not really good enough!

** Flame off **

Greg,

>> Q-2) If you're on a cluster server, is there a way you can tell when/how a cnode
>> dies (loses contact with the server)?  This would not seem to be such an unusual
>> request.

>       Diskless nodes don't core dump (unless
>       they have local swap) so you will not be able to get more information
>       on why the crash occurred.

That's really neat.  Any reason why?

>       ... As far as the
>	server is concerned, it is simply that it can no longer communicate with
>	the client.

Quite.  Getting the client to say ``I've died'' is a bit tricky once it's
dead :-) Detecting when it dies is a bit more feasible - you can
periodically ping the client workstation to check that it's still
responding.  There are a few ways to do this:

a) Use the ping(1M) program, which sends ICMP ECHO packets to the client.
ICMP messages are handled at a low level in the kernel, so ping'ing will
work even when the machine is fairly broken (though not when it's actually
dead :) or early on after a reboot. For example:

	server% /etc/ping client -n 1

b) By convention, every SunRPC server implements procedure 0, the null
procedure, which does nothing but reply, so it works as a ping. As an
alternative to ICMP, you can ping one of the RPC server processes
running on the client using rpcinfo (in this example, rpc.statd needs to
be running on the client). For example:

	server% /usr/etc/rpcinfo -u client status

c) Use the rlb(1M) program (remote loopback diagnostics), which is very
comprehensive, though I've never really used it in anger.

The first two can also be done programmatically, allowing timeouts to be
specified.  In fact I think that (b) is the way `amd' determines if a file
server is alive before automounting a directory from it.
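For the programmatic route in (b), here is a minimal Python sketch of an RPC ping with a timeout: it sends a call to procedure 0 (the null procedure) with AUTH_NONE credentials and treats an accepted, successful reply with a matching transaction id as `alive'. It hand-encodes the SunRPC version 2 message format, so it is not how rpcinfo or amd do it internally. The function names are made up for this example, and it assumes you already know the service's UDP port on the client (normally obtained from the portmapper there).

```python
import os
import socket
import struct

CALL, REPLY = 0, 1           # msg_type values
MSG_ACCEPTED, SUCCESS = 0, 0  # reply_stat / accept_stat values
NULLPROC = 0                  # procedure 0: does nothing, just replies


def encode_null_call(xid, prog, vers):
    # xid, CALL, rpcvers=2, prog, vers, proc, cred(AUTH_NONE, len 0),
    # verf(AUTH_NONE, len 0); the null procedure takes no arguments.
    return struct.pack(">10I", xid, CALL, 2, prog, vers, NULLPROC, 0, 0, 0, 0)


def is_success_reply(data, xid):
    """True if data is an accepted, successful reply to call xid."""
    if len(data) < 24:
        return False
    rxid, mtype, reply_stat, _flavor, vlen = struct.unpack_from(">5I", data, 0)
    if rxid != xid or mtype != REPLY or reply_stat != MSG_ACCEPTED:
        return False
    offset = 20 + ((vlen + 3) & ~3)   # skip verifier body (4-byte padded)
    if len(data) < offset + 4:
        return False
    (accept_stat,) = struct.unpack_from(">I", data, offset)
    return accept_stat == SUCCESS


def rpc_ping(host, port, prog, vers, timeout=2.0):
    """Send one null-procedure call over UDP; False on timeout or error."""
    xid = struct.unpack(">I", os.urandom(4))[0]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(encode_null_call(xid, prog, vers), (host, port))
            data, _ = s.recvfrom(8192)
        except OSError:               # includes socket.timeout
            return False
    return is_success_reply(data, xid)
```

A server could call something like `rpc_ping("client", port, 100024, 1)` (rpc.statd is program 100024, version 1) every few seconds and treat a run of consecutive failures as a dead cnode. For simplicity, a stray reply with a different xid counts as a failure here; a more careful version would keep reading until the timeout expires.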

>> 				Greg Sylvain
>> 				Academic Computing Services
>> 				Systems Programmer
>> 			
>> 	UUCP:           	...!{uunet}!umbc5!greg
>> 	Internet (Arpa) :	greg@umbc5.umbc.edu
>>  	BITNET :		GREGS@UMBC
>
>Mark Lufkin
>WG-EMC
>OS Technical Support
>HP GmbH, Boeblingen
>
>These are obviously all my own opinions and don't necessarily reflect
>those of HP etc. etc.

tim marsland,   <tpm@eng.cam.ac.uk>
information engineering division,
cambridge university engineering dept.,