Path: utzoo!utgpu!cunews!bnrgate!brchh104!brchs1!bnr.ca!rice.edu!sun-spots-request
From: tpm@eng.cam.ac.uk (tim marsland)
Newsgroups: comp.sys.sun
Subject: Re: rpc registration and how to tell of a cnode death ?
Keywords: No Digest Subjects in Unmoderated Mode
Message-ID: <3144@brchh104.bnr.ca>
Date: 4 Jun 91 18:40:00 GMT
Sender: news@brchh104.bnr.ca
Organization: Sunspots, Psuedo-Unmoderated
Lines: 174
Approved: sun-spots@rice.edu
X-Original-Date: Thu, 17 May 1990 04:53:10 GMT

[long, boring, and contains flames.  hit n now..]

>In article <...> Greg Sylvain writes:
>> Q-1) I'm trying to build amd/amq (an automounter daemon), it builds ok, and
>> to run ok. (I can remount fs's with it with out any ptoblems).

Greg,

For those that missed the original context, `amd' is a value-added
replacement for the automount(8) program (currently shipped with SunOS)
that automatically mounts and unmounts NFS (and other filesystem types) on
demand.  `amq' is a program which queries and reports the state of the
`amd' daemon using the same SunRPC mechanism as NFS does.

Disclaimer: I have the source of amd (which was recently posted to
comp.sources.unix by its author, Jan-Simon Pendry) but I have not
installed it on HP hardware, so please excuse vagueness..  Hopefully my
two cents worth will aid Greg in tracking down the problem.

First off, the SunRPC model has - surprise - clients and servers, and a set
of procedures implemented by the server and called by clients via Remote
Procedure Call stubs.  In this context, `amd' behaves both as an NFS
nfs/mount server and as an AMQ server.  `amq' is an AMQ client program that
allows you to query the state of `amd'.

>> But amq talks to amd via rpc port number 300019.

300019 is the program number (not a port number) used for amq/amd comms.
`Amd' registers that it will be prepared to listen to `amq' by giving the
/etc/portmap process the tuple [prog, vers, prot, port].
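To make that tuple a bit more concrete, here is a rough sketch (in Python,
which is my choice and nothing to do with the `amd' sources) of the lookup
message a client sends to the portmapper, normally on UDP port 111, and of
how the answer is read.  The argument of the GETPORT call is the same
[prog, vers, prot, port] tuple with the port left as 0.  The helper names
are made up for illustration, AMQ_VERS = 1 is an assumption (the version is
not given in this thread), and the parser assumes an empty AUTH_NONE
verifier in the reply.  A returned port of 0 is the portmapper's way of
saying `not registered'.

```python
import struct

# Well-known constants from version 2 of the portmapper protocol.
PMAP_PROG = 100000        # the portmapper's own program number
PMAP_VERS = 2
PMAPPROC_GETPORT = 3
IPPROTO_TCP, IPPROTO_UDP = 6, 17

AMQ_PROG = 300019         # program number quoted in this thread
AMQ_VERS = 1              # assumption: the version is not given in the post


def build_getport_call(xid, prog, vers, prot):
    """Pack a SunRPC (version 2) call to the portmapper's GETPORT procedure."""
    header = struct.pack(
        ">10I",
        xid,               # transaction id, echoed back in the reply
        0,                 # message type: CALL
        2,                 # RPC protocol version
        PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT,
        0, 0,              # credentials: AUTH_NONE, zero length
        0, 0,              # verifier: AUTH_NONE, zero length
    )
    # The argument is the [prog, vers, prot, port] tuple; the port field
    # is ignored in a lookup, so it is sent as 0.
    return header + struct.pack(">4I", prog, vers, prot, 0)


def parse_getport_reply(data, xid):
    """Return the registered port, or 0 if the program is not registered."""
    if len(data) < 28:
        raise ValueError("reply too short")
    rxid, mtype, reply_stat, _flavor, verf_len, accept_stat, port = \
        struct.unpack(">7I", data[:28])
    if rxid != xid or mtype != 1:          # 1 = REPLY
        raise ValueError("not a reply to this call")
    if reply_stat != 0 or verf_len != 0 or accept_stat != 0:
        raise ValueError("call rejected or unexpected verifier")
    return port
```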
In other words, `amd' tells the portmapper `I can service remote procedure
calls for program number prog, version vers, using the given protocol (tcp
or udp) and the given internet port number.'  Clients who wish to use a
given service then ask the portmapper for the port number registered for
that program, version and protocol.  See the portmap(1M) entry in the fine
manual.

>> The port is supposedly registered, it's
>> in the /etc/rpc file correctly.

That's not quite `registering' amq; the /etc/rpc entry simply allows
routines like getrpcent(3C) to bind a name to a program number.  The
portmapper process holds the real registration while `amd' runs, i.e. it
maps [prog, vers, prot] to an internet port number so that the client
program (amq) can actually open an internet socket to talk to the right
`amd' server port.

>> But whenever I invoke amq, it comes beck with
>> rpc not registered (or something to that effect). And sure enough, when I
>> run /usr/etc/rpcifo -p the port isn't registerd. I thought all I had to do was
>> put a line in /etc/rpc to register the port. Does anyone have an idea why
>> rpc isn't seeing the new entry in the file ? (I've tried rebooting and nothing
>> changed)

If

    client% /usr/etc/rpcinfo -p server

doesn't show `amq', it is a problem with `amd'.  I think that it's because
`amd' isn't registering its AMQ service properly with your portmapper.
Note that the /etc/rpc lookup (program number to name) takes place only on
the machine where you invoke rpcinfo.

It would be useful to give OS version numbers, and to say exactly where
you are running the amd/amq programs, e.g. both on the cluster server or
what?  Have you tried mailing jsp@doc.ic.ac.uk?  I'd be grateful if you
could mail me or post the real answer, whatever it may be.

In article <1720006@hpbbi4.HP.COM> markl@hpbbi4.HP.COM (#Mark Lufkin) writes:

Mark,

** Flame on **

> I am going to be completely useless at answering your question and
> (hopefully) make a suggestion that may help you.
> First, sorry I don't
> really know enough about SUN RPC to be able to answer a techncial
> question on it. What I would like to suggest is the use of NCS
> (Networking Computing System).

What!!?  Are you suggesting that Greg rewrites `amd' to use NCS??

> This is available on HP platforms ..

But so is SunRPC.  We use it on the large number of HP machines here for a
variety of purposes.

> and is fully supported.

Are you saying that you _don't_ support SunRPC?  We have HP manuals that
describe how to use it, so we went ahead and used it.  It is the basis of
your NFS implementation, and surely HP are going to continue to support
NFS for a year or two yet?  [You can also get the source of SunRPC for
free from various archive sites, even if you're not an academic
institution.]

> It has also been chosen as the RPC for use
> in the OSF Distributed Computing Environment (this was announced
> yesterday and includes a lot more than RPC). Note that SUN RPC was
> also a contender for use but was not picked for a variety of technical
> reasons.

Yawn.  Look, I've got nothing against HP/Apollo NCS, and I've heard that
it has some technical improvements over SunRPC.  Great - I have some
tentative thoughts of my own about SunRPC failings.  However, I've never
heard the particular technical arguments in favour of NCS (despite asking
HP in January), and *would* be grateful if someone would post a brief
account of the differences, or point me at an HP document.  Enquiring
minds want to know.

> .. Enough of my little speech (I guess I am entitled to it as I
> support this stuff).

To be brutal, I get the feeling that your posting is simply OSF posturing.
Fine, there are certainly people out there that like this sort of tosh --
but please don't post it in the guise of a non-answer to some guy's
question about a vaguely related problem.  If you want to advertise NCS
vs. SunRPC in this forum, *please* tell us why it's better.  ``Because OSF
says it is'' is not really good enough!
** Flame off **

Greg,

>> Q-2) If your on a cluster server, is there a way you can tell when/how a cnode
>> dies (loses contact with the server) ? This would seem to be so unusual of a
>> request.

> Diskless nodes don't core dump (unless
> they have local swap) so you will not be able to get more information
> on why the crash occurred.

That's really neat.  Any reason why?

> ... As far as the
> server is concerned it simply that it can no longer communicate with
> the client.

Quite.  Getting the client to say ``I've died'' is a bit tricky once it's
dead :-)  Detecting when it dies is a bit more feasible - you can
periodically ping the client workstation to check that it's still
responding.  There are a few ways to do this:

a) Use the ping(1M) program, which sends ICMP ECHO packets to the client.
   ICMP messages are handled at a low level in the kernel, so ping'ing
   will work even when the machine is fairly broken (though not when it's
   actually dead :) or early on after a reboot.  e.g.

       server% /etc/ping client -n 1

b) By convention, every SunRPC program responds to a ping procedure
   (procedure 0, which does nothing).  So, as an alternative, you can try
   ping'ing one of the RPC server processes running on the client using
   rpcinfo (in this example, rpc.statd needs to be running on the
   client).  e.g.

       server% /usr/etc/rpcinfo -u client status

c) Use the rlb(1M) program (remote loopback diagnostics), which is very
   comprehensive, though I've never really used it in anger.

The first two can also be done programmatically, allowing timeouts to be
specified.  In fact I think that (b) is the way `amd' determines if a file
server is alive before automounting a directory from it.

>> Greg Sylvain
>> Academic Computing Services
>> Systems Programmer
>>
>> UUCP: ...!{uunet}!umbc5!greg
>> Internet (Arpa) : greg@umbc5.umbc.edu
>> BITNET : GREGS@UMBC
>
>Mark Lufkin
>WG-EMC
>OS Technical Support
>HP GmbH, Boeblingen
>
>These are obviously all my own opinions and don't necessarily reflect
>those of HP etc. etc.

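On doing check (b) programmatically with a timeout: the following is only
a sketch of the idea, in Python, and is not taken from `amd' or rpcinfo.
It sends one call to procedure 0 of a given program over UDP and treats
silence as `not responding'.  The function names are invented for
illustration, the port has to come from a prior portmapper lookup, and the
reply check assumes an empty AUTH_NONE verifier.  A real checker would
retry a few times before declaring the client dead, since a single UDP
packet can simply get lost.

```python
import socket
import struct


def build_null_call(xid, prog, vers):
    """Pack a SunRPC call to procedure 0, the do-nothing `ping' procedure."""
    # Fields: xid, CALL (0), RPC version 2, prog, vers, procedure 0,
    # then AUTH_NONE credentials and verifier (flavor 0, length 0 each).
    return struct.pack(">10I", xid, 0, 2, prog, vers, 0, 0, 0, 0, 0)


def is_success_reply(data, xid):
    """True if data is an accepted, successful reply to call xid."""
    if len(data) < 24:
        return False
    rxid, mtype, reply_stat, _flavor, verf_len, accept_stat = \
        struct.unpack(">6I", data[:24])
    # mtype 1 = REPLY, reply_stat 0 = accepted, accept_stat 0 = success.
    return (rxid, mtype, reply_stat, verf_len, accept_stat) == (xid, 1, 0, 0, 0)


def rpc_alive(host, port, prog, vers, timeout=2.0, xid=1):
    """Send one null call over UDP; False if no good reply within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_null_call(xid, prog, vers), (host, port))
            data, _addr = sock.recvfrom(512)
        except OSError:    # covers timeouts and unreachable hosts
            return False
    return is_success_reply(data, xid)
```

For the rpc.statd example above, the program is `status' (100024, version
1), so the call would look like rpc_alive("client", port, 100024, 1), with
port taken from the client's portmapper.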
tim marsland, information engineering division, cambridge university engineering dept.,