Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 8/28/84; site lll-crg.ARPA
Path: utzoo!watmath!clyde!bonnie!akgua!whuxlm!harpo!decvax!tektronix!hplabs!pesnta!amd!vecpyr!lll-crg!daven
From: daven@lll-crg.ARPA (Dave Nelson)
Newsgroups: net.math.stat
Subject: How robust is S?
Message-ID: <625@lll-crg.ARPA>
Date: Wed, 5-Jun-85 21:58:23 EDT
Article-I.D.: lll-crg.625
Posted: Wed Jun 5 21:58:23 1985
Date-Received: Mon, 10-Jun-85 21:08:17 EDT
Distribution: net
Organization: Lawrence Livermore Labs, CRG group
Lines: 52

I have just begun to look into S for a friend, and have been
unpleasantly surprised by its handling of arithmetic. Here's the
scenario:

I am trying to create a multiple logistic regression facility similar
to that available with SAS's LOGIST. The guts of the facility is an
MLE computation of the following form:

    ?fmin(beta,                                         # parameters in beta
          sum(log(1+exp(X %* beta))) - y %* X %* beta,  # -log(L)
          -t(X) %* (y - 1/(1 + exp(-(X %* beta)))),     # grad(-log(L))
          -->put your starting vector here<---)         # starting vector

The data I used is the dataset in the documentation for the SAS LOGIST
procedure (SAS Supplemental Library manual p. 83ff.), and I tried to
run the model in step three of SAS's stepwise multlog example.

So what happened??? To make a long story short,

1) the behavior of ?fmin is wildly dependent on the value of the
   parameter gtol: .002 succeeds; .001 aborts with a floating point
   exception as the norm of the gradient approaches the limit.

2) if I use the results of the SAS run as my STARTING GUESS, ?fmin
   returns with beta = c(0, 0, 0, 0) after one iteration! (Moral:
   don't get TOO close with your starting guess??).

3) all the arithmetic seems to be single-precision: when ?fmin
   succeeds (as it does with the models in regression steps 1 and 2),
   the results agree with SAS only to about 4 significant digits.
   I expect more.

Who's at fault? SAS or S? I suspect S.
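
For anyone who wants to check the objective and gradient outside S, here is a minimal sketch of the same two expressions in Python. It is an illustration only, not the S code that was run. The helper names (log1p_exp, sigmoid, neg_log_lik, neg_log_lik_grad) and the toy data are hypothetical, and the overflow-safe rewriting of log(1+exp(z)) is an assumption about one possible source of floating point trouble, not a confirmed diagnosis of the ?fmin abort.

```python
import math

def log1p_exp(z):
    """log(1 + exp(z)), rearranged so exp() never sees a large positive argument."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))

def sigmoid(z):
    """1 / (1 + exp(-z)), rearranged the same way for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def neg_log_lik(beta, X, y):
    """-log(L) = sum(log(1 + exp(X beta))) - y' X beta."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(x * b for x, b in zip(row, beta))  # one element of X beta
        total += log1p_exp(eta) - yi * eta
    return total

def neg_log_lik_grad(beta, X, y):
    """grad(-log(L)) = -X' (y - p), where p = sigmoid(X beta)."""
    grad = [0.0] * len(beta)
    for row, yi in zip(X, y):
        eta = sum(x * b for x, b in zip(row, beta))
        resid = yi - sigmoid(eta)
        for j, x in enumerate(row):
            grad[j] -= x * resid
    return grad

# Hypothetical toy data: an intercept column plus one covariate.
# This is NOT the SAS LOGIST dataset.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.1]]
y = [1.0, 0.0, 1.0, 0.0]
beta = [0.0, 0.0]

print(neg_log_lik(beta, X, y))
print(neg_log_lik_grad(beta, X, y))
```

Comparing neg_log_lik_grad against a finite-difference slope of neg_log_lik is a cheap way to confirm that the gradient expression handed to a minimizer matches the objective before blaming the minimizer itself.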

I could go on and on about the way it loops on a memory allocation
error if you interrupt it at the wrong time, then look at certain data
vectors, etc., etc. And this is after TWO DAYS of looking into it.

Perhaps I am just exceeding its limitations and expecting too much.
Could someone verify that I am either:

    * justly upset at its behavior;
    * abusing S and deserve what I get;
    * too naive for my own good and crazy for even CONSIDERING such a
      problem in an interpreted system;
    * all of the above; or
    * none of the above.

Cheers,
Dave Nelson (daven@lll-crg.arpa)