Newsgroups: comp.sys.apollo
Path: utzoo!utgpu!news-server.csri.toronto.edu!helios.physics.utoronto.ca!alchemy.chem.utoronto.ca!system
From: system@aurum.chem.utoronto.ca (System Admin (Mike Peterson))
Subject: Apollo problem list / tirade (U/Toronto) (LONG)
Message-ID: <1991Feb21.160623.7881@alchemy.chem.utoronto.ca>
Summary: problem list, APRs
Sender: system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson))
Organization: University of Toronto Chemistry Department
Date: Thu, 21 Feb 1991 16:06:23 GMT

Here is a point form list of all the problems we have found with HP/Apollo hardware/software. The problems have been broken down into several categories, including high priority (which should NEVER have existed since these problems should have been eliminated in product design or alpha testing), and medium priority (which should be resolved IMMEDIATELY by patches). Within each category, the problems are broken down into:
A: fixed in the current release,
B: will not be fixed,
C: outstanding,
D: pending (fixed in the next release).

HP/Apollo is under the impression that problems H1, H45, H49, M18, M19, M25, M29 and M41 are fixed by SR10.2, and that problems H4, H10, H58, H60, M33, M55, N59 and N69 are fixed by SR10.3; this is most certainly NOT the case.

In the following, "Apollo response: none" should be interpreted to mean that there has been no resolution of the problem; if an APR/SRN number is given, HP/Apollo has responded at least to indicate that they have received the problem report.

Looking through the list of problems, it is clear that the following environment has been the subject of little or no alpha/beta testing: DSP100x0, Ethernet with TCP/IP, BSD/csh, multiple rlogin/telnet sessions from "dumb" terminals, X Windows.
A couple of examples: vi/more fails in DM pads (ever since SR10.2) but was supposed to be fixed in SR10.3, yet the second time I try to edit a file after booting, vi hangs again; 30 seconds after booting SR10.3 on my DN2500 it was obvious the clock was running wild (I could see the minute hand literally jumping around the dial in 'xclock'), mouse events were occurring randomly and quickly, and all keyboard input was locked out. HOW DO THINGS LIKE THIS GET OUT?

Quite frankly, this list is appalling, as is the fact that many of the high priority problems have existed for more than 2 years with no sign of resolution. We don't have many nodes (5 total) but this list has ensured there will be no more HP/Apollo systems in our department, and our problems are well known in both our University (the largest in Canada) and in the Chemistry community.

Our use of the systems is for program development, lots of background number crunching, and general stuff like e-mail/news/TeX, supporting ~150 users (a few quite sophisticated, most barely getting by in UNIX). We want identical environments (BSD + C Shell + X11R4 + Motif) on all systems (Apollo and Sun/SGI/IBM) to minimize user confusion.

Advantages of the Apollo systems:
- network together easily.
- BSD user environment relatively complete, though some very useful commands like 'script' and pipelines in C Shell don't work.
- compilers on DN10000 with -O do good flow analysis.

Disadvantages of the Apollo systems:
- TCP/IP+Ethernet+DN10000 = disaster (system hangs on average once a week for the past 2 years).
- X Windows server dies every few days on DN10000, every few weeks on M68K nodes, forcing a reboot to recover the display.
- only 2 background priorities (1 and 2) are available with renice (2-20 are all the same).
- processes are accounted for multiple times. We pay for our systems by charging real money so this is a disaster.
- network with foreign systems (via NFS) is flaky at best and NFS software is years behind the rest of the world.
- BSD C library calls are incomplete, and there is no /dev/kmem, which has caused much grief in porting "BSD" packages to the Apollo.
- C compiler tells the world it is "ANSI" when it's nowhere near it.
- FORTRAN programs using the complex data type rarely work properly.
- haphazard adherence to IEEE floating point specs, and lack of automatic control over fp assist used when compiling.
- can't run background jobs from the display (they are killed when you logout) - why have an Apollo workstation if it won't do work?
- can't run vi/more in a DM pad. This has left our 2 DN580 nodes useless for 1 1/2 years, since it takes 5 minutes to login using X.
- no X11R4 support.

The first 4 problems make it very difficult to run long number crunching jobs on the DN10000, but that is its main function in life!

Whoever came up with the "SR10.3 is the best software release ever from Apollo" line is either using illegal substances, or never used an Apollo in the real world, or may even be telling the truth (many :-) needed I hope). We see very few things fixed in SR10.3 that were not already patched in SR10.2 (don't get me started on the patch "secrecy" topic, but how many people were told that SR10.3 on a DN2500 could make your node useless? Apollo has known about this for at least a month, yet no official patch will be available before April --- ridiculous).

Lest this seem totally negative on Apollo alone, our experiences with other vendors haven't been that wonderful either. It seems that software is trailing hardware by several years from most vendors.

SGI doesn't have all of the BSD C library (but they're working on it, and also don't claim to be "BSD") so porting BSD stuff to them is even harder/impossible. Also, X Windows support isn't all there yet.
IBM RS/6000 interactive response grinds to a near halt once a big background job is running, but the FORTRAN compiler is very fast (much faster than the DN10000 compiler) and it puts a DN10000 to shame in floating point performance (especially considering the price). Also it is missing C routines, and the BSD FORTRAN library is non-existent, so porting is no joy again.

In our environment, the "answer" seems to be: get a Sun server to run BSD-isms like mail/named, SGI for graphics/multiprocessing, IBM for scalar fp. Run X11R4/Motif on everything (except Sun, who have missed the boat but are big enough to build their own - there should be a good market for 3rd parties to sell Motif for Sun :-) ). Although no vendor really wants you buying 3rd party disks, etc., there are many vendors who can hang all manner of external devices on Suns at affordable prices. This is the way most departments I am familiar with are going, and HP/Apollo isn't going to be there (and isn't now). We bought Apollo to try to get BSD with good fp performance - it didn't deliver the former and there are better alternatives for the latter now.

End of tirade for this week :-).

----------- cut here (66 lines / page hereafter) ------------

Fixed High Priority

High priority problems (these should not exist):
------------------------------------------------
A: Fixed in the current release.
--------------------------------
These problems have been corrected by a standard release of the relevant software, and are listed here once (only) before being removed from the list of problems.

H9) 'mail' can not create C shells under at least 2 circumstances:
- the 'set crt=20' command is used in "Mail.rc", to page long messages.
- '~v' is used to get to vi inside a mail message, then the ':sh' command is used.
[Apollo response: fixed in SR10.2; it is not fixed in SR10.2] APR # dd40f, dd410.

H28) delays specified in termcap entries are ignored at 38400 baud, resulting in locked and/or scrambled screens on dumb terminals.
[Apollo response: fixed in SR10.3] APR # dde12.

H43) using '~loginid' as part of a pathname fails.
[Apollo response: none] Call # AT000192, APR # 5b543b26.

H52) lack of file system security for Domain/OS files on systems in a supposedly "closed" environment.
[Apollo response: use 'inprot'; the problem with this is that you must supply a script of what to change and how - if I knew what needed to be changed, and to what ACL's, I would have already done it] APR # dc6fa.

H70) after installation of SR10.3.p, our third party backup/restore software is failing with 'absolute load address already occupied'; this leaves us with no backup protection at all.
[Apollo response: none] Call # AT021278, APR #.

T9) neither rlogin nor telnet works reliably on SR10.2.p, and leaving a background job from a telnet session blocks that (and all higher numbered) pseudo-port.
[Apollo response: sent a patch tape with new /etc/telnetd, /etc/rlogind and /lib/streams; this problem will be cancelled when an official version of Domain/OS that corrects the problem is released] Call # A2019739, APR # 5b549f0e.

T12) booting SR10.2.p fails disastrously (hangs at or near the 'hostid' command in '/etc/rc.local') if the node has crashed rather than being shut down under operator control.
[Apollo response: the problems have been tracked down with Mark Richmond's help to be problems with /etc/hosts and trying to access a name server before TCP/IP is up on the node being booted - even though we are using a name server, the /etc/hosts file is required to have at least some fully qualified host names for lookup at boot time; even with the proper /etc/hosts file, the system still won't boot after a crash without going into the Phase II shell, editing /etc/rc.local to avoid using the name server, disabling all the /etc/daemons files, booting, editing /etc/rc.local, re-enabling all the /etc/daemons files, shutting down, and rebooting -- this is ridiculous; this problem may be avoided by moving the '/etc/hostid' command to the end of '/etc/rc.local'; this problem will be cancelled when an official version of Domain/OS that corrects this problem is released] Call # A2019739, A2020363, A2020661, A2020752, A2028455, APR # 5b5422e7, 5b548283.

T13) DSP10000 crashed with "stop CPUs with NMI..." more than 20 times; no dump tape since system unable to do 'du s' (either just hangs, or says "Error: SYSBOOT not found" and "Failure to enter minimum mode").
[Apollo response: appears to be patched by SR10.2.0.4.p from the 9005 patch tape; this problem will be cancelled when an official version of Domain/OS arrives] Call # AT000392, APR # 5b549e1a.

T14) using the DM command 'cp /bin/csh' in the X environment removes all the keyboard modifications made for X.
[Apollo response: none; this problem has been patched by one of the 9005 patches and will be cancelled when an official version of Domain/OS arrives] APR # 5b54f328.

Not to be Fixed High Priority

High priority problems (these should not exist):
------------------------------------------------
B: Never to be Fixed.
---------------------
H18) the algorithm used by f77 for complex division does not produce the correct answer for substantial ranges of values.
[Apollo response: will not be fixed, even though it causes programs to abort that should not] APR # ddac1.

H32) all Apollo systems either must adhere to IEEE floating point specifications by default, or a facility must be provided to force all processors to operate in IEEE mode at all times (ideally at boot time, but an environment variable would be marginally acceptable). This problem causes a program to give 3 quite different answers, including one case where the program aborts, depending on the floating point option selected at compile time.
[Apollo response: "product is working within specification", nothing will be done] SRN # J600618298.

H53) there is no automatic way to compile/link for the node environment (e.g. co-processor, fpa board) that does not require modification of every manual compile/link command or makefile. The compilers need to AUTOMATICALLY, and without user intervention at each compile or in the 'makefile', determine the proper environment, e.g. by reading an environment variable, which the user may have to set in their .cshrc file.
[Apollo response: will not be done or enhancement request, depending on which APR response is used] APR # dc6e9, dcaaa.

H56) the node hangs (refuses logins) when the disk is full - the kernel must reserve enough space to allow a 'root' login for the disk to be cleaned up.
[Apollo response: will be considered for SR11] APR # dd416.

H57) BSD4.3 include files are missing (e.g. ). This hinders or prevents porting of standard BSD public domain programs, which was one of the major reasons for buying Apollo and using BSD!
[Apollo response: some BSD include files are not supplied; my response: what is the point of having an Apollo system running BSD if BSD is not supported completely!] Call # 267062, APR # dd065, dd0cd.

Outstanding High Priority

High priority problems (these should not exist):
------------------------------------------------
C: Outstanding.
---------------
H1) flow control is lost on rlogin sessions from the cisco terminal server when the session is "broken" (to escape back to the cisco).
[Apollo response: talking to cisco about it; note: this problem does not exist with Sun/SGI/Stellar implementations of BSD4.3 rlogind] APR # dcca6.

H3) using "renice" to set processes to priority 10, 15 or 20 causes all the jobs to share the cpu equally (using Domain/OS priority 16), negating attempts to run shorter background jobs at a higher priority.
[Apollo response: use niceness 0, 1 and 2; fixed at SR11] APR # dd297.

H4) the 'script' command does not work on SR10.2 or SR10.2.p nodes. Without this, how are logs of system backups to be saved and printed?
[Apollo response: fixed in SR10.2; it works sometimes in SR10.2 which may be related to APR # dde40; it does not work in SR10.2.p or SR10.3.p] APR # dca9f, dd415, de1bb.

H5) the DSP10020 (SR10.2.p) is "losing" tcp/ip network services about once a week. Domain services are still functioning (you can cd to the disk files or login to other nodes when your home directory is on the affected node). We also saw this with SR10.0.p and SR10.1.p. The node then has to be rebooted (it must be crashed since the 'shut' command on the console does not work when this happens).
[Apollo response: Ethernet microcode and upgrading to SR10.2 should have fixed all the AT-bus based systems; moving to SR10.2.p has not made much difference on the DSP10000] Call # 264324, 280349, A2005081, A2005241, A2005804, A2006241, A2019272, AT000914, AT001732, APR # dd314.

H10) 'make' does not work using its built-in rules.
[Apollo response: fixed in SR10.2; my response: it is partially fixed in SR10.2, but the '.f' rule uses '$(F77FLAGS)' while the '.f.o' rule uses both '$(FFLAGS)' (which are set improperly for UNIX) and '$(F77FLAGS)'; in addition the use of '$(F77FLAGS)' is not documented; Apollo says '$(F77FLAGS)' will be used properly and documented "in the next release"; the use of '$(F77FLAGS)' will break virtually all Makefiles that come with external packages, as it is universally understood that the f77 compiler flags go into '$(FFLAGS)'; Apollo response: fixed in SR10.3; my response: SR10.3 still uses '$(F77FLAGS)' for f77, which is still undocumented, and breaks all external software packages, but at least the default rules are now changeable by the system manager] APR # dde42.

H26) if our (remote, non-Apollo) name server node is unable to respond, telnet/rlogin sessions do not get past the login stage when Internet numbers are used instead of names. This leaves our entire Apollo network useless.
[Apollo response: none] APR # ddcf3.

H31) it is impossible to logout from the display of an SR10.2 node if one or more background job(s) has been submitted, even with the 'nohup' command that used to work at SR10.1.
[Apollo response: use 'crp' onto the same node; after agreeing this is a bug, as of Sept. 6/90, this is now not a problem and will not be fixed] Call # A2010657, APR # dcc31, ddfcd.

H33) ld is loading multiple copies of BLOCK DATA routines from libraries, causing NCAR Graphics Library routines to fail with very strange errors.
[Apollo response: claim this is not a problem] Call # A2011651, APR # ddff0.

H35) the reliability of the MAXTOR 760 MB disks is extremely low - we have had disk "crashes" on Nov. 15 (approx.) ("w1:0"), Feb. 14 ("w1:1"), Feb. 28 ("w1:0") and Mar. 1 ("w1:1"). In addition, on Feb. 14 the first 2 replacement disks were also non-functional, as was the first replacement disk on Feb. 28.
[Apollo response: confirmed low reliability of the drives].

H36) a simple f77 program using complex arithmetic gives wrong results on the DSP10000.
[Apollo response: none] APR # 5b541447, 5b544b51.

H39) using 'rgy_admin' to attempt to return the master registry to normal service after a backup hangs both the 'rgy_admin' and the master registry process in CPU loops.
[Apollo response: set the master site explicitly] APR # 5b54bf84.

H40) using the command 'history | tail' produces no output at SR10.2/SR10.2.p.
[Apollo response: "fixed in a future release"] APR # 5b54901f.

H42) lpd started at boot time on DSP10000 just discards print files.
[Apollo response: patch was made to /usr/lib/libU77.a; this fixes the small test script, but not lpd when started at boot time] Call # A2029344, APR # 5b5479c3.

H44) rlogin/telnet into the DSP10000 fail intermittently (even with "pty" patch).
[Apollo response: none] Call # AT000190, APR # 5b549c8a.

H45) in SR10.2, commands like 'more' and 'vi' which pop the vt100 emulator automatically cause the pad they are run in to hang after a few invocations (usually just 2 are needed); commands run in explicit vt100 shells also hang eventually. Both of our DN580 systems are useless due to this problem.
[Apollo response: may be related to APR # ddda3; to recover, the process must be killed from another shell and the pty's remade with 'mkdev /dev pty'; a patch to /lib/streams seems to have fixed this problem on the DN2500 but the patch does not fix our DN580's; also the new version of /sys/vtserver in the same patch just hangs forever on all systems (I am using the new /lib/streams and old /sys/vtserver now); this problem is fixed in SR10.3 but will not be fixed in SR10.2] Call # A2005063, A2006598, AT000193, A2039408, APR # dde40, de203, 3cf7d957.

H46) DSP10000 crashed with status 0012000A (unimplemented instruction); no dump tape since system unable to do 'du s' (either just hangs, or says "Error: SYSBOOT not found" and "Failure to enter minimum mode").
[Apollo response: none] Call # A2029861, APR #.

H48) f77 is not writing error messages (e.g. read past end-of-file) on the 'stderr' file, so background jobs just stop running after many wasted hours of cpu time with no indication of what happened; the user then resubmits the job, wasting more cpu time.
[Apollo response: can not reproduce] APR # 5b54a577, 5b54f2e6.

H49) processes can not be killed with either 'kill' or 'kill -9'; they then use 100% of a cpu until they are "blasted" (after "blasting" the system must be rebooted, so this is not a useful alternative). We have also had "unblastable" processes, where the system would not shut down properly.
[Apollo response: none] Call # AT002411, A2039255, APR # 5b54c987, A9FF63, SRN # J600619213.

H50) yet another simple program using complex arithmetic fails on the DSP10000.
[Apollo response: none] APR # 5b54ed04.

H51) according to both USENET news and Rob Raymond (HP/Apollo), there is a "psk5" patch that improves X-Windows performance substantially; we have so far been unable to find anyone to send us this patch tape.
[Apollo response: finally provided ordering information; PSK5 has been installed, but this run-around should not happen; no one at the local office knows anything about PSKs or patch tapes, and no information ever reaches the end-user about such things, although they are often critical for continued functioning of the system] APR #.

H54) f77 does not detect multiple initializations of variables in different DATA statements in the same routine.
[Apollo response: claim this is not a problem as the program violates the Fortran standard; my response is that since the compiler produces "bad" code for this, the compiler should check for this error the same way it checks for many other violations of the Fortran standard] APR # dcbda.

H55) any user can renice any process in the system, and can raise the priority of a process - both of these functions should be available only to root.
[Apollo response: claims this is the intended design; this is completely unacceptable as users can then raise/lower kernel processes at will, and can raise their own priorities above critical kernel processes] APR # dcd0b.

H58) pipelines fail if the first element can be completed quickly, resulting in lost data (may be related to APR dd9bd).
[Apollo response: fixed in SR10.3; this is not fixed in either SR10.3 or SR10.3.p] Call # A2034520, APR # 5b54901f.

H59) nodes sometimes hang at the "Preserving editor files" when being rebooted, and can not be rebooted until the ex/vi files in /tmp have been deleted from the Phase II shell.
[Apollo response: none] Call # AT021998, APR # 5b54fac2.

H60) the accounting is reporting bogus cpu time usage, caused by a shell accounting for all its child processes.
[Apollo response: fixed in SR10.3 or "fixed in a future release" depending on the APR response, but will not be fixed in SR10.2; it seems to be fixed in SR10.3 but not in SR10.3.p] Call # AT005931, A2040339, APR # 5b545dd4.

H61) the version 10.8 (beta) FORTRAN compiler has 2 nasty bugs at -O.
[Apollo response: none; I have gone back to using the 10.7 patch compiler, but these bugs must be fixed before 10.8 is officially released for the compiler to be usable at U of Toronto] APR #.

H62) using xkill with Motif kills mwm.
[Apollo response: none] APR #.
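
To make the H60 complaint concrete, here is a small arithmetic sketch of how billing goes wrong if a shell's record also carries its children's cpu time. The process table and the cpu figures are invented for illustration, and the double-counting rule is an assumption based only on the one-line problem description above, not on anything known about Domain/OS accounting internals.

```python
# Hypothetical accounting records:
# (pid, parent pid, command, cpu seconds used by the process itself).
RECORDS = [
    (100, 1, "csh", 2.0),
    (101, 100, "f77", 30.0),
    (102, 100, "a.out", 600.0),
]


def own_time_total(records):
    """Bill each process only for the cpu time it used itself."""
    return sum(cpu for _pid, _ppid, _cmd, cpu in records)


def inflated_total(records):
    """Bill each process for its own time plus its direct children's time.

    Every child is then charged twice: once in its own record and once
    again in its parent's record.
    """
    total = 0.0
    for pid, _ppid, _cmd, cpu in records:
        children = sum(c for _p, parent, _n, c in records if parent == pid)
        total += cpu + children
    return total


print("correct bill :", own_time_total(RECORDS), "cpu seconds")
print("inflated bill:", inflated_total(RECORDS), "cpu seconds")
```

With these made-up numbers the user is billed for roughly twice the cpu time actually used, which is why this matters at a site that charges real money for cpu time.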

H63) mwm hangs the display; system can sometimes be "shut" successfully after 15-30 minutes; some X Windows clients continue to display properly, but none will accept keyboard/mouse input.
[Apollo response: none] Call # A2038785, APR #.

H64) root does not have root access in NFS mounted file systems, even if such access is given in the foreign system's /etc/exports.
[Apollo response: none] Call # AT017309, APR # 34644dff, SRN # J600618769.

H65) system boot hangs mounting NFS volumes after a crash.
[Apollo response: none] Call # AT020968, A2039633, APR # 30ea3207, SRN # J600620393.

H66) there is no graceful shutdown possible if the system hangs during reboot; instead we waste 1/2 hour re-salvol'ing disks every time.
[Apollo response: none] APR #.

H67) SR10.3.p patch p0147 requires patch p0112, but patch p0112 can not be applied to SR10.3.p.
[Apollo response: Hotline response is do both; I have done neither in case our system will not boot after installing patch p0112; SRN response says only apply p0147 as p0112 is in SR10.3.p] Call # A2039341, APR #.

H68) in SR10.3.p, commands like 'more' and 'vi' which pop the vt100 emulator automatically cause the pad they are run in to hang after a few invocations (usually just 2 are needed); commands run in explicit vt100 shells also hang eventually. Both of our DN580 systems are useless due to this problem.
[Apollo response: may be related to APR # ddda3; to recover, the process must be killed from another shell and the pty's remade with 'mkdev /dev pty'; this problem was supposed to be fixed in SR10.3 but is not] Call # A2005063, A2006598, AT000193, A2039408, APR # dde40, de203, SRN # J600620930.

H69) tcpd hangs after a few hours of use.
[Apollo response: none; probably related to Call # 264324, 280349, A2005081, A2005241, A2005804, A2006241, A2019272, AT000914, AT001732, APR # dd314] Call # A2039409, A2039843, APR #.

H71) Xapollo dies and 'dm' gets stuck in a cpu loop.
[Apollo response: none] Call # AT021971, APR #.

Pending High Priority

High priority problems (these should not exist):
------------------------------------------------
D: Fixed in the next release.
-----------------------------
These problems will be cancelled when they are fixed by a standard release (not patch tapes / uploaded files) of the relevant software.

T6) in f77, a read of an integer with '*' format from a character string either fails or gives the wrong answer.
[Apollo response: fixed by a patched ftn compiler] APR # ddab9.

T7) f77 with '-C' does not check character substrings properly (it always aborts with a subscript error) if either the '-A cpu,3000' or '-A cpu,fpa1' option is used, but works if no cpu-type is specified, on a DN4500 running SR10.1.
[Apollo response: fixed by a patched ftn compiler] APR # dcee3.

T8) in f77 multiplying a variable by another variable that contains 0.0d0 does not result in the product being zero.
[Apollo response: fixed by a patched ftn compiler] APR # dda78.

T10) a program fails to run on the DSP10020 with divide by zero, and can not be debugged since dde fails (APR # dd97b) and dbx is not supplied. On a m68k node, the program causes a backend failure in the compiler.
[Apollo response: the actual problem with the program is corrected by a patched ftn compiler; I have found that using the '-save' option allows the program to run on the DSP10020] Call # 319065, APR # ddcf2.

T15) f77 open with "status='unknown'" fails if the file exists but has zero length.
[Apollo response: unable to reproduce with latest software versions; this problem has been patched by one of the 9005 patches] APR # 5b54abbb.

H20) a f77 program, part of the NCAR package, causes a DN4500/fpa node to crash, and a segmentation violation on a DN580 node.
[Apollo response: fixed in release 10.8 of ftn] APR # ddb84.

H72) after installation of SR10.3, our DN2500 is useless due to a wildly racing system clock which gains from 1 to 10 minutes for every real minute, keyboard typing is often not echoed for several minutes (if at all). Any mouse movement causes random screen pointer movements, menus pop up or get pulled down, text is marked/yanked/replaced at random.
[Apollo response: known problem but no patch until April (it is now January!); a patch has been supplied by ftp] Call # AT021949, APR # 6cfd4e20, SRN # J600620971.

Fixed Medium Priority

Medium priority problems (unacceptable features of the Apollo systems):
-----------------------------------------------------------------------
A: Fixed in the current release.
--------------------------------
These problems have been corrected by a standard release of the relevant software, and are listed here once (only) before being removed from the list of problems.

M3) any user can accidentally or deliberately shut down a node from the display (or DSP10020 server console terminal).
[Apollo response: a lock file will be implemented in SR10.3; note: 2 of our nodes were useless for 6 days due to this "feature"] APR # dcba8.

M29) 'rwmt' does not allow the record format to be specified.
[Apollo response: fixed in SR10.2; it is not fixed in SR10.2; temp. solution is to use upper case for the record format] APR # dd39a.

M44) the 'man' page for 'cc' can not be displayed properly on a dumb terminal not using hardware tabs.
[Apollo response: fixed in SR10.3] APR # 5b54f115, SRN # J600618801.

M47) an "update" install of SR10.2.p fails, leaving the node useless, unless an "unnecessary and not recommended" config/install++ option is used, and this option is not mentioned in the Installation Notes.
[Apollo response: informed me to use the "not recommended" option, which did allow the installation to proceed without error; however, a second update installation failed with the same problems even with "checking none" set] Call # A2019604, APR # 5b54194b.

M48) the 'man' page for 'ksh' can not be displayed properly on a dumb terminal not using hardware tabs.
[Apollo response: "fixed in a future release"] APR # 5b546b3c.

M49) loading software onto a DSP10000 from cartridge tape using 'distaa' causes the c-tape drive to be unusable by regular users until the system is rebooted. The c-tape also becomes unusable periodically in normal use ("device busy", "device in use" or "drive does not exist"), which may or may not recover before a reboot.
[Apollo response: problem due to directories with single character names and will be "fixed in a future release"] Call # AT001274, A2033543, APR # 5b545860.

M51) the m68k patch version of domain_os (patch m0122) hangs the node if X Windows is in use.
[Apollo response: none] Call # A2028456, APR #.

M64) a simple GPR program fails with a segmentation fault at the "gpr_$terminate" call.
[Apollo response: all m68k patch tape versions of 'gprlib' have this problem; fixed in SR10.3; I have gone back to the original SR10.2 version of 'gprlib'] Call # A2036028, APR # 5b54b064.

Not to be Fixed Medium Priority

Medium priority problems (unacceptable features of the Apollo systems):
-----------------------------------------------------------------------
B: Never to be Fixed.
---------------------
M4) f77 does not detect illegal branches into DO loops; one instance of this had a program run for 40 minutes of CPU time on a DSP10000 (and the program just quit with no result), but only 40 seconds were needed once the program was corrected.
[Apollo response: this is very difficult to detect as an Apollo extension to Fortran allows extended DO ranges where the program may leave and return to a DO loop, and will not be fixed] APR # dcbaa, dda74.

M5) f77 does not detect illegal uses of DO loop indices as the index of a nested loop, and allows the loop index to be re-assigned inside the loop.
[Apollo response: problem verified; enhancement request] APR # dcba5.

M21) when 'rm -f ...' is used in a script, the user is prompted for each file deletion if 'alias rm rm -i' is in effect, which causes trouble for scripts that try to clean up after themselves silently.
[Apollo response: claim this is not a bug; however, on other Unix systems the '-i' switch is ignored if 'rm' is not running interactively, and that is exactly what should happen] APR # dd172.

M26) the node hangs (refuses logins) when the disk is filled - the kernel must reserve enough space to allow a 'root' login for the disk to be cleaned up.
[Apollo response: will be considered for SR11] APR # dd416.

M42) no UNIX environment is available when the system is booted from cartridge tape - this is unacceptable for a UNIX workstation, especially in view of the contortions required to boot a DSP10000 from c-tape at SR10.1.p.
[Apollo response: will not be done] APR # de076.

M52) after using "dmio -off" to remove the DM windows from the X display, the button can not be used to get them back to be able to logout.
[Apollo response: will not be fixed] Call # A2023624, APR # 5b54002f.

M58) rwmt will not read standard label magnetic tapes that do not contain "3" in a certain volume label field - the tapes we have received from foreign sites do not have the "3", so they can not be read easily.
[Apollo response: ANSI standard labels must have a "3" in that volume label position; in my view, a more reasonable approach is to attempt to make use of the label if possible, but apparently this will not be done] APR # dcb78 (?).
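
For M21, the behaviour a silent cleanup script depends on can be sketched as follows. This only illustrates the last-option-wins rule that POSIX specifies for 'rm' (a later '-f' cancels an earlier '-i', and vice versa); the function name is hypothetical, and nothing here describes how the Domain/OS 'rm' is actually written.

```python
def prompts_for_confirmation(args):
    """Return True if 'rm' called with these arguments should ask before
    each deletion, under the rule that the last of -i / -f wins."""
    interactive = False
    for arg in args:
        if arg == "--":
            break                      # explicit end of options
        if not arg.startswith("-") or arg == "-":
            break                      # first operand ends option parsing
        for flag in arg[1:]:           # handles bundled flags such as -rf
            if flag == "i":
                interactive = True
            elif flag == "f":
                interactive = False
    return interactive


# With 'alias rm rm -i' in effect, 'rm -f junk' becomes 'rm -i -f junk'.
print(prompts_for_confirmation(["-i", "-f", "junk"]))
print(prompts_for_confirmation(["-i", "junk"]))
```

Under this rule the aliased 'rm -f' in a script never prompts, while a plain aliased 'rm' typed at the terminal still does, which is the outcome the problem report asks for.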

M59) vt100 windows can not be created if there are no pty's in /dev. The main problem here is that the error messages are very misleading.
[Apollo response: will not be fixed (they say the emulator doesn't know why it fails to start); my response is that all system calls obviously should be checked for successful completion rather than continuing on blindly assuming everything worked] APR # dd411.

M60) /bin/sh does not accept the "${param:-default}" syntax, which is standard Unix Bourne shell syntax on Sun, DEC Ultrix and SGI.
[Apollo response: will not be fixed, will be considered as an enhancement request] APR # dd621, ddca6.

M61) f77 fails with a "backend failure" when compiling a routine containing 1 statement 1300 lines long.
[Apollo response: exceeds compiler restrictions and will not be fixed; I will agree with not fixing this, but an error message stating that the program is too large should be produced, not just have the compiler take a fatal system fault; see also APR # ddc1b] APR # ddbfb.

M62) f77 never completes the compilation (>11 hours on a DSP10020) when compiling a routine containing 1 statement 18000 lines long.
[Apollo response: exceeds compiler restrictions and will not be fixed; I will agree with not fixing this, but an error message stating that the program is too large should be produced, not just have the compiler run forever; see also APR # ddbfb] APR # ddc1b.

Outstanding Medium Priority

Medium priority problems (unacceptable features of the Apollo systems):
-----------------------------------------------------------------------
C: Outstanding.
---------------
M6) raising zero to the zero power gives wrong answers and/or aborts the program depending on the circumstances (node type, integer/real base to integer/real power).
[Apollo response: claim 0^0 is 0; this is incorrect mathematically, as the answer is 1; Apollo has now decided that 1 is the correct answer and will fix some libraries to give the correct answer (M68030, M68040, FPA1 and A88K will be fixed, others will not); in addition the compiler will still not evaluate 0^0 properly; this will be fixed in SR11] APR # dcda0, ddf33.

M15) COMMON block names longer than 7 characters are not handled properly if a BLOCK DATA routine is loaded from a library - on the DSP10020, the COMMON block is left as zeroes; on SR10.1.m, the compiler aborts with a segmentation fault. [Apollo response: will be patched at SR11] APR # dcf80.

M18) on the DSP10020, the 'statfs' system call returns values that are 4 times too large, or computed in the wrong units. Probably related to APR # dcab6. [Apollo response: fixed in SR10.2; it is not, although 'du' has been kludged to return the correct size] APR # dd29b.

M19) there is no mechanism to determine values stored by the kernel (e.g. load average) that are not available via standard system calls - on other Unix systems, this is usually done with '/vmunix' and '/dev/kmem'. [Apollo response: none] APR # dd0b2.

M25) the /etc/group file does not contain the group members, making it very difficult to determine which group name a loginid belongs to. [Apollo response: fixed in SR10.2; it is not fixed in SR10.2] APR # dd622.

M28) SR10.1.p 'biff' still doesn't work, so users are not notified of new mail that arrives while they are logged in. [Apollo response: none] APR # dd378.

M33) rbak does not restore the file/directory inheritance when Berkeley ACLs are used in a directory tree - this must be reset manually. [Apollo response: claim this is fixed in SR10.1; it most certainly is not fixed in SR10.2.p or SR10.3.p] APR # dd97c.

M41) no documentation on how to mount the disk of a DN2500 on its boot partner (for loading software). The mount commands in the DN2500 Owner's Guide are complete nonsense.
[Apollo response: provided mknod commands ('mknod /dev/wn96a b 0 321' and 'mknod /dev/rwn96a c 0 321') via HOTLINE; the APR response stated that "last minute info did not get into the DN2500 Owner's Guide" - this info should not be last minute since people do run systems without Aegis, although we are apparently in a very, very small minority!] Call # A2004974, APR # dde3f.

M43) there is no file creation command available when the system is booted from cartridge tape - this is required in view of the contortions required to boot a DSP10000 from c-tape at SR10.1.p. [Apollo response: none] APR # de077.

M45) only the first DM pad has the proper 'set'/'stty' options set; they are not propagated properly to subsequent pads or 'vt100' shells. [Apollo response: "not a problem" since the environment variables are copied; this is precisely the problem though since for csh, copying the environment variables is not sufficient - either .cshrc must be re-read or the 'set'/'alias' names must be copied too or the current shell must be forked] APR # 5b54b627.

M46) BSD calendar fails with 'egrep: regular expression too long'. [Apollo response: known problem, fixed "in a future release"] Call # A2019226, APR # 5b54e354.

M50) the C compiler does not convert some unsigned integer values correctly to float/double data types. [Apollo response: none] APR # 5b54056f.

M53) documentation for X Windows when X is started at boot time contains many inaccuracies and wrong information. [Apollo response: "not a problem" followed by yet more wrong information on how to start X Windows] Call # A2023624, APR # 5b54e4aa.

M55) xterms are created with the 'stty size' values set to 0, when they should be set according to the window size. [Apollo response: fixed in SR10.3; it is not fixed in SR10.3 (now xterms get arbitrary sizes)] APR # 5b54f831.

M56) 'telnetd' is not passing characters through to the remote program (e.g. 'vi') correctly (it is generating a instead of ).
This causes 'vi' and the Cray Remote UNIX Satellite Station software to malfunction on telnet sessions. [Apollo response: "fixed", no release specified; it is certainly not fixed in SR10.2] APR # 5b54a71b.

M57) users can not login via local tty lines if the local node can not access a registry site. [Apollo response: the login times out in 60 seconds (as per standard Unix), but timeout for registry lookup is longer than that, so the user gets refused; they also suggest that attempting to login many times may eventually force the local registry to be consulted, which would then allow the login (this has not been tried and should not be necessary), or to use /com/login (the Aegis login program) in place of /bin/login (this is impossible since we are running BSD Unix only); this forces us to run 'rgyd' on every node where the SIO ports are used] APR # dcb9c.

M65) with PSK5, any DM window creation hangs the workstation display and increases the load average by 1 until the window is placed by the user. This wastes 1 CPU on a DN10000, for example. [Apollo response: none] APR # 684ff632, SRN # J600621425.

M66) the mwm man page is messed up by extraneous control characters. [Apollo response: none] APR #.

M67) when 'dmwin &' is run from a uwm/mwm root window menu (as any other X client), the pad is created but marked "Pad Closed" immediately. This prevents the creation of easy mechanisms to logout from the display, and general pad creation from an X environment. [Apollo response: none] APR #.

M68) when a new account is added with edrgy, sometimes the new user can not create files in their new home directory. All files belonging to the new account must have the userid "fixed" by 'chmod -R' or 'chacl -R', and it must be done at least several hours after the account is first created (or after a 'syncids' on the disk volume). [Apollo response: none] Call # AT016498, APR #.

M69) the 'shutdown -r' and 'shutdown -h' commands do not work.
[Apollo response: none] APR #.

M70) the xterm login window never appears if the mouse or keyboard is touched after logging in to the display. [Apollo response: none] APR #.

M71) after a system shutdown, the DN10000 would not reboot, the keyboard was dead, and the front panel LED was displaying 'PFbF'. 'Reset' would not allow a reboot; turning off all power allowed a successful reboot. [Apollo response: none; we have now had 3 of these, though I can usually reset/reboot from the front panel] Call # A2039412, A2039634, APR #.

M72) many /dev files are not built when SR10.3.p is installed or if 'mkdev /dev all' is run, including /dev/crp*, /dev/rct*, /dev/rmt*, /dev/rwn0a, /dev/rwn96a, /dev/logd, /dev/nuls, /dev/global_devices. [Apollo response: none] Call # AT022654, APR # 7ef151ee, 0291d2c6.

Medium priority problems (unacceptable features of the Apollo systems):
-----------------------------------------------------------------------

D: Fixed in the next release.
-----------------------------

These problems will be cancelled when they are fixed by a standard release (not patch tapes / uploaded files) of the relevant software.

M16) f77 library routine 'fputc' does not work. [Apollo response: will be fixed in SR10.3] APR # dcf7e.

M20) the C compiler has the '__STDC__' preprocessor macro set to 1 indicating an ANSI-compliant compiler, yet:
- the compiler does not allow initialization of "automatic" arrays.
- the preprocessor directives '#elif', '#error' and '#pragma' are missing.
- the '#' and '##' options of the '#define' directive are not supported.
- the include files '', '', '' and '' are missing.
[Apollo response: compiler is not ANSI C compliant, but will be in release 6.8; my response: if not compliant, then '__STDC__' should not be defined!] APR # dd0ca, dd0cb, dd0cc.

M32) f77 aborts with a "backend failure" when compiling a routine containing nested statement functions. [Apollo response: fixed in the next release of ftn] APR # dd702.

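The ANSI features listed in M20 can be probed with a few lines of C. What follows is a minimal sketch, not the test case submitted with the APRs. The names STR, GLUE and ansi_probe are made up for illustration, and '#pragma' is left out because its effect is implementation-defined. On a compiler that really is ANSI-compliant, this should compile cleanly and ansi_probe() should return 3.

```c
/* Sketch of an ANSI C probe for the features named in M20.
   All identifiers are hypothetical, chosen for illustration only. */
#include <string.h>

/* '#elif' and '#error': refuse to build if __STDC__ is absent or not 1. */
#if !defined(__STDC__)
#error "compiler does not claim ANSI C conformance"
#elif __STDC__ != 1
#error "__STDC__ is defined but is not 1"
#endif

#define STR(x)     #x      /* '#' operator: turn a token into a string  */
#define GLUE(a, b) a##b    /* '##' operator: paste two tokens together  */

/* Returns the number of checks that passed (3 if all features work). */
int ansi_probe(void)
{
    int auto_array[3] = { 1, 2, 3 };  /* initialized automatic array */
    int GLUE(val, ue) = 40;           /* expands to: int value = 40;  */
    int passed = 0;

    if (auto_array[0] + auto_array[1] + auto_array[2] == 6)
        passed++;
    if (strcmp(STR(hello), "hello") == 0)
        passed++;
    if (value + 2 == 42)
        passed++;
    return passed;
}
```

A compiler that defines '__STDC__' as 1 but rejects this file is making the false claim described in M20.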
M35) DSP10020 f77 produces bad code for alternate returns when compiled with '-O'. [Apollo response: ftn patch tape of Mar 21 had fixed this problem; the problem has re-appeared with the 10.7 compiler; fixed again in 'cr1.0' compiler (future release?)] APR # dda75.

M36) a program gives an "illegal compatibility mode" error on the DN4500. [Apollo response: may be fixed in 10.7.m (it is not); will be fixed in the next release of ftn; a workaround is to compile the routines separately] APR # dcee5, dda77.

M37) f77 fails with "backend failures" when compiling 9 routines using complex*16 from Linpack. [Apollo response: fixed in next release of ftn] APR # dda94.

M39) f77 on the DSP10000 fails with a backend failure on a short, simple routine. [Apollo response: add '-iso' compiler option; fixed in next release of ftn after 10.7] APR # ddc20.

M63) users whose passwords are made invalid by 'edrgy' are not prompted at their next login to change their password. [Apollo response: fixed in SR11?] Call # AT003642, APR # 5b540f11, SRN # J600594770.

Non-critical problems (causing aggravation):
--------------------------------------------

A: Fixed in the current release.
--------------------------------

These problems have been corrected by a standard release of the relevant software, and are listed here once (only) before being removed from the list of problems.

N2) unable to set default terminal types for 'telnet'/'rlogin'. [Apollo response: will be done "in a future release"] APR # dc6de.

N5) 'rwho' displays incorrect idle times. [Apollo response: problem verified, will be fixed in a "future release"] APR # dc6fd.

N64) an environment variable should be provided for the user to specify which "pager" the system should use for 'man' pages (and with what options) - even better would be for this pager to be used in all circumstances where the use of 'more' is currently hard-coded.
[Apollo response: added description of $PAGER to man page for man "in a future release"] APR # 5b5474cd.

N72) install++ does not preserve the /etc/printcap file. [Apollo response: none] APR # 5b542cc4.

N76) /usr/include/string.h contains invalid function prototypes for 'strncpy' and 'strncat'. [Apollo response: fixed at SR10.3] APR # 5b5400cf.

Non-critical problems (causing aggravation):
--------------------------------------------

B: Never to be Fixed.
---------------------

N66) the /usr/ucb/biff command fails on SR10.2.p (changing it from setuid 'daemon' to setuid 'root' lets it work, but is this safe?). [Apollo response: will not be fixed] APR # 5b54c8ad.

N74) f77 allows invalid format descriptors like "F10.15". [Apollo response: will not be fixed] APR # ddcd6.

Non-critical problems (causing aggravation):
--------------------------------------------

C: Outstanding.
---------------

N17) the "set filec" option to csh causes the shell to hang. [Apollo response: will be fixed in SR10.2/SR10.1.p; it does not hang in SR10.1.p, but does not re-echo the command line after completion was requested at SR10.2] APR # dc8fc, dd412.

N23) typing 1024 characters hangs csh. [Apollo response: problem verified] APR # dcb9a.

N28) f77 library routine range errors (e.g. sqrt(-1.0)) are reported as "Floating exception" errors, which they are not. [Apollo response: claims this is the "standard Unix error message for all floating point exceptions" and is behaviour close to any other Unix system; I tried this on a Sun 4/280, and not only is the error specifically an "operand range" message, but the routine returns the IEEE standard value for "Not a Number (NaN)"] APR # dcba6.

N29) f77 on the DSP10020 gives false "unnatural alignment" messages for variables that are in fact naturally aligned. [Apollo response: fixed in a future release] APR # dcb75.

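For reference on N76 above: ANSI C declares both routines with the shape 'char *f(char *, const char *, size_t)'. A minimal sketch of a compile-time check follows. The typedef name strn_fn and the function string_prototypes_ok are hypothetical, and this is not the test submitted with the APR. A header whose prototypes disagree with the standard should draw a diagnostic at the two pointer assignments.

```c
/* Sketch: check that <string.h> declares strncpy/strncat the ANSI way.
   strn_fn and string_prototypes_ok are illustrative names only. */
#include <stddef.h>
#include <string.h>

/* The ANSI C prototype shape shared by strncpy and strncat. */
typedef char *(*strn_fn)(char *, const char *, size_t);

/* Returns 1 if both routines can be called through the standard type
   and behave as expected, 0 otherwise. */
int string_prototypes_ok(void)
{
    strn_fn copy = strncpy;   /* a mismatched prototype is diagnosed here */
    strn_fn cat  = strncat;   /* ... and here                             */
    char buf[8] = "";

    copy(buf, "abc", sizeof buf - 1);  /* buf now holds "abc"    */
    cat(buf, "def", 3);                /* buf now holds "abcdef" */
    return strcmp(buf, "abcdef") == 0;
}
```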
N33) enabling subscript checking on the DSP10020 causes a "floating exception" when the invalid subscript is found, not a subscript range error, which is misleading. [Apollo response: none] APR # dccc1.

N41) a complete set of file and directory ACLs is needed to properly configure the file system, as these are not set correctly or consistently by the installation procedure. [Apollo response: will not be done before SR11] Call # 254175, APR # dcd46.

N47) the /etc/nologin file blocks root and %.wheel.% logins. [Apollo response: none] APR #.

N49) free-format f77 internal reads (e.g. "read (string,*) i") hang if the character string contains all blanks. [Apollo response: fixed in SR10.2; the program no longer hangs but does not produce the correct answer of zero either] APR # dd33c.

N53) telnet sometimes uses the wrong window size on incoming connections, and does not send properly on some outgoing connections. [Apollo response: a fix has been applied to an unspecified release] APR # dd623.

N54) the Fortran and C manuals incorrectly describe how to combine common blocks and structures, and the example programs in '/domain_examples/cc_examples' are wrong. [Apollo response: none] APR # dd70d.

N55) the command 'man rbak | head' gives a broken pipe error. [Apollo response: none] APR # dd9bd.

N56) compiling a file containing 2 main programs causes a fatal compiler error. [Apollo response: fixed in next release after 10.7] APR # dda76.

N58) the f77 'ioinit' routine will not automatically create files that are used by the program in the same way they are created on other BSD systems, and creates extraneous empty files. [Apollo response: will be fixed in some future release after SR10.2] APR # ddab7.

N59) /usr/include/math.h is missing definitions for many constants that are defined on Sun/DEC/SGI systems. [Apollo response: fixed in SR10.3; this is not fixed in SR10.3] APR # ddb60.

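Until math.h is fixed (N59), a local header can supply the missing constants. This is a minimal sketch, assuming the constants in question are the usual BSD ones such as M_PI and M_E; the list above does not name them. circle_area is a hypothetical helper that is there only to show a constant in use.

```c
/* Sketch of a local fallback for constants missing from <math.h>.
   Assumption: the missing constants are the common BSD ones (M_PI, M_E);
   the problem report does not say which ones are absent. */
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846   /* pi */
#endif
#ifndef M_E
#define M_E  2.7182818284590452354    /* e  */
#endif

/* Hypothetical helper: area of a circle of the given radius. */
double circle_area(double radius)
{
    return M_PI * radius * radius;
}
```

The '#ifndef' guards leave the system's own definitions alone wherever the header already provides them, so the same source builds unchanged on the Sun, DEC and SGI systems.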
N60) using 'iaddr' in a statement function returns nonsense for the address of the argument. [Apollo response: may be fixed in some future release] APR # ddc16.

N62) f77 does not print out the stack size for routines that exceed the stack size limit, leaving the user guessing what the maximum stack size should be set to. [Apollo response: will be considered for a future release] Call # A2039011, SRN # J600617522.

N63) for an upgrade of SR10.1 to SR10.2, install++ does not change the existing hard links for /usr/ucb/apropos and /usr/ucb/whatis (both to 'man') to soft links (which SR10.2 seems to want). [Apollo response: claim this is due to customizations I made at SR10.1; neither of the files was changed from the way SR10.1 installed it, but SR10.2 will not install the new versions properly] APR # dde41.

N65) the 'msgs' command pages the messages at 6 lines per screen, where it should be using the available screen size. [Apollo response: "fixed in a future release"] APR # 5b54a277.

N67) the /usr/man/whatis file is not updated when SR10.2.p (or SR10.2) is installed. [Apollo response: none] APR # 5b54e53a.

N69) no documentation or examples of how to have both the "Alt" and "F0" keys act as X meta keys. [Apollo response: have made some changes to SR10.3 release notes; the notes do not cover the problem adequately] APR # 5b54d3ad.

N70) no documentation on the 'seltek' or 'selvt' aliases used in xterms. [Apollo response: problem has been closed by Apollo; there is still no documentation in either Vol. 3 of O'Reilly or in the "Using the X Window System on Apollo Workstations" on how these commands work or how to use them] Call # A2023624, APR # 5b544ed6.

N71) biff does not work in an xterm. [Apollo response: none] APR # 5b547f10.

N73) lpd writes messages about "/dev/wn0a: No such file or directory" on the system console for many print jobs.
[Apollo response: claim this comes from our Imagen filters; I have checked all our source code, and there is no such string] APR # 5b54299c, SRN # J600607283.

N75) more informative messages than "command not found" should be given when the command is not available due to network outage or node is down. [Apollo response: claim the reason a command can not be found is "lost" along the way; in my view, the reason a command can not be found is extremely important and shells / system calls should not be throwing such information away] APR # dcee6.

N77) the default /usr/X11/lib/app-defaults/Mwm file does not provide a root menu for mwm. [Apollo response: none] SRN # J600618157.

Non-critical problems (causing aggravation):
--------------------------------------------

D: Fixed in the next release.
-----------------------------

These problems will be cancelled when they are fixed by a standard release (not patch tapes / uploaded files) of the relevant software.

Yours sincerely,
Mike

--
Mike Peterson, System Administrator, U/Toronto Department of Chemistry
E-mail: system@alchemy.chem.utoronto.ca
Tel: (416) 978-7094 Fax: (416) 978-8775