Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!tut.cis.ohio-state.edu!uc!cs.umn.edu!dmshq!com50!pai!erc
From: erc@pai.UUCP (Eric Johnson)
Newsgroups: comp.unix.sysv386
Subject: SCO OpenDesktop Crashing With Weird Disk Problems
Keywords: ESDI Adaptec SCO ODT 1.0 HELP!
Message-ID: <1420@pai.UUCP>
Date: 6 Sep 90 15:31:45 GMT
Organization: Boulware Technologies, Inc., Burnsville, MN
Lines: 208

Help!

I've been having some terrible problems with SCO's OpenDesktop 1.0.
I'm not sure if these are hardware, software or both. I'd appreciate
any help from the net. (Please note that I really don't blame anyone
but myself and that any and all help is requested. Thanks.)

My system:

    SCO ODT 1.0, X11, Motif, DOS, TCP/IP, Software Dev.
    Avex 386 mainboard 25 MHz
    Adaptec 2322-16 ESDI disk controller
    Paradise VGA Plus 800x600x16
    Western Digital 8003EBT Ethernet (thin, and the only system
        on my own net)
    Imprimis Wren 6 320 MB disk
    8 MB RAM
    Phoenix BIOS
    Logitech serial Mouse (latest rev)
    Relaxed security defaults

I normally run the X Window system and use the box for developing
programs and writing for my next book. My default config is two large
xterms and one xclock, under the Motif window manager, mwm.

1) I cannot seem to run the system under "heavy" use for more than
four hours. (I'm developing Motif programs.) During a major make
session, running the C compiler (stock cc), I'll see a message like
"Killed." or "Signal received". (I'm not typing ANYTHING at all during
this time.) Then the X server usually freezes and the only thing I can
do is Alt-SysReq to trash the X server (and my compile processes).
When I get back to the console (I sure wish xterm -C worked, so I
could see console messages under X!), the screen is filled with hard
disk errors. These errors keep getting worse, and generally I have to
hit the hard reset button. Now, this is a brand new system, but I
never rule out hardware (e.g., disk) problems.
These disk errors are continuous, and all the system seems to be doing
is printing them to the screen. When I reboot, though, fsck seems to
fix all the disk problems. So the hard disk bad track errors don't
seem to me to really be bad tracks, unless fsck isn't really fixing
the situation. fsck has always been voodoo to me, but it has always
seemed to do the job on the many versions of UNIX I've used. I'm using
an Adaptec ESDI controller and an Imprimis Wren 320 MB disk.

Any ideas as to what is causing this? Is it probably hardware, or
could it be in the software, too?

2) (Related to #1, above): A Motif program I wrote, which normally
works fine (it's just a test of the Scale widget and it works fine on
a number of UNIX workstations), all of a sudden was killed, like
above. The disk then went berserk, so I did the infamous Alt-SysReq to
trash the X server. At the console, I again saw streams of disk
errors. When the system rebooted and I tried to run my test program,
it didn't run. Instead, it looked like it ran dfspace (a df variant
that SCO uses). Now, whenever I start an xterm, OpenDesktop (ODT)
seems to run dfspace in the new window, so I suspect this is in the
system-wide .login or some file like that. Anyway, my executable did
something other than it had ever done. Anyone ever seen anything like
this? I deleted the file, since I didn't like what happened.

3) One of the times the #1 stuff happened, the ttys database (part of
system security) got trashed, so only the superuser (root) could log
in. An SCO Tech Support person led me through the process of pulling
in a ttys file from the distribution floppy (the ODT manual has a
great section on fixing this problem, but it assumes that at least one
ttys* file exists, which I didn't have).

Note: the SCO Tech Support folks are great (once you actually get to
talk to them). I've called them a number of times and they've always
helped out with very good advice.
The main problem here wasn't the lost file, but:

   a) A trouble-shooting section in the manual that dealt with the
      problem but made too many assumptions to be actually workable.

   b) The implications of ever using a product that has so many weird
      ("weird" as in not in other versions of UNIX I've seen) files
      which are required to use the system. This has bad implications
      for my employer adopting this product (see below).

4) One of the times the above (#1) stuff happened, one of the C
compiler executables got deleted (/lib/386/p2_386). So, to recover, I
ran the custom program to pull that file in from the distribution
floppy diskette. I had two main problems with this:

   a) Every time I try to install one file, custom brings over the
      file just fine (I think), but then custom always dies with an
      "Internal Error: 10#". In errno.h, error 10 is related to
      calling wait on a child that doesn't exist, I think. What
      exactly is custom doing that causes it to die so ungracefully?
      Can anyone bring over single files from the ODT distribution
      disks using custom?

   b) Once I had the infamous /lib/386/p2_386 file, I still could not
      compile anything. Why? Because the /lib/386/p2_386 program
      wasn't "serialized" (a part of SCO's copy protection scheme).
      Now, how can I "serialize" one single file? Remember that custom
      dies for me every time I try to install single files, so I never
      get to the serialization phase from custom (like I did when I
      first installed this stuff). I tried RTFM-ing, but I didn't find
      any mention of how to serialize one file. Anyone know how?

Even if my main disk problem is hardware-related, this is a serious
issue. I don't really mind SCO's copy-protection scheme (which is also
very much like Interactive's), but a copy protection scheme should be
aimed at preventing unauthorized users, not AUTHORIZED users! When
copy protection schemes get in my way, I tend to drop the products.
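An aside on the errno guess in 4a: as far as I know, errno 10 on
System V-derived systems is ECHILD, which wait() reports when the
calling process has no children to wait for. Here is a minimal sketch
of that failure mode, written in Python rather than C for brevity. The
function name is made up for illustration, and this only shows how
ECHILD can arise; it is not a claim about what custom does internally.

```python
import errno
import os


def wait_errno_without_children():
    """Call wait() in a process that has no children; return the errno seen."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: it has no children of its own, so wait() must fail.
        os.close(read_fd)
        code = 0
        try:
            os.wait()
        except ChildProcessError as exc:
            code = exc.errno
        finally:
            # Report the errno to the parent and exit without cleanup handlers.
            os.write(write_fd, str(code).encode())
            os._exit(0)
    # Parent process: collect the child's report, then reap the child.
    os.close(write_fd)
    report = os.read(read_fd, 16)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return int(report.decode())


if __name__ == "__main__":
    code = wait_errno_without_children()
    # Print the numeric value and its symbolic name on this platform.
    print(code, errno.errorcode[code])
```

The wait() call is made in a freshly forked child because a new child
never inherits its parent's children, so the ECHILD case is guaranteed
regardless of what else the parent has started.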
So SCO (and Interactive, too, since you have a CP scheme as well),
listen up:

All this (above) is for my own private system, but during the day I
work in R&D at Boulware Technologies (see signature below), and BTI
provides industrial automation systems. We expect things like files to
get trashed out in the field. We also demand the ability to recover
from things like this.

This last week, I was asked to evaluate 386 UNIXes for BTI. (A 386
running UNIX is generally cheaper than a full-blown UNIX workstation,
especially since BTI puts together their own 386 clones.) I had to
state that I did not think that ANY 386 UNIX has evolved to an
acceptable level yet. That is, installation is too hard and fraught
with problems (it only took me 11 full tries to get SCO ODT installed;
I've given up for now on ISC 2.2); system administration is also too
hard and, especially for SCO, fraught with all sorts of
security-related issues; and I generally don't have confidence that
these versions of UNIX will run under demanding conditions in the
field (with users who aren't very UNIX-literate).

In other words, I feel that Hewlett-Packard and Sun (for example) have
a much stronger software product than either SCO or Interactive, and I
do not have the necessary confidence in SCO or Interactive to
recommend their products yet.

I do not mean for this to be a bitch session, so please don't take it
as such. And yes, I do understand that 386 UNIXes must support a vast
array of not-so-compatible hardware options, so there are more
problems to face on a 386.

I want you SCO and Interactive folks to take this constructively. I'd
love it if your products improved (and yes, I have seen them improve
thus far). I'd love to have confidence in your products, because that
would mean a substantial cost savings for my employer. But I just
don't feel the products are there yet.
I finally had to re-install the ODT basic software development package
to be able to return to a state where I could compile C files. yech-o.

5) How does one change to single-user mode without changing the system
forever? I always try to run custom in single-user mode, so instead of
bringing the whole system down and then re-booting (due to time, as I
was on the phone to SCO Tech Support at the time), I tried:

    shutdown -iS -g0 -y

That is, shutdown to run-state S (single user), right now (-g0), and
yes (-y) I want to do it.

After doing this, the system console changed from /dev/tty01 to
/dev/syscon (which meant I had to change my X start-up scripts in
.login), and the root user is always asked:

    TERM = (ansi)

This never happened before I ran that one shutdown. What has really
happened to my system, and why did it change forever from one
shutdown? I've always been used to the idea that shutting down to
single-user mode should just do that and not irretrievably change your
system when you reboot back up to multi-user run-state. That is, if
you boot to single-user run-state, this should be the same as shutting
down to single-user run-state. Single-user run-state should be
single-user run-state.

6) Just about every other time I start the X Window server, I get
screen jitter mode. That is, the screen jitters vertically very fast
(basically moving every pixel up and down about 1/4 of the height of
the screen). This, obviously, makes X totally unusable. Usually, I
need to stop X, then log out and then restart the X server. Normally,
everything works fine then. Mostly, it goes bad every other time,
although sometimes more often and sometimes less often. Any ideas? I'd
love to have it work right every time, of course.

If anyone has any information on any of these topics, I'd appreciate
email (or a post if you're so inclined). I'll summarize the email
responses I get for the net. If you suggest RTFM, please point out
which manual and which section.
I'd love to get this box to where I can spend a whole day working on
my book instead of wasting hours trouble-shooting my system.

Thanks,
-Eric
erc@pai.mn.org

-- 
Eric F. Johnson              phone: +1 612 894 0313  BTI: Industrial
Boulware Technologies, Inc.  fax:   +1 612 894 0316  automation systems
415 W. Travelers Trail       email: erc@pai.mn.org   and services
Burnsville, MN 55337 USA