Newsgroups: comp.unix.aix
Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!uupsi!rodan.acs.syr.edu!grboyce
From: grboyce@rodan.acs.syr.edu (George Robert Boyce)
Subject: IBM support (sic) story
Message-ID: <1991Apr17.195425.8885@rodan.acs.syr.edu>
Sender: grboyce@rodan.acs.syr.edu (George Robert Boyce)
Organization: Syracuse University, Syracuse, NY
Date: Wed, 17 Apr 91 19:54:25 GMT

In trying to add a 3rd party SCSI disk to my RS6000/530 server (BTW,
why does IBM support make us call it a 7013?), I ran into two small
problems. The first was that one of the commands, I forget which,
forks a copy of "mkboot", and I had my own program of that name, which
was found in my path ahead of /etc/mkboot. My program, needless to
say, didn't do the expected thing, and the command seemed to hang.

Before I knew the cause of this problem, I had decided that I needed
IBM software support, since their procedure to add a 3rd party SCSI
disk seemed to be failing. I was eager to test out IBM's support, and
IBM support for 3rd party hardware. That was on Friday morning and I
wanted to get this resolved before the weekend. But since I had
followed comp.unix.aix and had called IBM software support directly in
the past, I knew the procedure was to call my local SE first. I could
have told him over the phone that "lcreatevg" was failing, and I could
have read or faxed him the error message. But he insisted on coming
out to help, on Monday. Fine...

So on Monday my SE arrives, we start from scratch, and after two or
three commands we run into the problem. He records the error message
and agrees we should call software support. Level one support wasn't
of much help, but they did suggest that we reboot the system and see
if that helped. It *seemed* like a reasonable suggestion, so that is
what we did.

Enter problem number two... It seems that something (maybe me) had
trashed the boot block, err, boot logical volume, of the system disk,
and the system would not reboot.
This was obvious to me, and it seemed to me that there should be a
software solution to this new, more serious, problem. But level one
software support, and my SE, said we had a hardware problem. This was
despite the fact that I could boot the maintenance disks, mount the
system disk, and play around without any problems.

OK, so now I get to call hardware support and report the problem, and
they dispatch a local HW engineer to deal with it. A few hours later,
he shows up and we try to run the HW diagnostics. I had offered to run
them hours earlier, but my SE insisted that we let the HW guy do it.
His first question when he arrived was, "So, you run diagnostics
yet?". Sigh.

Well, the diags run just fine (as I expected), and so he now calls
level one hardware support. We all guess their answer and, sure
enough, they say to reload the system. We say that is unacceptable,
and the call gets bumped up to level two hardware support. We play
around a couple more hours, including trying to boot diagnostics from
the internal disk. We get the same errors as when we try to boot AIX,
which seems to confirm, to me, that the boot logical volume is messed
up. It seems to confirm to level two hardware support that we need to
reload the system.

After we insist that reloading the system is not a valid option, and
hardware support insists that there is no hardware problem, we get the
call transferred to level two software support. Once connected, we got
the magic commands needed to fix the problem. A third problem came up:
I was using an old set of maintenance disks and the instructions
didn't work. The level two support person was able to recognize my
error and correct it, and the whole procedure took 15 minutes.

15 minutes is a damn good time for any support call and I was very
happy. But I am still wondering how to cut down on the *six hours* it
takes to get to the right support person.

On 4/9/91, Pierre Asselin wrote:
> General conclusions from earlier exercises:
>
>   o Software Defect Support is officially limited to its narrow mandate.
>   o Technical support is available for the RISC-6000's.
>     It's called comp.unix.aix.
>   o Accurate information on the RISC-6000's is available, but only
>     on comp.unix.aix.
>   o Accurate information on the IBM support structure is available,
>     but only on comp.unix.aix.
>   o To this day, IBM is convinced that it's doing a fine job.
>   o Hardware support does work.  Beats me.

I have to argue that level two software support knows their stuff.
The problem then is that IBM has a level one support system in place
(a) to protect the valuable and expensive resources of level two by
(b) answering the easy questions. I would argue that level one does
half of their job. They do a heck of a job of protecting the level two
folks.

So that leaves us with comp.unix.aix for level one support, and a good
but well protected level two support. It could be worse.

I think there are also other possible solutions to this situation. We
could try to convince IBM that (a) they have a support problem and (b)
that it is a serious problem. That seems like it could be a lot of
work, and we wouldn't even have solved the problem yet, just made IBM
recognize it.

My own opinion is that IBM should subcontract level one support to
local and regional support service companies and provide all the
necessary support to make it work. But then I've just formed such a
company, so my opinion is biased. Regardless, I am calling my local
office right now to suggest it...

George
--
George R. Boyce, Manager, Systems Engineering Group, george@spica.npac.syr.edu
CASE: Computer Applications and Software Engineering Center
NPAC: Northeast Parallel Architectures Center
SCCS: Syracuse Center for Computational Science
And now also: The Computing Support Team