Xref: utzoo comp.arch:7275 comp.sys.ibm.pc.rt:178
Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!cs.utexas.edu!ut-emx!ibmchs!auschs!sauer
From: sauer@auschs.UUCP (Charlie Sauer)
Newsgroups: comp.arch,comp.sys.ibm.pc.rt
Subject: Why the original RT seemed/was slow (was ...)
Message-ID: <1287@auschs.UUCP>
Date: 21 Nov 88 00:58:59 GMT
References: <5046@polya.Stanford.EDU>
Organization: IBM AES, Austin, TX
Lines: 86

I had hoped to just sit back and watch this discussion, but...

- there seems to be little distinction drawn between the original machine and the upgrades shipped last year and this year
- some key points have received too little attention in the discussion, e.g., the role of optimizing compilers
- there have been some significant inaccuracies, e.g., about the implementation and impact of the VRM. (Contrary to one assertion, the VRM is not half assembly code; it is mostly PL.8 and C. Though the VMI has a negative impact in some applications, in many cases that is more than offset by the benefits of the VRM's paging code and real-time capabilities.)

So, I'll try to provide some perspective.

In my opinion, the main reason the original machine seemed/was slow when it appeared in Spring of '86 is that the design decisions were made, for the most part, on the plan that the machine would ship in Winter of '84. If we had been able to hold that plan, it would have seemed much faster.

The main performance limitations of the machine when it was released were:

- the compilers, both for AIX and 4.2/RT, had very little optimization capability. Thus they violated a fundamental concept of RISC, that of exploiting the processor with highly optimizing compilers. In AIX 1.1, we began providing global optimizing C and Fortran compilers based on pcc/f77 but incorporating HCR's Portable Code Optimizer. In '87, a C compiler based on the PL.8 compiler (the "Advanced C Compiler") became available as an AIX option. Correspondingly, the Metaware High C compiler became available with 4.2/RT and 4.3/RT. These compilers provided roughly 2 to 1 performance improvements in many applications.
- there was no built-in floating point hardware, and the optional floating point accelerator was not very fast
- the disk controller used an ST-506 interface and had no DMA capability

Though the I/O bus was only 16 bits, it had a number of implementation enhancements over the AT bus, e.g., 32-bit burst/buffering extensions, and it is not used as the main memory bus, so I don't think it is a major bottleneck in the machine. It can sustain more than 2 megabytes/sec in DMA transfers, which is adequate for most applications on a machine of that processor speed.

The original processor was a 6 MHz NMOS implementation. It had several instructions that took multiple cycles but should have taken one, and it was not able to pipeline loads and stores with virtual memory enabled. On our standard internal CPU kernel, it comes in at 2.1 MIPS.

Last year we started shipping a new processor implementation that reduces the multiple-cycle instructions cited above to single-cycle instructions, is able to pipeline loads and stores with virtual memory enabled, and has a number of other minor improvements. The 10 MHz CMOS implementation of that chip comes in at 4.5 MIPS on the same kernel. Primarily due to memory shortages, we were unable to provide adequate quantities of machines with that processor until early this year.

In July we started shipping machines with a 12.5 MHz version of the new implementation. Though these machines are not as fast as some high-end workstations, we think they are very competitive in price/performance, as do others. See David Wilson's dollar/Khornerstone ratings in the May Unix Review, for example.
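The 2.1 and 4.5 MIPS figures quoted for the 6 MHz NMOS and 10 MHz CMOS processors can be split into a clock-rate part and a per-clock part. The sketch below assumes only the usual relation MIPS = clock MHz / CPI, where CPI is the average cycles per instruction on that one internal kernel; the `Processor` class and its method names are hypothetical, chosen just for this illustration.

```python
"""Back-of-the-envelope split of the RT processor speedup.

Assumes MIPS = clock_MHz / CPI, where CPI is the average number of
clock cycles per instruction on the (single) benchmark kernel.
"""

from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    clock_mhz: float
    kernel_mips: float  # MIPS on the internal CPU kernel, as quoted

    def mips_per_mhz(self) -> float:
        # Instructions completed per clock cycle (the inverse of CPI).
        return self.kernel_mips / self.clock_mhz

    def cycles_per_instruction(self) -> float:
        # Average clock cycles spent per instruction on the kernel.
        return self.clock_mhz / self.kernel_mips


original = Processor("6 MHz NMOS", 6.0, 2.1)
new_cmos = Processor("10 MHz CMOS", 10.0, 4.5)

for p in (original, new_cmos):
    print(f"{p.name}: {p.mips_per_mhz():.2f} MIPS/MHz, "
          f"CPI ~ {p.cycles_per_instruction():.2f}")

total_speedup = new_cmos.kernel_mips / original.kernel_mips
clock_speedup = new_cmos.clock_mhz / original.clock_mhz
# Whatever the clock increase does not explain came from doing more per cycle.
per_clock_speedup = total_speedup / clock_speedup
print(f"total {total_speedup:.2f}x = clock {clock_speedup:.2f}x "
      f"* per-clock {per_clock_speedup:.2f}x")
```

Running it prints:

    6 MHz NMOS: 0.35 MIPS/MHz, CPI ~ 2.86
    10 MHz CMOS: 0.45 MIPS/MHz, CPI ~ 2.22
    total 2.14x = clock 1.67x * per-clock 1.29x

So roughly 1.67x of the gain comes from the faster clock and about 1.29x from doing more work per cycle. That is consistent with, though not proof of, the single-cycle instructions and pipelined loads/stores described above, since one kernel's average CPI cannot attribute the gain to specific changes.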
Besides reimplementing the processor and providing optimizing compilers, we provided a 20 MHz 68881 as standard with the 10 MHz CMOS processor, along with an optional floating point accelerator using the ADSP 3210 and 3221. With the 12.5 MHz machines, the standard floating point unit is based on those parts. We also started providing DMA controllers for both ESDI and SCSI disks, with caching on the controller cards.

For those who care: RT models 10, 15, 20 and 25 have the 6 MHz NMOS processor; the 115 and 125 have the 10 MHz CMOS processor; and the 130 and 135 have the 12.5 MHz CMOS processor.

Those are the machines that we have shipped. I think it is well known that we are working on follow-on machines that support the Micro-Channel. Other than that, I don't think much is publicly known about those machines, so I won't say anything more about them now.
-- 
Charlie Sauer                          IBM AES/ESD, D75/802
uucp: cs.utexas.edu!ibmaus!sauer       11400 Burnet Road
822: @CS.UTEXAS.EDU:sauer@ibmaus.uucp  Austin, Texas 78758
aesnet: sauer@auschs                   (512) 823-3692
vnet: SAUER at AUSVM6