Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!wuarchive!usc!rutgers!rochester!pt.cs.cmu.edu!o.gp.cs.cmu.edu!andrew.cmu.edu!zs01+
From: zs01+@andrew.cmu.edu (Zalman Stern)
Newsgroups: comp.arch
Subject: Re: IBM RS6000
Message-ID:
Date: 12 Jan 91 03:14:47 GMT
References: <1991Jan10.214122.9506@news.arc.nasa.gov>
Organization: Information Technology Center, Carnegie Mellon, Pittsburgh, PA
Lines: 67
In-Reply-To: <1991Jan10.214122.9506@news.arc.nasa.gov>

lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:
>[IBM RISC System/6000 is very fast on vector code and as fast as other
> processors (at equiv. clock) on scalar code.]

The hardware has lower FP cycle counts than other processors. (One
exception is the MIPS R6000.) One place where the RIOS falls down is that
branches take too long (zero to three cycles, with the average on the high
side). There is some room to improve this in the implementation.

> [Context switching is rumored to be slow.]
> If it bad, why is it? What is it about the design?
>
> Memory management?

The RIOS MMU is an exercise in complexity. The inverted page table (IPT)
with hardware reload and hardware lock bit support is too far gone. TLB
reload is somewhat slow as a result. One might see performance problems
with processes that thrash the TLB. I haven't measured this, though, and it
would only show up for large processes.

The IPT also limits how different address spaces can share memory. [See the
IPT flamage that has shown up in this newsgroup at least three times
already.] This leads to performance tradeoffs for Mach. In practice this
isn't a problem, and it certainly shouldn't show up in AIX since it only
shares 256 megabyte segments between processes anyway. (Segment sharing is
efficient on the RIOS hardware.)

>
> Cache?

Shouldn't be a problem. The cache is tagged with 52 bit virtual addresses,
so there is no need to flush anything on a context switch.
The 4-way set associative data cache might improve cache residency across
context switches. (That is, the next time your process gets scheduled,
there is a better chance that some of its data will still be in the cache.)

>
> O/S bug or feature?

Most likely. Wouldn't be the first performance bug in AIX 3.1 :-) One way
to test it would be to get context switch times for Mach 2.5 on the RIOS
and compare them to the DECstation 5000.

>
> How could IBM have missed something like this in the design (it should have
> been obvious when the first prototype was built...? Doesn't everyone do
> big compiles as background jobs?)

When I was doing development on a 530 (25 MHz RIOS) I didn't notice these
problems. (My MIPS Magnum feels a little better, but at least part of that
is the losing X11 performance on the RIOS.) Of course, a single-user
workload is not a good test case for context switching. In general,
performance problems are not simple, and when you are working full tilt
just to get rid of OS crash bugs, they can easily be overlooked.

The compilation performance was pretty good though. It was somewhere
between 20 and 30 minutes to build a full Mach kernel (with optimization
turned on).

>
> Or, maybe this is just a smear campaign by IBM's rivals, who are upset that
> IBM has an apparently hot product?

The RIOS is definitely in the game performance-wise. Architecturally, other
RISC chips are getting similar performance with much simpler
implementations. I also question the value of proprietary architectures in
this day and age.

Zalman Stern, MIPS Computer Systems, 928 E. Arques 1-03, Sunnyvale, CA 94086
zalman@mips.com OR {ames,decwrl,prls,pyramid}!mips!zalman     (408) 524 8395
"``Ah, so,'' said Daruma the O-maker" -- Tom Robbins
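
The remark above that the IPT limits how address spaces can share memory
can be illustrated with a toy model. This is only a sketch and not the RIOS
hardware format: the class name, table size, and hash function are made up
for illustration. The one property it assumes is that an inverted table
keeps a single entry per physical frame, so a frame is reachable through
only one virtual address at a time, while two processes that use the same
segment ID produce the same virtual address and share the entry for free.

```python
# Toy inverted page table (IPT). Hypothetical layout for illustration only;
# sizes, names, and the hash are assumptions, not the RIOS format.
NUM_FRAMES = 8


class InvertedPageTable:
    def __init__(self, num_frames=NUM_FRAMES):
        # frames[f] is the one (segment_id, virtual_page) mapped to frame f.
        self.frames = [None] * num_frames
        # Hash buckets: bucket index -> list of frame numbers to search.
        self.buckets = [[] for _ in range(num_frames)]

    def _bucket(self, key):
        return hash(key) % len(self.buckets)

    def map(self, segment_id, virtual_page, frame):
        """Map a virtual page to a frame; return the mapping it displaced."""
        old = self.frames[frame]
        if old is not None:
            # One entry per frame: the previous virtual address is evicted.
            self.buckets[self._bucket(old)].remove(frame)
        key = (segment_id, virtual_page)
        self.frames[frame] = key
        self.buckets[self._bucket(key)].append(frame)
        return old

    def translate(self, segment_id, virtual_page):
        """Return the frame for a virtual page, or None (a page fault)."""
        key = (segment_id, virtual_page)
        for frame in self.buckets[self._bucket(key)]:
            if self.frames[frame] == key:
                return frame
        return None


ipt = InvertedPageTable()
ipt.map(1, 0x10, 3)        # address space A maps frame 3 under segment 1
ipt.map(2, 0x20, 3)        # address space B maps frame 3 under segment 2
print(ipt.translate(1, 0x10))  # None: A's mapping was displaced
print(ipt.translate(2, 0x20))  # 3

ipt.map(7, 0x30, 5)        # both processes load segment ID 7 instead
print(ipt.translate(7, 0x30))  # 5, whichever process asks
```

In the first case the two address spaces would keep faulting the frame back
and forth, which is the kind of tradeoff Mach runs into. In the second case
nothing is duplicated, since the lookup key is the same for both processes.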