Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!wuarchive!usc!rutgers!rochester!pt.cs.cmu.edu!o.gp.cs.cmu.edu!andrew.cmu.edu!zs01+
From: zs01+@andrew.cmu.edu (Zalman Stern)
Newsgroups: comp.arch
Subject: Re: IBM RS6000
Message-ID: <wbXbwbC00asV8w_kRp@andrew.cmu.edu>
Date: 12 Jan 91 03:14:47 GMT
References: <1991Jan10.214122.9506@news.arc.nasa.gov>
Organization: Information Technology Center, Carnegie Mellon, Pittsburgh, PA
Lines: 67
In-Reply-To: <1991Jan10.214122.9506@news.arc.nasa.gov>

lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:
>[IBM RISC System/6000 is very fast on vector code and as fast as other
> processors (at equiv. clock) on scalar code.]

The hardware has lower FP cycle counts than other processors.  (One
exception is the MIPS R6000.) One place where the RIOS falls down is that
branches take too long (zero to three cycles, with the average on the high
side).  There is some room to improve this in the implementation.

> [Context switching is rumored to be slow.]
> If it bad, why is it?  What is it about the design?  
> 
> Memory management?  

The RIOS MMU is an exercise in complexity. The inverted page table (IPT)
with hardware reload and hardware lock bit support goes too far. TLB
reload is somewhat slow as a result. One might see performance problems
with processes that thrash the TLB. I haven't measured this though and it
would only show up for large processes. The IPT also limits how different
address spaces can share memory. [See the IPT flamage that has shown up in
this newsgroup at least three times already.] This leads to performance
tradeoffs for Mach. In practice this isn't a problem, and it certainly
shouldn't show up in AIX, since AIX only shares 256-megabyte segments
between processes anyway. (Segment sharing is efficient on the RIOS
hardware.)
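
To make the sharing restriction concrete, here is a toy model of an
inverted page table in Python. It is only a sketch: the class name, the
XOR hash, and the tiny table size are made up for illustration and are not
the RIOS hardware's actual hash function or table layout. The point it
shows is that with one entry per physical frame, a frame can be known by
only one (segment ID, virtual page) name at a time.

```python
# Toy inverted page table: one entry per physical frame, found by hashing.
# All names and sizes here are hypothetical, chosen for illustration only.

NUM_FRAMES = 8  # deliberately tiny physical memory


class InvertedPageTable:
    def __init__(self, num_frames=NUM_FRAMES):
        self.num_frames = num_frames
        self.owner = [None] * num_frames    # frame -> (segment_id, vpage)
        self.next = [None] * num_frames     # frame -> next frame in chain
        self.anchor = [None] * num_frames   # hash bucket -> first frame

    def _bucket(self, segment_id, vpage):
        # Assumed hash: XOR of segment ID and page index, folded to the
        # table size. Real hardware uses its own function.
        return (segment_id ^ vpage) % self.num_frames

    def map(self, segment_id, vpage, frame):
        # A frame has exactly one entry, so it cannot be mapped twice.
        if self.owner[frame] is not None:
            raise ValueError("frame already has a virtual address")
        self.owner[frame] = (segment_id, vpage)
        b = self._bucket(segment_id, vpage)
        self.next[frame] = self.anchor[b]   # push onto the bucket's chain
        self.anchor[b] = frame

    def lookup(self, segment_id, vpage):
        # Walk the hash chain; return (frame, chain entries examined).
        frame = self.anchor[self._bucket(segment_id, vpage)]
        steps = 0
        while frame is not None:
            steps += 1
            if self.owner[frame] == (segment_id, vpage):
                return frame, steps
            frame = self.next[frame]
        return None, steps
```

In this model, two address spaces can reach the same frame only by using
the same segment ID, which is the segment-level sharing mentioned above.
Mapping the frame a second time under a different segment ID raises an
error, and a longer hash chain means more steps per lookup, which is one
way a TLB reload can get slow.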

> 
> Cache?

Shouldn't be a problem. The cache is tagged with 52-bit virtual addresses,
so there is no need to flush anything on a context switch. The 4-way set
associative data cache might improve cache residency across context
switches. (That is, the next time your process gets scheduled, there is a
better chance that some of its data will still be in the cache.)
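
Here is a small Python sketch of why full virtual tags avoid the flush. It
assumes the usual split of a 32-bit effective address into a 4-bit segment
register index and a 28-bit offset, with a 24-bit segment ID substituted
in to give 52 bits. The line size, the number of sets, and the LRU
replacement policy are assumptions picked to keep the example small, not
the real cache parameters.

```python
from collections import OrderedDict

LINE_BITS = 7   # assumed line size (128 bytes), for illustration only
NUM_SETS = 4    # deliberately tiny
WAYS = 4        # 4-way set associative


def virtual_address(segment_regs, effective_addr):
    """Form a 52-bit virtual address from a 32-bit effective address."""
    seg_index = (effective_addr >> 28) & 0xF     # top 4 bits pick a register
    offset = effective_addr & 0x0FFFFFFF         # low 28 bits
    return (segment_regs[seg_index] << 28) | offset


class VirtualCache:
    """Set-associative cache tagged with the full virtual line number."""

    def __init__(self):
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def access(self, vaddr):
        """Return True on a hit; on a miss, fill the line and return False."""
        line = vaddr >> LINE_BITS
        s = self.sets[line % NUM_SETS]
        if line in s:
            s.move_to_end(line)        # mark as most recently used
            return True
        if len(s) >= WAYS:
            s.popitem(last=False)      # evict the least recently used line
        s[line] = True
        return False
```

Two processes with different segment IDs that touch the same effective
address get different tags, so neither sees a false hit and nothing has to
be invalidated when switching between them. With four ways per set, the
first process's line can still be resident when it runs again.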

> 
> O/S bug or feature?

Most likely. Wouldn't be the first performance bug in AIX 3.1 :-) One
way to test it would be to get context switch times for Mach 2.5 on the
RIOS and compare them to the DECstation 5000.
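
For what it's worth, one common way to get such a number is a pipe
ping-pong between two processes. The Python sketch below is only an
illustration of that idea, not a benchmark anyone ran on these machines.
The function name and the round count are made up, and it needs a POSIX
system with fork(). On a single CPU each round trip forces at least two
context switches, but the figure also includes pipe and system call
overhead, and on a multiprocessor the two processes may not switch at all.

```python
import os
import time


def pingpong_round_trip_seconds(rounds=1000):
    """Average time for one byte to go parent -> child -> parent."""
    p2c_r, p2c_w = os.pipe()   # parent-to-child pipe
    c2p_r, c2p_w = os.pipe()   # child-to-parent pipe
    pid = os.fork()
    if pid == 0:
        # Child: echo every byte straight back, then exit without
        # running any of the parent's cleanup code.
        os.close(p2c_w)
        os.close(c2p_r)
        for _ in range(rounds):
            os.write(c2p_w, os.read(p2c_r, 1))
        os._exit(0)
    # Parent: close the ends it does not use, then time the exchange.
    os.close(p2c_r)
    os.close(c2p_w)
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(p2c_w, b"x")   # wake the child
        os.read(c2p_r, 1)       # block until the child answers
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    os.close(p2c_w)
    os.close(c2p_r)
    return elapsed / rounds
```

Running the same program under two operating systems on the same hardware
would help separate an OS problem from a hardware one.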

> 
> How could IBM have missed something like this in the design (it should have
> been obvious when the first prototype was built...?  Doesn't everyone do
> big compiles as background jobs?)

When I was doing development on a 530 (25 MHz RIOS) I didn't notice these
problems. (My MIPS Magnum feels a little better, but at least part of that
is the losing X11 performance on the RIOS.)  Of course, a single-user
workload is not a good test case for context switching. In general,
performance problems are not simple and when you are working full tilt just
to get rid of OS crash bugs, they can easily be overlooked. The compilation
performance was pretty good though. It was somewhere between 20 and 30
minutes to build a full Mach kernel (with optimization turned on).

> 
> Or, maybe this is just a smear campaign by IBM's rivals, who are upset that
> IBM has an apparently hot product?

The RIOS is definitely in the game performance-wise. Architecturally, other
RISC chips are getting similar performance with much simpler
implementations.  I also question the value of proprietary architectures in
this day and age.

Zalman Stern, MIPS Computer Systems, 928 E. Arques 1-03, Sunnyvale, CA 94086
zalman@mips.com OR {ames,decwrl,prls,pyramid}!mips!zalman     (408) 524 8395
       "``Ah, so,'' said Daruma the O-maker" -- Tom Robbins