Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!decvax!decwrl!amdcad!amdahl!mat
From: mat@amdahl.UUCP
Newsgroups: net.arch,net.unix
Subject: Re: ELXSI System 6400 .... Information needed
Message-ID: <3390@amdahl.UUCP>
Date: Sat, 21-Jun-86 00:49:18 EDT
Article-I.D.: amdahl.3390
Posted: Sat Jun 21 00:49:18 1986
Date-Received: Sun, 22-Jun-86 07:22:00 EDT
References: <203@cybavax.UUCP> <1946@calmasd.CALMA.UUCP> <120@portal.UUcp>
Distribution: net
Organization: Amdahl Corp, Sunnyvale CA
Lines: 38
Xref: watmath net.arch:3535 net.unix:8321

In article <120@portal.UUcp>, jel@portal.UUcp (John Little) writes:
> In article <1946@calmasd.CALMA.UUCP>, rfc@calmasd.CALMA.UUCP (Robert Clayton) writes:
> > a 10 processor test at Sandia Labs they got 10.1X the power of a single
> > processor.  
> 
> This is an interesting trick. Does anyone have a clue about how they
> got a greater than linear speedup?  Was this a cpu benchmark or did
> it include i/o?  

It was a fixed-workload benchmark, running a variety of jobs as I recall.
It included I/O, but it measured only relative CPU throughput capacity.

The greater-than-linear speedup comes from improved locality of reference,
reduced process switching, and better cache performance. The ELXSI machine
uses a message-based architecture and caches process context (registers,
etc.) for 16 processes per processor. Switching to a process that has a
process slot is very cheap. Switching to one that doesn't is very expensive,
because process 0, the scheduler, must be woken up to purge one process from
its slot and set up the slot for the new one before the new one can run.
A microcode dispatcher handles the dispatching of "hot" processes that have
a slot. Adding processors adds process slots (10 processors have 160 slots,
versus 16 on one), which reduces the number of very costly swaps and, as a
byproduct, reduces the cache miss rate, etc. The net result is that these
savings more than offset any interprocessor interference losses. Since there
is no memory sharing, this interference is small.
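To make the slot effect concrete, here is a toy simulation. It is only a
sketch under loudly stated assumptions, not how the ELXSI dispatcher actually
works: processes are statically assigned to processors (pid % n_cpus), each
processor runs its processes round-robin, a full slot table evicts the
least-recently-used process, and the costs (work=200, hot_cost=1,
cold_cost=20) are made-up units. The names run_cpu and simulate are
hypothetical. With 40 processes on one CPU, a 16-slot LRU table thrashes and
every dispatch is a cold swap; spread over 10 CPUs, each CPU holds only 4
processes, so after the first round every dispatch is hot.

```python
from collections import OrderedDict

SLOTS_PER_CPU = 16  # from the post: cached context for 16 processes per CPU


def run_cpu(procs, rounds, work, hot_cost, cold_cost, slots=SLOTS_PER_CPU):
    """Simulate one CPU dispatching its processes round-robin.

    ASSUMPTION: a full slot table evicts the least-recently-used process.
    Returns (elapsed_time, cold_dispatches) in hypothetical time units.
    """
    resident = OrderedDict()  # pid -> None, least- to most-recently used
    elapsed = 0
    cold = 0
    for _ in range(rounds):
        for pid in procs:
            if pid in resident:
                # "Hot" process: cheap microcode dispatch.
                resident.move_to_end(pid)
                elapsed += hot_cost
            else:
                # No slot: the scheduler purges one process and sets up a slot.
                if len(resident) >= slots:
                    resident.popitem(last=False)
                resident[pid] = None
                elapsed += cold_cost
                cold += 1
            elapsed += work  # useful work done in this burst
    return elapsed, cold


def simulate(n_cpus, n_procs=40, rounds=100, work=200, hot_cost=1, cold_cost=20):
    """Run a fixed workload on n_cpus; return (finish_time, total_cold_swaps).

    ASSUMPTION: static affinity (pid % n_cpus) and no interprocessor
    interference, so the CPUs simply run in parallel.
    """
    finish = 0
    total_cold = 0
    for cpu in range(n_cpus):
        procs = [p for p in range(n_procs) if p % n_cpus == cpu]
        elapsed, cold = run_cpu(procs, rounds, work, hot_cost, cold_cost)
        finish = max(finish, elapsed)
        total_cold += cold
    return finish, total_cold


if __name__ == "__main__":
    t1, c1 = simulate(1)
    t10, c10 = simulate(10)
    print(f"1 CPU:   time={t1}, cold swaps={c1}")
    print(f"10 CPUs: time={t10}, cold swaps={c10}")
    print(f"speedup = {t1 / t10:.2f}x")
```

With these made-up costs the 10-CPU run comes out a bit better than 10x,
purely because it stops paying for cold swaps. The exact figure depends
entirely on the assumed costs and says nothing about the real Sandia numbers.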

It should be pointed out that the message-based architecture induces a very
high process switch rate, which makes these effects quite different from
those that would be observed in more traditional systems.

In a sense, the superlinear speedup appears because adding processors reduces
the overheads that make the uniprocessor system run "slower than it should."
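One way to put that in numbers, as a simple model and not anything measured:
let W be the useful CPU work in the fixed workload, o_N the switching and
cache overhead per unit of work with N processors, and i_N the fractional
loss to interprocessor interference (i_1 = 0). These symbols are introduced
here for illustration, and the model assumes perfect load balance.

```latex
T_1 = W\,(1 + o_1), \qquad
T_N = \frac{W\,(1 + o_N)\,(1 + i_N)}{N}

S_N = \frac{T_1}{T_N} = N \cdot \frac{1 + o_1}{(1 + o_N)\,(1 + i_N)}

S_N > N \iff (1 + o_N)\,(1 + i_N) < 1 + o_1

S_{10} = 10.1 \;\Rightarrow\; \frac{1 + o_1}{(1 + o_{10})\,(1 + i_{10})} = 1.01
```

So superlinear speedup just says the overhead saved by having more slots
outweighs the interference added, which is the claim above.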

-- 
Mike Taylor                        ...!{ihnp4,hplabs,amd,sun}!amdahl!mat

[ This may not reflect my opinion, let alone anyone else's.  ]