Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!caip!topaz!ll-xn!nike!lll-crg!seismo!rochester!ritcv!cci632!rb
From: rb@cci632.UUCP (Rex Ballard)
Newsgroups: net.arch,net.unix
Subject: Re: ELXSI System 6400 .... Information needed
Message-ID: <131@cci632.UUCP>
Date: Thu, 26-Jun-86 11:16:32 EDT
Article-I.D.: cci632.131
Posted: Thu Jun 26 11:16:32 1986
Date-Received: Sat, 28-Jun-86 02:44:27 EDT
References: <203@cybavax.UUCP> <1946@calmasd.CALMA.UUCP> <120@portal.UUcp>
Reply-To: rb@ccird1.UUCP (Rex Ballard)
Distribution: net
Organization: CCI, Rochester Development, Rochester, NY
Lines: 61
Xref: watmath net.arch:3593 net.unix:8416
Summary: How to get 101%, don't count everything.

In article <120@portal.UUcp> jel@portal.UUcp (John Little) writes:
>In article <1946@calmasd.CALMA.UUCP>, rfc@calmasd.CALMA.UUCP (Robert Clayton) writes:
>> a 10 processor test at Sandia Labs they got 10.1X the power of a single
>> processor.  
>
>This is an interesting trick. Does anyone have a clue about how they
>got a greater than linear speedup?

Sure, I've seen it several times in several different situations.
The secret is to not count anything other than "CPU" instruction speed.
In reality, there are probably DMA, MMU and related controllers that are
not included in the MIPS figures.  Caching, asynchronous processing, and
CPU time normally spent doing other things can also be contributing factors.
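
The arithmetic behind a figure like 10.1X can be sketched roughly as
follows.  This is only an illustration: the work units (100 units of
computation, 1 unit of data movement) are made-up numbers chosen to
land on the quoted 10.1X, not measurements from the Sandia test, and
the function names are hypothetical.

```python
def speedup(t_single, t_multi):
    """Ratio of single-processor time to multiprocessor time."""
    return t_single / t_multi


def efficiency(s, n_cpus):
    """Speedup per processor; 1.0 means exactly linear."""
    return s / n_cpus


# Hypothetical workload (assumed numbers, not from the original test).
COMPUTE = 100.0    # units of pure computation
DATA_MOVE = 1.0    # units of data movement (copies, pipe traffic)
N_CPUS = 10

# Single CPU: the processor does the computation and the data movement.
t_single = COMPUTE + DATA_MOVE

# Ten CPUs: computation is split evenly.  Data movement is assumed to be
# handled by DMA/MMU hardware that overlaps with computation, so it never
# appears in the measured CPU time.
t_multi = COMPUTE / N_CPUS

s = speedup(t_single, t_multi)
e = efficiency(s, N_CPUS)
print(f"{s:.1f}x speedup, {e:.0%} efficiency")
```

Nothing ran faster than linear here; the single-CPU figure simply
included work that the multiprocessor figure left out.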

One really old trick is to use the MMU to do "string moves" (remapping
pages rather than copying the bytes).  This is especially useful for
"pipes" or their equivalents, where you know that the original is no
longer needed.
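
A toy model of the idea, assuming a page table that is just a
dictionary from virtual page number to a frame of bytes.  The class
and function names are hypothetical and no real MMU interface is
implied; the point is only that a remap touches one table entry while
a copy touches every byte.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


class ToyAddressSpace:
    """Maps virtual page numbers to frames (bytearrays)."""

    def __init__(self):
        self.page_table = {}

    def map(self, vpn, frame):
        self.page_table[vpn] = frame

    def unmap(self, vpn):
        return self.page_table.pop(vpn)


def move_by_copy(src, src_vpn, dst, dst_vpn):
    """CPU copies every byte into a new frame; returns bytes copied."""
    old = src.unmap(src_vpn)
    dst.map(dst_vpn, bytearray(old))
    return len(old)


def move_by_remap(src, src_vpn, dst, dst_vpn):
    """Hand the same frame to the receiver; no bytes are copied."""
    dst.map(dst_vpn, src.unmap(src_vpn))
    return 0


writer, reader = ToyAddressSpace(), ToyAddressSpace()
writer.map(0, bytearray(b"x" * PAGE_SIZE))
copied = move_by_remap(writer, 0, reader, 7)
print("bytes copied by remap:", copied)
```

The remap is only safe because the sender gives up the page, which is
the "original is no longer needed" condition.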

>Was this a cpu benchmark or did it include i/o? 

Any multi-processor benchmark requires at least some I/O even if it is
just inter-process "pipes".  If the single-CPU timings were based on
Dhrystones but the multiprocessor timings on Prolog LIPS or some
similar arrangement, the CPU ratings may actually have been too low.

Even if the exact same algorithm was used (DMA controllers,...), the
bus contention of DMA to/from the same processor vs two different
processors would still lead to a small (1%) increase for two.

From my own experience, I'm surprised they only got 1% on 10 processors;
it should have been .3%/processor.

Sequent, CCI, and several others have often found better-than-linear
performance increases on certain applications (esp. the ones they were
designed for).

>Can I program my single processor to emulate a
>multiprocessor configuration and get increased performance :-) ?

In a way, yes!  By using an ACRTC rather than a "bit-mapped" graphics
display, an X.25 serial link instead of an RS-232 link (interrupts every
block instead of every character), and about 20 other "tricks", you
could actually get 200 times the performance of an equivalent "CPU
only system".  It wouldn't show up in the Dhrystones or Whetstones,
but it would be noticeable to the user.
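
The interrupt saving from a block-oriented link is easy to put
numbers on.  A minimal sketch, assuming a 128-byte block and a
4096-byte transfer (both figures are picked for illustration, and the
function names are hypothetical):

```python
def interrupts_per_character(n_bytes):
    """Character-at-a-time link: one interrupt for every byte."""
    return n_bytes


def interrupts_per_block(n_bytes, block_size):
    """Block-oriented link: one interrupt per (possibly partial) block."""
    return -(-n_bytes // block_size)  # ceiling division


N_BYTES = 4096      # assumed transfer size
BLOCK_SIZE = 128    # assumed block size

by_char = interrupts_per_character(N_BYTES)
by_block = interrupts_per_block(N_BYTES, BLOCK_SIZE)
print(by_char, "interrupts vs", by_block)
```

With these assumed sizes the CPU fields 128 times fewer interrupts for
the same data, and none of that shows up in a CPU-only benchmark.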

A number of 68020 and 68010 boxes have "Comm boards" that contain
additional processors, including 68008s, 80186s and others, along
with DMA, local memory (for buffering), and individual lines.  These
are usually not taken into consideration when Dhrystones are compared.
A 5/30 is nominally rated at 2 MIPS, but there are a minimum of 2
additional 1 MIPS processors "hidden" in the controller boards.
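
Taking the figures above at face value, the bookkeeping looks like
this.  Adding MIPS ratings across different processors is a crude
measure, so treat the total as a rough indication only; the names are
hypothetical.

```python
NOMINAL_CPU_MIPS = 2.0         # rating quoted for the main CPU
CONTROLLER_MIPS = [1.0, 1.0]   # the minimum of two 1 MIPS controller CPUs


def aggregate_mips(cpu, controllers):
    """Sum of the main CPU rating and the controller processors."""
    return cpu + sum(controllers)


total = aggregate_mips(NOMINAL_CPU_MIPS, CONTROLLER_MIPS)
hidden_fraction = (total - NOMINAL_CPU_MIPS) / total
print(f"total {total} MIPS, {hidden_fraction:.0%} not in the CPU rating")
```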

A Sun workstation isn't blindingly fast in Dhrystones, but for graphics,
it would beat a VAX 8600 (if the VAX ran bit-mapped).

A Cray X-MP will beat a 6/32 in number crunching any day, but a 6/32
does data bases and file servers extremely well.  It's simply a matter
of planning your system architecture for the type of work you intend
to do.

Just to be fair, what benchmarks did they use?