Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!shadooby!samsung!brutus.cs.uiuc.edu!apple!usc!ucla-cs!oahu!stephen
From: stephen@oahu.cs.ucla.edu (Steve Whitney)
Newsgroups: comp.sys.atari.st
Subject: Re: UNIX -- ATW Speed
Message-ID: <28655@shemp.CS.UCLA.EDU>
Date: 31 Oct 89 00:28:35 GMT
References: <545@nikhefh.nikhef.nl> <5188@cbnewsh.ATT.COM>
Sender: news@CS.UCLA.EDU
Reply-To: stephen@oahu.cs.ucla.edu (Steve Whitney)
Organization: UCLA Computer Science Department
Lines: 61

In article <5188@cbnewsh.ATT.COM> wolf@cbnewsh.ATT.COM (thomas.wolf,ho,) writes:
[performance numbers deleted]
~
~These claims are very misleading in that they imply that a system with 13
~transputers will actually run a program 1/5th as fast as a Cray (I'm assuming
~the above MIPS figures are correct.)  _Each_ T800 has a processing power
~of 10 MIPS (the upper limit at which instructions can be processed.)  So
~if you have a calculation-intensive application, say one that does ray-tracing,
~you can expect that peak to be reached.  If you ran this same program on
~a 13-transputer based ATW, I doubt whether you will see significant increases
~in performance, since that application will be run on a single T800 (unless
~it was specifically written with parallelism in mind

This is somewhat true.  Although Crays perform well on scalar code, their real
speed is only evident on vectorizable problems.  As on the Transputer, 
applications must be specifically engineered to perform well.  It is true
that there are Fortran compilers which are _very_ good at vectorizing code,
and C compilers which can do the same are starting to appear, but since C
isn't designed for vectors, it's not as easy to do.

~ 						- using OCCAM(I guess that
~is the T800 assembly language?) - I don't think there are compilers smart
~enough to take conventional C programs and parallelize them.)

Occam is a high-level language, not an assembly language, though it is
designed to be simple.  The name comes from Occam's razor, the principle of
cutting away whatever is unnecessary.  Occam has explicit constructs for
expressing parallelism (such as PAR and channels), so a program written in
it can be spread across as many processors as are available.
The application you mentioned (ray-tracing) happens to be very easy to
parallelize, since each pixel can be computed independently (I've done it
on a four-processor SGI Iris), so it might very well reach the peak
performance of the T800s.
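
As a rough sketch of why ray tracing splits up so easily: no scanline depends
on any other, so rows can simply be handed out to workers.  Everything below
is an illustrative assumption, not code from the Iris or from any transputer
system: the one-sphere scene, the image size, and the names trace_pixel,
render_row, render_serial and render_parallel are all made up for the example.
It is written in Python rather than Occam, and the thread pool only shows the
decomposition (CPython threads will not give a real speedup on CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor
import math

# Hypothetical scene and image size, chosen only for illustration.
WIDTH, HEIGHT = 16, 12
SPHERE_CENTER = (0.0, 0.0, 3.0)   # one sphere in front of a camera at the origin
SPHERE_RADIUS = 1.0


def trace_pixel(x, y):
    """Return 1 if the ray through pixel (x, y) hits the sphere, else 0."""
    # Map the pixel centre onto an image plane at z = 1.
    px = (x + 0.5) / WIDTH * 2.0 - 1.0
    py = 1.0 - (y + 0.5) / HEIGHT * 2.0
    norm = math.sqrt(px * px + py * py + 1.0)
    dx, dy, dz = px / norm, py / norm, 1.0 / norm
    # Ray from the origin: t^2 - 2*t*(d.c) + |c|^2 - r^2 = 0.
    cx, cy, cz = SPHERE_CENTER
    b = dx * cx + dy * cy + dz * cz
    c = cx * cx + cy * cy + cz * cz - SPHERE_RADIUS ** 2
    disc = b * b - c
    # The camera is outside the sphere, so both roots share the sign of b.
    return 1 if disc >= 0.0 and b > 0.0 else 0


def render_row(y):
    """Render one scanline; it needs no data from any other scanline."""
    return [trace_pixel(x, y) for x in range(WIDTH)]


def render_serial():
    """Render every scanline on a single worker."""
    return [render_row(y) for y in range(HEIGHT)]


def render_parallel(workers=4):
    """Hand scanlines to a pool of workers; map() returns rows in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_row, range(HEIGHT)))
```

The only coordination needed is collecting the finished rows, which is why
this kind of problem tends to scale well as processors are added.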

~I am not familiar as to how applications have to be written to run on a CRAY,
~but my guess is that you can use conventional C programming.  So if you took
~the above program, recompiled and ran it on a CRAY, you _would_ see significant
~speed improvements  (if 100MIPS = 1/5 Cray, I guess the program would run
~at 500 MIPS?)
~

And again, if you compiled said application with a parallelizing compiler,
you would see vast improvement on a 13-processor transputer system.
It's true that communication overhead will detract from the transputer's
performance and not from the Cray's.  Take a look at CMU's iWarp processor.
It's a good example of a machine designed to be good on its own and _better_
when networked.  Its link speeds make the transputer look like a 6502.
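
To put the communication-overhead point in rough numbers, here is a
back-of-the-envelope estimate in the style of Amdahl's law.  It is only a
sketch: the function name estimated_speedup and its parameters are
hypothetical, the overhead is modelled as a flat fraction of the
single-processor run time, and any values you plug in are guesses, not
measurements of a transputer or a Cray.

```python
def estimated_speedup(processors, parallel_fraction, comm_overhead=0.0):
    """Estimate speedup over one processor.

    parallel_fraction: share of the run time that can be split up (0..1).
    comm_overhead: extra time spent communicating, expressed as a fraction
                   of the original single-processor run time (an assumption
                   of this simple model).
    """
    serial_part = 1.0 - parallel_fraction
    parallel_part = parallel_fraction / processors
    return 1.0 / (serial_part + parallel_part + comm_overhead)


# Example use with made-up inputs for a 13-processor system:
ideal = estimated_speedup(13, 1.0)               # perfectly parallel, no overhead
with_overhead = estimated_speedup(13, 1.0, 0.05)  # same job, 5% spent on links
```

Even a small overhead term pulls the result well below the processor count,
which is why link speed matters so much for a networked design.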

[philosophical comments on benchmarks deleted]
~Tom
~
~-- 
~+---------------+-----------------------------+  I don't remember,
~| Tom Wolf      | Phone:  (201) 949-2079      |  I don't recall,
~| Bell Labs, NJ | E-mail: twolf@homxb.att.com |  I have no memory,
~+---------------+-----------------------------+  Of anything at all. P. Gabriel

But then again, this probably belongs in comp.arch anyway...

Steve Whitney   "It's never _really_ the last minute"       (())_-_(())
UCLA Comp. Sci. Grad. Student                                | (* *) | 
Internet: stephen@cs.ucla.edu              UCLA Bruin-->    {  \_@_/  }
GEnie:    S.WHITNEY                                           `-----'