Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!ucsd!mvb.saic.com!ncr-sd!ncrcae!hubcap!fpst
From: jkubicky@tybalt.caltech.edu (Joseph J. Kubicky)
Newsgroups: comp.parallel
Subject: Re: Looking for information on "DELTA-machine"
Message-ID: <1991Mar20.154204.28059@hubcap.clemson.edu>
Date: 20 Mar 91 11:43:52 GMT
References: <1991Mar18.162929.22688@hubcap.clemson.edu>
Sender: fpst@hubcap.clemson.edu (Steve Stevenson)
Organization: California Institute of Technology, Pasadena
Lines: 57
Approved: parallel@hubcap.clemson.edu
Apparently-To: <elroy!ames!comp-parallel@csvax.caltech.edu>

pfluegl@chopin.eng.uci.edu (Manfred J. Pfluegl) writes:

>Recently I had a chance to visit JPL/NASA in Pasadena. During
>several discussions the name "DELTA-machine" was dropped. I never
>heard anything about this system before but would like to read up
>on it. DELTA seems to be a highly parallel architecture and I got
>the impression that it was built by NASA. Can anyone give me
>some references or tell me briefly the characteristics of this
>system?

>Any help is appreciated. 

>                Manfred Pfluegl - believer of "Per Aspera Ad Astra"
>  *****  *****  pfluegl@uci.edu              (Internet)
> * **** * ****  pfluegl@uci.bitnet           (Bitnet)
>*  *****  ****  pfluegl%uci.edu@RELAY.CS.NET (Internet from Europe)

What I know about the Delta machine is what I've heard from
Prof. Chuck Seitz in my VLSI class (his group designed the router
chips for the mesh & did some other work on the machine):

	- 2-D mesh (going by some charts Seitz showed us last term,
	  their simulations indicate a 2-D mesh gives lower network
	  latency than higher-dimension meshes)
	- 576 (I think) i860 processors
	- Something like 25-30GFLOPS peak performance - if you
	  want, figure it out yourself: i860 rated at 66MFLOPS
	  peak (that's single precision, I think - double around
	  40MFLOPS).  Unfortunately, Intel used some creative
	  benchmarking here - when you really look at the chip,
	  you realize that even with 64-bit data busses & a
	  128-bit-wide on-chip D-cache, its I/O bandwidth
	  won't sustain 40MFLOPS for very long.  Also, other
	  features like parallel execution of scalar & FP ops are
	  tricky to exploit (you've actually got to code each pair
	  of instructions so that the FP opcode is in the lower 32
	  bits and the scalar opcode in the upper 32 bits).
	- The asynchronous router chips provide around 200MB/s of
	  bandwidth between a node & its four nearest neighbors.
	- Supposedly, we're getting it Spring term sometime.
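If you do want to figure the peak out yourself, it's just nodes times per-node peak. Here's a minimal Python sketch using only the numbers above (576 nodes, which I'm not sure of, and the 66/40 MFLOPS ratings); the constant and function names are made up for illustration:

```python
# Back-of-the-envelope aggregate peak for the Delta, using only the figures
# quoted above: 576 nodes (not certain) and the i860's rated peaks of
# 66 MFLOPS single precision / ~40 MFLOPS double precision.
NODES = 576
MFLOPS_SINGLE = 66.0
MFLOPS_DOUBLE = 40.0


def peak_gflops(nodes: int, mflops_per_node: float) -> float:
    """Aggregate peak = nodes * per-node peak, converted MFLOPS -> GFLOPS."""
    return nodes * mflops_per_node / 1000.0


if __name__ == "__main__":
    single = peak_gflops(NODES, MFLOPS_SINGLE)
    double = peak_gflops(NODES, MFLOPS_DOUBLE)
    print(f"single precision peak: {single:.1f} GFLOPS")
    print(f"double precision peak: {double:.1f} GFLOPS")
```

With those inputs it comes to roughly 38 GFLOPS single precision and 23 GFLOPS double, so the 25-30GFLOPS figure sits between the two.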
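To put a rough number on the bandwidth problem, here's a sketch for a double-precision DAXPY loop (y[i] = a*x[i] + y[i]), which does 2 FLOPs for every 24 bytes of memory traffic if nothing is reused from the D-cache. The bus rate in it (64-bit transfers at 33 million per second) is a hypothetical figure picked for illustration, not a real i860 bus spec:

```python
# Rough bandwidth-bound estimate for a DAXPY loop, y[i] = a*x[i] + y[i],
# in double precision. ASSUMPTIONS (hypothetical, not from the post):
#   - no reuse from the on-chip cache: x[i] and y[i] are loaded from
#     off-chip memory and y[i] is stored back on every iteration;
#   - the bus figure used below (8-byte transfers at 33 million/s) is
#     made up for illustration.
BYTES_PER_DOUBLE = 8
FLOPS_PER_ELEMENT = 2                       # one multiply + one add
BYTES_PER_ELEMENT = 3 * BYTES_PER_DOUBLE    # load x, load y, store y


def bytes_needed_per_second(target_mflops: float) -> float:
    """Memory traffic (bytes/s) needed to sustain target_mflops on DAXPY."""
    elements_per_second = target_mflops * 1e6 / FLOPS_PER_ELEMENT
    return elements_per_second * BYTES_PER_ELEMENT


def sustainable_mflops(bus_bytes_per_second: float) -> float:
    """MFLOPS that DAXPY can sustain if the bus is the only bottleneck."""
    elements_per_second = bus_bytes_per_second / BYTES_PER_ELEMENT
    return elements_per_second * FLOPS_PER_ELEMENT / 1e6


if __name__ == "__main__":
    need = bytes_needed_per_second(40.0)
    print(f"DAXPY at 40 MFLOPS needs {need / 1e6:.0f} MB/s")
    bus = 8 * 33e6  # hypothetical bus: 8-byte transfers, 33 million per second
    print(f"a {bus / 1e6:.0f} MB/s bus sustains about "
          f"{sustainable_mflops(bus):.0f} MFLOPS")
```

Under those assumptions, 40MFLOPS of DAXPY needs about 480MB/s, while the hypothetical 264MB/s bus tops out around 22MFLOPS. Code that reuses data out of the cache needs less off-chip traffic per FLOP.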
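Here's also a sketch of what a 2-D mesh with ~200MB/s channels means for message passing. It rests on my own assumptions, not anything Seitz said: a hypothetical 16 x 36 layout of the 576 nodes numbered row by row, X-then-Y dimension-order routing (a common scheme for meshes, but I don't know that the Delta routers do it this way), 200MB/s per channel, and a made-up 100ns per-hop router delay with the message pipelined behind its header:

```python
# Sketch of point-to-point communication on a 2-D mesh.
# ASSUMPTIONS (hypothetical, not from the post):
#   - nodes are numbered row-major on a COLS x ROWS grid (16 x 36 = 576);
#   - messages use dimension-order routing: move along X first, then Y;
#   - each channel carries 200 MB/s and the message is pipelined behind
#     its header, so time ~= hops * per_hop_delay + size / bandwidth.
from typing import List, Tuple

COLS, ROWS = 16, 36
CHANNEL_BYTES_PER_S = 200e6
PER_HOP_DELAY_S = 100e-9  # hypothetical router delay per hop


def coords(node: int) -> Tuple[int, int]:
    """Row-major node number -> (x, y) grid position."""
    return node % COLS, node // COLS


def neighbors(node: int) -> List[int]:
    """Up to four nearest neighbors (fewer on the edges of the mesh)."""
    x, y = coords(node)
    result = []
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < COLS and 0 <= ny < ROWS:
            result.append(ny * COLS + nx)
    return result


def xy_route(src: int, dst: int) -> List[int]:
    """Nodes visited from src to dst under X-then-Y routing."""
    x, y = coords(src)
    tx, ty = coords(dst)
    path = [src]
    while x != tx:
        x += 1 if tx > x else -1
        path.append(y * COLS + x)
    while y != ty:
        y += 1 if ty > y else -1
        path.append(y * COLS + x)
    return path


def transfer_time(src: int, dst: int, nbytes: int) -> float:
    """Pipelined estimate in seconds: header latency plus streaming time."""
    hops = len(xy_route(src, dst)) - 1
    return hops * PER_HOP_DELAY_S + nbytes / CHANNEL_BYTES_PER_S


if __name__ == "__main__":
    last = COLS * ROWS - 1
    print("neighbors of node 17:", neighbors(17))
    print("corner-to-corner hops:", len(xy_route(0, last)) - 1)
    print(f"1 MB corner to corner: {transfer_time(0, last, 10**6) * 1e6:.1f} us")
```

Under this model the 200MB/s channel rate dominates for big messages; the hop count only matters much for short ones.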

I'm sure if you play enough games with the code, you can actually
squeeze something like 25 useful GFLOPS out of the machine.  Sorry,
but I don't know anything about an operating system (I imagine it'll
just be a front end at first).

					Jay Kubicky
					jkubicky@cobalt.cco.caltech.edu


-- 
=========================== MODERATOR ==============================
Steve Stevenson                            {steve,fpst}@hubcap.clemson.edu
Department of Computer Science,            comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell