Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!ucsd!mvb.saic.com!ncr-sd!ncrcae!hubcap!fpst
From: jkubicky@tybalt.caltech.edu (Joseph J. Kubicky)
Newsgroups: comp.parallel
Subject: Re: Looking for information on "DELTA-machine"
Message-ID: <1991Mar20.154204.28059@hubcap.clemson.edu>
Date: 20 Mar 91 11:43:52 GMT
References: <1991Mar18.162929.22688@hubcap.clemson.edu>
Sender: fpst@hubcap.clemson.edu (Steve Stevenson)
Organization: California Institute of Technology, Pasadena
Lines: 57
Approved: parallel@hubcap.clemson.edu
Apparently-To:

pfluegl@chopin.eng.uci.edu (Manfred J. Pfluegl) writes:

>Recently I had a chance to visit JPL/NASA in Pasadena. During
>several discussions the name "DELTA-machine" was dropped. I never
>heard anything about this system before but would like to read up
>on it. DELTA seems to be a highly parallel architecture and I got
>the impression that it was built by NASA. Can anyone give me
>some references or tell me briefly the characteristics of this
>system?

>Any help is appreciated.

> Manfred Pfluegl - believer of "Per Aspera Ad Astra"
> ***** ***** pfluegl@uci.edu (Internet)
> * **** * **** pfluegl@uci.bitnet (Bitnet)
>* ***** **** pfluegl%uci.edu@RELAY.CS.NET (Internet from Europe)

What I know about the Delta machine, which is what I've heard from
Prof. Chuck Seitz in my VLSI class (his group designed the router
chips for the mesh & did some other stuff with the machine), is this:

- 2-D mesh (from some charts Seitz showed us last term, their
  simulations indicate this minimizes network latency over
  higher-dimension meshes)
- 576 (I think) i860 processors
- Something like 25-30GFLOPS peak performance - if you want, figure
  it out yourself: i860 rated at 66MFLOPS peak (that's single
  precision, I think - double around 40MFLOPS).
  Unfortunately, Intel used some creative benchmarking here - when
  you really look at the chip, you realize that the I/O bandwidth,
  even though it's got 64-bit data busses & a 128-bit wide on-chip
  D-cache, won't sustain 40MFLOPS for very long. Also, other features
  like parallel execution of scalar & FP ops are tricky (you've
  actually got to code the instructions such that the scalar opcode
  is in the lower 32 bits and the FP opcode in the top 32 bits).
- Asynchronous router chips operate at around 200MB/s bandwidth
  between a node & its four nearest neighbors.
- Supposedly, we're getting it Spring term sometime.

I'm sure if you play enough games with the code, you can actually
squeeze something like 25 useful GFLOPS out of the machine.

Sorry, but I don't know anything about an operating system (I imagine
just a front end at first).

Jay Kubicky
jkubicky@cobalt.cco.caltech.edu

--
=========================== MODERATOR ==============================
Steve Stevenson {steve,fpst}@hubcap.clemson.edu
Department of Computer Science, comp.parallel
Clemson University, Clemson, SC 29634-1906 (803)656-5880.mabell
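
A minimal sketch of the "figure it out yourself" arithmetic from the peak-performance item in the post, assuming the figures exactly as quoted there (576 nodes, 66 MFLOPS single precision, 40 MFLOPS double precision, all of which the poster marks as uncertain). The names `NODES`, `MFLOPS_SINGLE`, `MFLOPS_DOUBLE` and `peak_gflops` are made up for illustration.

```python
# Figures as quoted in the post; the poster hedges each with "I think".
NODES = 576          # "576 (I think) i860 processors"
MFLOPS_SINGLE = 66   # quoted single-precision peak per i860
MFLOPS_DOUBLE = 40   # quoted double-precision peak per i860


def peak_gflops(nodes: int, mflops_per_node: float) -> float:
    """Aggregate peak in GFLOPS, taking 1 GFLOPS = 1000 MFLOPS."""
    return nodes * mflops_per_node / 1000.0


single = peak_gflops(NODES, MFLOPS_SINGLE)
double = peak_gflops(NODES, MFLOPS_DOUBLE)

print(f"single-precision peak: {single:.1f} GFLOPS")  # 38.0
print(f"double-precision peak: {double:.1f} GFLOPS")  # 23.0
```

With these inputs the products come to roughly 38 and 23 GFLOPS, so the "25-30GFLOPS" quoted in the post sits between the single- and double-precision figures. These are simple node-count multiples; they say nothing about sustained rates, which the post argues are limited by memory bandwidth.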