Path: utzoo!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!tut.cis.ohio-state.edu!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!haven!ncifcrf!lhc!usenet
From: usenet@nlm.nih.gov (usenet news poster)
Newsgroups: comp.arch
Subject: Re: Killer Micro II
Keywords: systolic arrays
Message-ID: <1990Aug26.192641.16647@nlm.nih.gov>
Date: 26 Aug 90 19:26:41 GMT
References: <527@llnl.LLNL.GOV>
Reply-To: states@tech.NLM.NIH.GOV (David States)
Organization: National Library of Medicine, Bethesda, Md.
Lines: 33

brooks@physics.llnl.gov (Eugene D. Brooks III) writes:
> Meet Killer Micro II, described by Bipolar Integrated Technology, of Portland,
> Oregon, at the Hot Chips Symposium which was held in Santa Clara this month:
> -> 200K transistors on single ECL chip which dissipates 28 watts
> -> Clocked at 100 MHz (80 MHz Cray 1, 117 MHz XMP, 154 MHz YMP)
> -> Two 64 bit read ports, one 64 bit write port, concurrent transfers
> -> Capable of one 64 bit ADD and one 64 bit MULT each clock
>    IEEE DIVIDE, SQRT tossed in for free
> -> Full Integer ALU operations
> -> 200 MFLOPS, sustainable, peak performance
>    Read, Read, FLOP, FLOP, Write: each and every clock!
>
> Anyone got some ``vector register'' chips, and decent memory chips,
> to keep this beast fed???

If you have a specific algorithm to implement and you are willing to build a dedicated processor, systolic arrays of a chip like this could give real bang for the buck. Of course, even if the calculation needs only a single input and writes a single output per cycle, you are still talking about sustained, simultaneous read and write rates of 800 MB/sec (one 64 bit word per clock at 100 MHz) at the ends of the pipe. Using temporally interleaved operations on a physically reentrant systolic pipe, you could use one chip in several places in the calculation and scale the overall processing rate and I/O rates back, but then you need some flexibility in the chip's ports.
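To make the "one word in, one word out per clock" arithmetic concrete, here is a minimal Python sketch. It is my own illustration under stated assumptions, not a model of BIT's chip: a linear systolic FIR filter, y[n] = sum_k w[k] * x[n - k], in which each cell holds one weight and does exactly one multiply and one add per clock (the ADD + MULT mix quoted above). It uses one standard arrangement where the weights stay put, the partial sums move one cell per clock, and the samples move one cell every two clocks. The function names are hypothetical; the 64 bit word and 100 MHz clock come from the quoted specs, and the reentrant case assumes ideal time multiplexing with no interleaving overhead.

```python
# Illustrative sketch only (hypothetical names): a linear systolic FIR
# filter, y[n] = sum_k w[k] * x[n - k], built from multiply-add cells.

def boundary_bytes_per_sec(clock_hz, word_bits=64, words_per_clock=1):
    """Bandwidth needed at one end of the pipe when `words_per_clock`
    words of `word_bits` bits cross it every clock."""
    return clock_hz * words_per_clock * word_bits // 8


def reentrant_bytes_per_sec(clock_hz, stages_per_chip, word_bits=64):
    """Assumed ideal case: one physical chip time-multiplexed over
    `stages_per_chip` logical stages accepts a new word only every
    `stages_per_chip` clocks (interleaving overhead ignored)."""
    return boundary_bytes_per_sec(clock_hz, word_bits) // stages_per_chip


def systolic_fir(w, xs):
    """Clock-by-clock simulation of the array; returns y[0..len(xs)-1]."""
    K, N = len(w), len(xs)
    x_in = [0.0] * K   # sample at each cell's multiplier input
    x_dly = [0.0] * K  # extra delay register: samples move at half speed
    y_in = [0.0] * K   # partial sum arriving at each cell
    out = []
    for t in range(N + K - 1):
        x_in[0] = xs[t] if t < N else 0.0  # one word in per clock
        y_in[0] = 0.0                      # a fresh partial sum enters
        # every cell does exactly one multiply and one add this clock
        y_out = [y_in[k] + w[k] * x_in[k] for k in range(K)]
        if t >= K - 1:
            out.append(y_out[-1])          # one word out per clock: y[t-K+1]
        # clock edge: latch samples and sums one cell to the right
        x_in, x_dly, y_in = [0.0] + x_dly[:-1], x_in[:], [0.0] + y_out[:-1]
    return out
```

For example, `boundary_bytes_per_sec(100_000_000)` gives 800,000,000 bytes/sec, the 800 MB/sec figure above, and sharing one chip across four logical stages with `reentrant_bytes_per_sec(100_000_000, 4)` drops that to 200,000,000 bytes/sec. `systolic_fir([1.0, 2.0], [1.0, 2.0, 3.0])` returns `[1.0, 4.0, 7.0]`. Note that in this particular arrangement every cell takes in two words (sample and partial sum) and passes on two words each clock, even though only one word enters and one leaves the array as a whole.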

Assuming this is a board-level product to be integrated into an existing workstation, and that you are willing to put a few chips per board into the systolic array, the individual chips don't need to be nearly as aggressive. Anyone know of a CMOS FP add + multiply chip with, say, 3 read and 2 write ports and an internal crossbar switch that would run at a modest 50 MFLOPS per chip?

David States