Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!necntc!necis!encore!fay
From: fay@encore.UUCP (Peter Fay)
Newsgroups: comp.arch
Subject: VLIW (Was: D-machine helped spawn RISC)
Message-ID: <1941@encore.UUCP>
Date: Mon, 14-Sep-87 10:59:48 EDT
Article-I.D.: encore.1941
Posted: Mon Sep 14 10:59:48 1987
Date-Received: Tue, 15-Sep-87 06:33:09 EDT
References: <4782@sdcrdcf.UUCP> <475@esunix.UUCP> <347@erc3ba.UUCP>
Reply-To: fay@encore.UUCP (Peter Fay)
Organization: Encore Computer Corp, Marlboro, MA
Lines: 49

In article <347@erc3ba.UUCP> sd@erc3ba.UUCP (S.Davidson) writes:
>
>
>It's happened already, though they are not all the rage yet. They are
>called Very Long Instruction Word machines, and one of the originators,
>Josh Fisher, did his dissertation on global compaction of horizontal
>microcode. Josh moved to Yale after he graduated, and then moved to a
>company to build a VLIW machine. I don't know the current status of this machine,
>though. At Yale, though, Josh got some very impressive speedups from unrolling
>loops and basically running compaction on them, assuming a lot of available resources.
>I don't know of any results on real hardware, however.
>

Funny you should mention this. I was just reading "Unix on a VLIW"
(P. Clancy et al., Proc. Summer 1987 Usenix Conf.), which describes some
of Multiflow's hardware and software. Truly incredible stuff, if it's
for real. Their high-end system (Trace 28/200) claims 28 operations per
instruction, 120 MFLOPS and 215 VLIW MIPS.

The most intriguing aspect to me, though, is not just their hardware
doing 28 formerly sequential instructions in parallel, but their
compiler techniques. Normally "conditional jumps occur every five to
eight instructions", making parallelization very difficult. So simply
take a trace of normal program execution (yes, I know, somewhat awkward
when compiling new programs) and have the compiler assume it will
USUALLY execute that trace.
Then compile the new program as if it were not going to take the
seldom-used branches, and plunge ahead. Of course, if those unlikely
branches do happen, just do "compensation" (undo what you did wrong).
The authors claim that, instead of only the handful of instructions
between branches, "hundreds or thousands of operations become
candidates for overlap". Unfortunately, no cold hard numbers for the
improved code are presented in this paper.

My question to those parallel-machine compiler writers out there: is
anyone writing compilers for non-VLIW machines using the same methods?
Why can't, say, an Alliant-type (or Cedar-type, etc.) machine with
hardware lock-step between computational elements take an execution
trace, recompile assuming no branches, and, when the 1000th instruction
diverts from the "chosen path", just back up the CEs and undo the
damage?

			peter fay
			fay@multimax.arpa
{allegra|compass|decvax|ihnp4|linus|necis|pur-ee|talcott}!encore!fay
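
To make the trace idea above a bit more concrete, here is a toy sketch of just the trace-picking step, as I understand it from the description: follow the most frequently taken branch out of each block, and note every edge that leaves that path, since those are the spots where "compensation" code would be needed. This is not Multiflow's compiler or anything taken from the paper. The block names, the profile counts and both function names are invented purely for illustration.

```python
# Toy control-flow graph: block -> {successor: times that edge was taken
# in a profiling run}. Block names and counts are invented for illustration.
PROFILE = {
    "entry": {"loop": 1},
    "loop": {"body": 990, "exit": 10},
    "body": {"rare_fix": 5, "tail": 985},
    "rare_fix": {"tail": 5},
    "tail": {"loop": 990},
    "exit": {},
}


def pick_trace(profile, start):
    """Follow the most frequently taken edge out of each block.

    Stops at a block with no successors, or when the next block is
    already on the trace (so a loop back-edge ends the trace).
    """
    trace = [start]
    seen = {start}
    block = start
    while profile[block]:
        nxt = max(profile[block], key=profile[block].get)
        if nxt in seen:
            break
        trace.append(nxt)
        seen.add(nxt)
        block = nxt
    return trace


def side_exits(profile, trace):
    """List the (block, successor) edges that leave the trace.

    These are the places where execution can divert from the chosen
    path; the loop back-edge at the end of the trace is included.
    """
    exits = []
    for i, block in enumerate(trace):
        on_trace_next = trace[i + 1] if i + 1 < len(trace) else None
        for succ in profile[block]:
            if succ != on_trace_next:
                exits.append((block, succ))
    return exits


if __name__ == "__main__":
    trace = pick_trace(PROFILE, "entry")
    print("trace:", trace)
    print("side exits:", side_exits(PROFILE, trace))
```

The scheduler would then treat the blocks on the trace as one long straight-line sequence when looking for operations to overlap. The actual compaction and the generation of compensation code are the hard parts, and they are not attempted here.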