Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!necntc!necis!encore!fay
From: fay@encore.UUCP (Peter Fay)
Newsgroups: comp.arch
Subject: VLIW (Was: D-machine helped spawn RISC)
Message-ID: <1941@encore.UUCP>
Date: Mon, 14-Sep-87 10:59:48 EDT
Article-I.D.: encore.1941
Posted: Mon Sep 14 10:59:48 1987
Date-Received: Tue, 15-Sep-87 06:33:09 EDT
References: <4782@sdcrdcf.UUCP> <475@esunix.UUCP> <347@erc3ba.UUCP>
Reply-To: fay@encore.UUCP (Peter Fay)
Organization: Encore Computer Corp, Marlboro, MA
Lines: 49

In article <347@erc3ba.UUCP> sd@erc3ba.UUCP (S.Davidson) writes:
>
>
>It's happened already, though they are not all the rage yet.  They are
>called Very Long Instruction Word machines, and one of the originators,
>Josh Fisher, did his dissertation on global compaction of horizontal
>microcode.  Josh moved to Yale after he graduated, and then moved to a
>company to build a VLIW machine.  I don't know the current status of this machine,
>though.  At Yale, though, Josh got some very impressive speedups from unrolling
>loops and basically running compaction on them, assuming a lot of available resources.
>I don't know of any results on real hardware, however.
>

Funny you should mention this. I was just reading "Unix on a VLIW" (P.
Clancy et al. - Proc. Summer 1987 Usenix Conf.) which describes some of
Multiflow's hardware and software. Truly incredible stuff, if it's for
real. Their high end system (Trace 28/200) claims 28 operations per
instruction, 120 MFLOPS and 215 VLIW MIPS.

The most intriguing aspect to me, though, is not just their hardware doing
28 formerly sequential operations in parallel, but their compiler
techniques. Normally "conditional jumps occur every five to eight
instructions", making parallelization very difficult. So simply take a
trace of typical program execution (yes, I know, somewhat awkward when
compiling a new program) and have the compiler assume the program will
USUALLY follow that trace. Then compile the program as if it were not
going to take the seldom-used branches, and plunge ahead. Of course, if
one of those unlikely branches is taken, just do "compensation" (undo
what you did wrong). The authors claim that instead of several
instructions between branches, "hundreds or thousands of operations
become candidates for overlap".

Unfortunately, no hard cold numbers of improved code are presented in this
paper.
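A toy sketch may make the trace idea concrete. It is not Multiflow's compiler: the profile counts, the op encoding, the one-cycle latency, and the names pick_trace, compact and off_trace_fixups are all made up for illustration. pick_trace follows the most frequently taken successor of each block to choose a trace. compact packs the trace's operations greedily into fixed-width instruction words, letting them float across branches subject only to read-after-write dependences (it assumes register-only operations and that each register is written once, so there are no other hazards). off_trace_fixups then lists, for each branch, the operations that sank below it (these need compensation copies on the off-trace exit) and those that rose above it (these now run speculatively).

```python
# Hypothetical sketch of trace scheduling: trace selection + compaction.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Op:
    name: str                 # human-readable text of the operation
    dest: Optional[str]       # register written (None for branches)
    srcs: Tuple[str, ...]     # registers read
    is_branch: bool = False   # True for a conditional exit from the trace


def pick_trace(succ_counts, entry):
    """Follow the most frequently taken successor of each block (profile-guided)."""
    trace, block, seen = [], entry, set()
    while block is not None and block not in seen:
        trace.append(block)
        seen.add(block)
        succs = succ_counts.get(block, {})
        block = max(succs, key=succs.get) if succs else None
    return trace


def compact(ops, width):
    """Greedily pack a straight-line trace into instruction words of `width` ops.

    Assumes one-cycle latency, each register written at most once on the trace
    (so only read-after-write dependences matter), and branches kept in order.
    """
    ready_at = {}     # register -> first cycle its value can be read
    words = []        # words[c] = ops issued together in cycle c
    cycle_of = []     # cycle_of[i] = cycle chosen for ops[i]
    last_branch = -1
    for op in ops:
        c = max((ready_at.get(r, 0) for r in op.srcs), default=0)
        if op.is_branch:
            c = max(c, last_branch + 1)
        while c < len(words) and len(words[c]) >= width:
            c += 1  # word is full; try the next one
        while len(words) <= c:
            words.append([])
        words[c].append(op)
        cycle_of.append(c)
        if op.dest is not None:
            ready_at[op.dest] = c + 1
        if op.is_branch:
            last_branch = c
    return words, cycle_of


def off_trace_fixups(ops, cycle_of):
    """For each branch, report which ops compaction moved across it."""
    report = {}
    for b, op in enumerate(ops):
        if not op.is_branch:
            continue
        bc = cycle_of[b]
        # Preceded the branch but now issues after it: skipped when the branch
        # leaves the trace, so a copy is needed on that exit (compensation).
        sank = [ops[i].name for i in range(b)
                if not ops[i].is_branch and cycle_of[i] > bc]
        # Followed the branch but now issues with or before it: speculative.
        rose = [ops[i].name for i in range(b + 1, len(ops))
                if not ops[i].is_branch and cycle_of[i] <= bc]
        report[op.name] = {"compensation": sank, "speculative": rose}
    return report


# Hypothetical five-op trace with one exit branch.
EXAMPLE_OPS = [
    Op("u1 = x + 1", "u1", ("x",)),
    Op("u2 = u1 * 2", "u2", ("u1",)),
    Op("if x < 0 goto OFF", None, ("x",), is_branch=True),
    Op("v1 = y * z", "v1", ("y", "z")),
    Op("v2 = v1 + u2", "v2", ("v1", "u2")),
]

if __name__ == "__main__":
    print(pick_trace({"B0": {"B1": 95, "B2": 5}, "B1": {"B3": 90, "B4": 10}}, "B0"))
    words, cycle_of = compact(EXAMPLE_OPS, width=4)
    for c, word in enumerate(words):
        print(c, [op.name for op in word])
    print(off_trace_fixups(EXAMPLE_OPS, cycle_of))
```

With width 4 the five operations fit in three words. "u2 = u1 * 2" sank below the branch and needs a compensation copy on the OFF exit, while "v1 = y * z" now runs speculatively; that is harmless here only because v1 is a fresh register, and a store or anything that can trap could not be hoisted so freely.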

My question to those parallel machine compiler writers out there: is anyone
writing compilers for non-VLIW machines using the same methods? Why can't,
say, an Alliant-type (or Cedar-type, etc.) machine with hardware lock-step
between computational elements (CEs) take an execution trace, recompile
assuming the branches go the usual way, and, when the 1000th instruction
diverges from the "chosen path", just back up the CEs and undo the damage?
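The run-time alternative asked about here, running the predicted path and backing up on divergence, can be sketched with an undo log. This is a single-processor software analogy, not how an Alliant or Cedar CE actually works; the step encoding, UndoLog, run_speculatively, HOT_PATH and fallback are all hypothetical. Every register write on the predicted path records the old value; when a guard (a branch the trace said would go the usual way) fails, the log is replayed backwards and the rarely taken path is run from the restored state. Only state that can be logged can be undone this way; I/O, or stores visible to other processors, would have to be held back until the guard is resolved.

```python
# Hypothetical sketch: speculative execution with an undo log and rollback.
_MISSING = object()  # sentinel: register did not exist before the write


class UndoLog:
    """Remembers each overwritten register so speculative work can be reverted."""

    def __init__(self, regs):
        self.regs = regs
        self.log = []

    def write(self, name, value):
        # Save the old value (or the fact that there was none) before writing.
        self.log.append((name, self.regs.get(name, _MISSING)))
        self.regs[name] = value

    def rollback(self):
        # Undo writes newest-first so repeated writes restore correctly.
        while self.log:
            name, old = self.log.pop()
            if old is _MISSING:
                del self.regs[name]
            else:
                self.regs[name] = old

    def commit(self):
        self.log.clear()


def run_speculatively(regs, hot_path, fallback):
    """Run hot_path assuming every guard holds; on a failed guard, undo and
    run fallback from the original state. Returns "committed" or "diverged".

    Step encoding (hypothetical): ("op", dest, fn) writes fn(regs) to dest;
    ("guard", pred) checks that execution is still on the predicted path.
    """
    undo = UndoLog(regs)
    for step in hot_path:
        if step[0] == "guard":
            if not step[1](regs):
                undo.rollback()
                fallback(regs)
                return "diverged"
        else:
            _, dest, fn = step
            undo.write(dest, fn(regs))
    undo.commit()
    return "committed"


# Hypothetical example: a and b are computed before the guard is resolved.
HOT_PATH = [
    ("op", "a", lambda r: r["x"] * 2),
    ("op", "b", lambda r: r["a"] + 1),
    ("guard", lambda r: r["x"] >= 0),  # the "usual" direction of the branch
    ("op", "c", lambda r: r["b"] * 3),
]


def fallback(r):
    # The rarely taken path, run from the restored state.
    r["c"] = -r["x"]


if __name__ == "__main__":
    for x in (5, -2):
        regs = {"x": x}
        status = run_speculatively(regs, HOT_PATH, fallback)
        print(x, status, regs)
```

With x = 5 the guard holds and the speculative results a, b and c are kept. With x = -2 the guard fails after a and b were already computed, so both writes are undone and only the fallback's c = 2 survives.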

			peter fay
			fay@multimax.arpa
{allegra|compass|decvax|ihnp4|linus|necis|pur-ee|talcott}!encore!fay