Path: utzoo!attcan!uunet!yale!mfci!colwell
From: colwell@mfci.UUCP (Robert Colwell)
Newsgroups: comp.arch
Subject: Re: VLIW
Message-ID: <556@m3.mfci.UUCP>
Date: 12 Nov 88 18:40:09 GMT
References: <70@armada.UUCP> <28200228@urbsdc> <5087@mit-vax.LCS.MIT.EDU>
Sender: colwell@mfci.UUCP
Reply-To: colwell@mfci.UUCP (Robert Colwell)
Organization: Multiflow Computer Inc., Branford Ct. 06405
Lines: 50

In article <5087@mit-vax.LCS.MIT.EDU> spectre@mit-vax.UUCP (Joseph D. Morrison) writes:
>It seems to me that the issue of VLIW versus scoreboarding is the
>wrong one to discuss.
>
>Scoreboarding is but one of several techniques for managing a
>pipeline.  (Some alternative techniques are micro-dataflow, simple
                                             ^^^^^^^^^^^^^^
Would you elaborate a little on that?  I've never heard of it.

>stalling, or letting the compiler stick no-ops in the right places.
>The simple schemes can also be combined with "register bypass" to
>improve pipeline performance.)

We do register bypassing.  It's not free in terms of gates in the
register files, but it's worthwhile.

>So I think we were actually arguing about "which is better for getting
>parallelism; pipelining or VLIW?" Phrased that way, I think the answer
>is obviously "use both".

We did, so I don't see any argument here.

>If each of your functional units takes 4 cycles to perform its
>operation, and you have a VLIW machine with 8 functional units, your
>average throughput will be 2 instructions per cycle. The obvious thing
>to do is to use pipelined functional units, and get the 8 instructions
>per cycle you deserve :-)

We do.  If you put in a functional unit that requires 4 cycles to
complete, and you DON'T pipeline it, then your first machine will be
your last: nobody will buy it, because the performance will be too low.
The question is, does the compiler manage the pipes, or do you devote
complicated runtime hardware to the task?
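
A rough sketch of the arithmetic in the quoted example, assuming
steady-state peak rates with no dependences or resource conflicts.  The
function name and parameters are hypothetical, chosen only for
illustration:

```python
def peak_ops_per_cycle(units: int, latency: int, pipelined: bool) -> float:
    """Peak operations started per cycle across all functional units.

    Assumes every unit is kept busy and nothing ever stalls, so this is
    an upper bound rather than a prediction for real code.
    """
    # A pipelined unit can accept a new operation every cycle; an
    # unpipelined one is tied up for its full latency before the next.
    issue_interval = 1 if pipelined else latency
    return units / issue_interval


# The quoted case: 8 functional units, each taking 4 cycles.
print(peak_ops_per_cycle(8, 4, pipelined=False))  # 2.0
print(peak_ops_per_cycle(8, 4, pipelined=True))   # 8.0
```

Pipelining leaves the 4-cycle latency of each operation unchanged; it
only raises the rate at which new operations can start.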

>Naturally, as soon as you do this you will need some mechanism for
>handling the various conflicts that occur when two instructions in the
>pipeline want to use the same register. This is when you can use
>scoreboarding, or whatever you want.

We let the compiler do it.  The only reason to make the hardware do it
is to try to handle object code compatibility across different pipeline
latencies.  See other recent articles for more on this.
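
A minimal sketch of what "letting the compiler do it" can look like.
This is not the Multiflow compiler: the latency values, opcode names,
and the schedule() helper are all assumptions made up for illustration,
and the sketch issues one operation per cycle rather than packing a
wide instruction word, just to keep it short.  The compiler consults a
latency table and pads with no-ops until each operand is ready, so the
hardware never has to check:

```python
# Hypothetical latency table: cycles from issue until the result can be
# read by a later operation.  Real values depend on the machine.
LATENCY = {"load": 3, "fmul": 4, "fadd": 2, "iadd": 1}


def schedule(ops, latency=LATENCY):
    """Assign issue cycles to ops, given as (opcode, dest, sources).

    In-order, one operation per cycle.  Returns a list of
    (cycle, opcode, dest) with explicit "nop" entries filling any gap.
    """
    ready = {}   # register name -> first cycle its value can be read
    cycle = 0    # next free issue slot
    out = []
    for opcode, dest, srcs in ops:
        # Earliest cycle at which every source operand is available.
        start = max([cycle] + [ready.get(s, 0) for s in srcs])
        # The compiler, not the hardware, fills the wait with no-ops.
        out.extend((c, "nop", None) for c in range(cycle, start))
        out.append((start, opcode, dest))
        ready[dest] = start + latency[opcode]
        cycle = start + 1
    return out


program = [
    ("load", "r1", []),
    ("fmul", "r2", ["r1", "r1"]),
    ("fadd", "r3", ["r2", "r1"]),
]
for slot in schedule(program):
    print(slot)
```

Code scheduled this way is only correct for the latencies in the table.
Run it on a machine with different pipeline depths and it must be
rescheduled, which is the object code compatibility issue.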

>In fact, what better way to test pipeline strategies! With all those
>functional units, the pipeline management will be pretty hairy...

So if you do it in software and get a wrong answer, you fix your tables
and recompile the compiler (not that that's ever happened to us, you
understand :-)).  If you do it in hardware, you respin the chip at
enormous expense and then wait for the next time.