Path: utzoo!attcan!uunet!zephyr.ens.tek.com!uw-beaver!rice!titan.rice.edu!preston
From: preston@titan.rice.edu (Preston Briggs)
Newsgroups: comp.arch
Subject: Re: speculative execution
Message-ID: <1990Oct10.164353.21070@rice.edu>
Date: 10 Oct 90 16:43:53 GMT
References: <1990Oct9.212103.363@rice.edu> <12905@encore.Encore.COM>
Sender: news@rice.edu (News)
Organization: Rice University, Houston
Lines: 74

I wrote
>> In general, we need to be careful about fatally increasing
>> register pressure.  The i860's exposed pipeline provides an
>> elegant way out, allowing simple aborts of optimistic 
>> computations by ignoring what's partially computed in
>> the pipe.

and
In article <12905@encore.Encore.COM> jkenton@pinocchio.encore.com (Jeff Kenton) writes:
>It would take a lot to convince me that the i860 is an elegant solution
>to anything.  No one has produced a compiler which can take advantage of
>the theoretically possible parallelism of the i860.  It's a very fast
>chip for certain kinds of applications, but I wouldn't call it elegant,
>or general purpose.

Lots of complaints here...
First, the exposed pipeline stuff.  Suppose we've got an if-then
that looks like this:

		int-1
		int-2
		int-3
		if (something) {
		    pfmul.ss f3,f4,f0
		    pfmul.ss f0,f0,f0
		    pfmul.ss f0,f0,f0
		    pfmul.ss f0,f0,f5
		}
		fst.l	f5,somewhere

The idea is that if "something" is true, we multiply f3 and f4 together,
putting the result in f5.  Then we store f5.  So we can't optimistically
perform the entire multiply before knowing the value of "something",
since f5 is live on the false branch.

We can, however, hoist the initial pipeline stages (perhaps overlapping
them with earlier pipeline computations).

		int-1, pfmul.ss f3,f4,f0
		int-2, pfmul.ss f0,f0,f0
		int-3, pfmul.ss f0,f0,f0
		if (something) {
		    pfmul.ss f0,f0,f5
		}
		fst.l	f5,somewhere

The true path gets much shorter.
No increase in the path length of the false path.
And no extra register required.
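
To check the claim, here is a toy Python model of the two schedules.
It is only a sketch: the MulPipe class, the three-stage depth (inferred
from the four-instruction sequences above), the starting register
values, and the treatment of f0 as a register that reads as zero and
discards writes are all assumptions of the model, not a description of
the real hardware.

```python
class MulPipe:
    """Toy model of an exposed multiplier pipeline (hypothetical helper)."""

    STAGES = 3  # assumed depth: the result pops out on the 4th pfmul

    def __init__(self, regs):
        self.regs = dict(regs)
        self.regs["f0"] = 0.0            # assumed: f0 reads as zero
        self.stages = [0.0] * self.STAGES

    def pfmul(self, src1, src2, dest):
        # Sources are read now; they are free for reuse afterwards.
        product = self.regs[src1] * self.regs[src2]
        # The value leaving the last stage is what lands in dest.
        result = self.stages.pop()
        self.stages.insert(0, product)
        if dest != "f0":                 # assumed: writes to f0 are discarded
            self.regs[dest] = result


# Arbitrary starting values; 99.0 stands for the old contents of f5.
START = {"f3": 3.0, "f4": 4.0, "f5": 99.0}


def original(something):
    """All four pfmuls sit inside the if."""
    m = MulPipe(START)
    if something:
        m.pfmul("f3", "f4", "f0")
        m.pfmul("f0", "f0", "f0")
        m.pfmul("f0", "f0", "f0")
        m.pfmul("f0", "f0", "f5")
    return m.regs["f5"]                  # the value fst.l would store


def hoisted(something):
    """First three pfmuls hoisted above the if; only the last is guarded."""
    m = MulPipe(START)
    m.pfmul("f3", "f4", "f0")
    m.pfmul("f0", "f0", "f0")
    m.pfmul("f0", "f0", "f0")
    if something:
        m.pfmul("f0", "f0", "f5")
    return m.regs["f5"]


if __name__ == "__main__":
    for flag in (True, False):
        print(flag, original(flag), hoisted(flag))
```

In this model both schedules store the same f5 on either branch.  On
the false path the hoisted product is simply left in the pipe and never
written anywhere, which is the "simple abort" described above.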

It's perhaps a dirty trick rather than elegant, but I try to
describe my ideas glowingly and reserve disparaging terms for other
people's work.

The point, though, is that the exposed pipeline scheme requires fewer
registers because result registers are not frozen at the beginning of
a pipelined sequence, but at the end.  Similarly, the source registers
become available immediately after they are used.  In the example above,
f3 and f4 are immediately available after the 1st instruction and f5
isn't required until the result pops out of the pipe.
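
One rough way to put numbers on this is to count how many result
registers are tied up at once when several multiplies are issued back
to back.  The sketch below is a counting exercise under stated
assumptions, not a model of any particular machine: peak_reserved and
its parameters are hypothetical names, the latency of 3 matches the
example above, and "reserve at issue" stands for any scheme where the
destination is named when the operation starts.

```python
def peak_reserved(num_ops, latency, reserve_at_issue):
    """Peak number of result registers held at once (hypothetical helper).

    Operation i issues in slot i and delivers its result in slot
    i + latency.  If reserve_at_issue is True, its destination register
    is held from issue through delivery; otherwise it is held only in
    the delivery slot, as with an exposed pipeline.
    """
    busy = {}
    for issue in range(num_ops):
        done = issue + latency
        start = issue if reserve_at_issue else done
        for slot in range(start, done + 1):
            busy[slot] = busy.get(slot, 0) + 1
    return max(busy.values())


if __name__ == "__main__":
    print("reserved at issue:", peak_reserved(6, 3, True))
    print("reserved at end:  ", peak_reserved(6, 3, False))
```

With six back-to-back multiplies, reserving at issue holds latency + 1
result registers at the peak, while reserving only at the end holds one.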

Renaming helps, but requires more hidden registers that might be
used profitably by the compiler for other work.

Regarding compilers, I believe The Portland Group and Ardent both have
compilers that will take advantage of the pipelined instructions.
Besides that, the i860 is a wonderful source of thesis topics.

The i860 may not be your ideal chip, but it's chock full of ideas.
The good and useful ones shouldn't be ignored.

-- 
Preston Briggs				looking for the great leap forward
preston@titan.rice.edu