Path: utzoo!utgpu!cunews!bnrgate!bigsur!bnr-rsc!bcarh185!schow
From: schow@bcarh185.bnr.ca (Stanley T.H. Chow)
Newsgroups: comp.arch
Subject: Re: speculative execution
Message-ID: <3436@bnr-rsc.UUCP>
Date: 10 Oct 90 20:20:04 GMT
References: <3432@bnr-rsc.UUCP> <1990Oct10.170424.21489@rice.edu>
Sender: news@bnr-rsc.UUCP
Reply-To: bcarh185!schow@bnr-rsc.UUCP (Stanley T.H. Chow)
Organization: BNR Ottawa, Canada
Lines: 70
Summary:
Followup-To:
Keywords:

In article <1990Oct10.170424.21489@rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
>In article <3432@bnr-rsc.UUCP> bcarh185!schow@bnr-rsc.UUCP (Stanley T.H. Chow) writes:
>>I assume you mean given two machines, otherwise identical, differing
>>only in that M1 does hardware speculative execution and M2 does not,
>>you can write a compiler that will make M2 run faster than M1.
>
>That's what I believe.
>
>>This sounds wrong. I do not doubt that you can speedup M2 by speculative
>>execution (with hoisting, etc). But surely the same technology can be
>>applied to M1 with the same result. A priori, I would expect the benefit
>>to be the same for both machines.
>
>Well, I wasn't going to let M1 use my fabulous scheduling ideas.
>It had to be satisfied with hardware.  Further, M2 ought to have a
>higher clock speed since its hardware is simpler.

Ah, but that is not very fair, is it? If code scheduling works for both
M1 & M2, why restrict it to M2 only?

It sounds like we are starting to get into the philosophical aspects of
the problem - like the RISC/CISC wars of old (maybe even now? :-).

There are certainly many possible trade-offs and design points. Some will
rely exclusively on H/W, others on S/W, and yet others on both. It is not
clear how the H/W and S/W efforts interact. Do they go after the same
"parallelism" or do they exploit different types of "parallelism"? Which
kinds of S/W speculative execution work well with which kinds of H/W?
Do some H/W designs make the S/W impossible?

>I'm advocating a wide instruction word so that I can, in a single
>instruction, specify enough work to keep all the resources busy.
>Ways of accomplishing this include trace scheduling,
>global compaction, and software pipelining.
>
>Speculative execution hardware notices that it has resources
>that aren't fully utilized and tries to find work for them to do.

The difference is that the compiler has the "global" view and can do
"higher" level shuffling and optimization. On the other hand, H/W has
the advantage that it actually "knows" the total instantaneous values
of all the registers. As someone (Andy?) suggests, clever H/W could
resolve aliasing at run time.
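
To make the aliasing point concrete, here is a toy sketch (in Python,
purely for illustration). It asks whether a load from a[j] may be hoisted
above an earlier store to a[i]. The function names and the crude notion of
"provably different" are my own assumptions, not anyone's actual compiler
or chip.

```python
def can_hoist_static(store_index, load_index):
    """Compile-time view (hypothetical model): an index is either a known
    integer constant or a symbolic name (a string). Hoisting is allowed
    only when both are constants and they differ."""
    both_known = isinstance(store_index, int) and isinstance(load_index, int)
    return both_known and store_index != load_index


def can_hoist_dynamic(store_addr, load_addr):
    """Run-time view: the actual addresses are available, so the check
    is a plain comparison."""
    return store_addr != load_addr


# The compiler sees only the names "i" and "j", so it must be conservative.
static_answer = can_hoist_static("i", "j")

# At run time i happens to be 3 and j happens to be 7: no conflict.
dynamic_answer = can_hoist_dynamic(3, 7)

# On another iteration both happen to be 5: the load must wait.
dynamic_conflict = can_hoist_dynamic(5, 5)
```

The compiler has to give one answer that is safe for every execution; the
H/W can give a different answer each time the code runs.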

>
>I suppose the advantage of speculative hardware is that you can use
>a skinny instruction word to get some of the same effect, 
>non-deterministically.
>
>So fancy hardware can get some parallelism without fancy software,
>and fancy software can get some parallelism without fancy hardware.
>Given one, I don't think you need the other.  So is it cheaper
>to build the compiler or the chip?  Don't forget to make them
>correct.

It is not clear to me that the compiler would be able to extract all
the possible parallelism (even perfect compile-time alias analysis is not
as good as "instantaneous" alias detection). Surely the H/W can contribute some
more parallelism by "knowing" all the current values!

Often, the actual time required is non-deterministic, e.g., time to fetch
a word, time for a divide. This looks like another area where the H/W may
augment the compiler.
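
A toy model of the latency point, again in Python and under loud
assumptions: one instruction issues per cycle, an instruction may issue
once the results it depends on are available, and "hardware scheduling"
simply means picking the oldest ready instruction instead of stalling on
the next one in program order. The function run() and the four-instruction
program are hypothetical, made up for this sketch.

```python
def run(program, latencies, in_order):
    """program[i] is a tuple of instruction indices that i depends on.
    latencies[i] is the number of cycles until i's result is available.
    Returns the cycle at which the last result becomes available."""
    done_at = {}                       # index -> cycle its result is ready
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        # In-order issue may only look at the oldest pending instruction.
        candidates = pending[:1] if in_order else pending
        for i in candidates:
            if all(d in done_at and done_at[d] <= cycle for d in program[i]):
                done_at[i] = cycle + latencies[i]
                pending.remove(i)
                break                  # at most one issue per cycle
        cycle += 1
    return max(done_at.values())


# 0: load A, 1: load B, 2: use A, 3: use B (loads scheduled first).
program = [(), (), (0,), (1,)]

a_misses = [4, 1, 1, 1]   # load A takes 4 cycles, everything else 1
b_misses = [1, 4, 1, 1]   # load B takes 4 cycles, everything else 1

fixed_a = run(program, a_misses, in_order=True)
fixed_b = run(program, b_misses, in_order=True)
dynamic_a = run(program, a_misses, in_order=False)
dynamic_b = run(program, b_misses, in_order=False)
```

In this model the fixed order cannot be tuned for both cases, because the
compiler does not know which load will be slow. Putting "use B" ahead of
"use A" helps when A misses and hurts when B misses. The dynamic issue
rule adapts on each run, which is the sense in which the H/W could augment
the compiler rather than replace it.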


Stanley Chow        BitNet:  schow@BNR.CA
BNR		    UUCP:    ..!uunet!bnrgate!bcarh185!schow
(613) 763-2831               ..!psuvax1!BNR.CA.bitnet!schow
Me? Represent other people? Don't make them laugh so hard.