From: schow@bcarh185.bnr.ca (Stanley T.H. Chow)
Newsgroups: comp.arch
Subject: Re: speculative execution
Message-ID: <3436@bnr-rsc.UUCP>
Date: 10 Oct 90 20:20:04 GMT
References: <3432@bnr-rsc.UUCP> <1990Oct10.170424.21489@rice.edu>
Reply-To: bcarh185!schow@bnr-rsc.UUCP (Stanley T.H. Chow)
Organization: BNR Ottawa, Canada

In article <1990Oct10.170424.21489@rice.edu> preston@titan.rice.edu (Preston Briggs) writes:
>In article <3432@bnr-rsc.UUCP> bcarh185!schow@bnr-rsc.UUCP (Stanley T.H. Chow) writes:
>>I assume you mean given two machines, otherwise identical, differing
>>only in that M1 does hardware speculative execution and M2 does not,
>>you can write a compiler that will make M2 run faster than M1.
>
>That's what I believe.
>
>>This sounds wrong. I do not doubt that you can speed up M2 by speculative
>>execution (with hoisting, etc.). But surely the same technology can be
>>applied to M1 with the same result. A priori, I would expect the benefit
>>to be the same for both machines.
>
>Well, I wasn't going to let M1 use my fabulous scheduling ideas.
>It had to be satisfied with hardware. Further, M2 ought to have a
>higher clock speed since its hardware is simpler.

Ah, but that is not very fair, is it? If code scheduling works for both
M1 & M2, why restrict it to M2 only?

It sounds like we are starting to get into the philosophical aspects of
the problem - like the RISC/CISC wars of old (maybe even now? :-).

There are certainly many possible trade-offs and design points. Some will
rely exclusively on H/W, others on S/W, and yet others on both. It is not
clear how the H/W and S/W efforts interact. Do they go after the same
"parallelism" or do they exploit different types of "parallelism"?
Which kinds of S/W speculative execution work well with which kinds of
H/W? Do some H/W designs make the S/W impossible?
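
To make the hoisting point concrete, here is a minimal sketch in Python.
It is only a toy: the function names and the crude operation count are
assumptions made for illustration, not a model of any real M1 or M2.
The first version resolves the branch before doing any work; the second
hoists both candidate computations above the branch and discards one.

```python
def run_plain(cond, a, b):
    """Resolve the branch first, then do only the work on the taken path."""
    if cond:
        result = a * b
    else:
        result = a + b
    ops_issued = 1  # exactly one arithmetic operation is performed
    return result, ops_issued


def run_hoisted(cond, a, b):
    """Compute both candidate results above the branch, then select one.

    This is legal only because neither operation has side effects or can
    trap; a hoisted load or divide would need extra care.
    """
    prod = a * b   # speculative: discarded when cond is false
    total = a + b  # speculative: discarded when cond is true
    ops_issued = 2  # both operations are performed regardless of cond
    result = prod if cond else total
    return result, ops_issued
```

Both versions return the same result for every input; the hoisted one
simply issues extra work that may be thrown away. Since the rearrangement
is done entirely in the code, nothing in this sketch ties it to a machine
without speculative hardware, which is the reason for expecting it to
apply to M1 as well as M2.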

>I'm advocating a wide instruction word so that I can, in a single
>instruction, specify enough work to keep all the resources busy.
>Ways of accomplishing this include trace scheduling,
>global compaction, and software pipelining.
>
>Speculative execution hardware notices that it has resources
>that aren't fully utilized and tries to find work for them to do.

The difference is that the compiler has the "global" view and can do
"higher" level shuffling and optimization. On the other hand, H/W has
the advantage that it actually "knows" the instantaneous values of
all the registers. As someone (Andy?) suggests, clever H/W could resolve
aliasing at run time.

>
>I suppose the advantage of speculative hardware is that you can use
>a skinny instruction word to get some of the same effect,
>non-deterministically.
>
>So fancy hardware can get some parallelism without fancy software,
>and fancy software can get some parallelism without fancy hardware.
>Given one, I don't think you need the other. So is it cheaper
>to build the compiler or the chip? Don't forget to make them
>correct.

It is not clear to me that the compiler would be able to extract all the
possible parallelism (even perfect aliasing analysis is not as good as
"instantaneous" alias detection). Surely the H/W can contribute some
more parallelism by "knowing" all the current values!

Often, the actual time required is non-deterministic, e.g., time to
fetch a word, time for a divide. This looks like another area where
the H/W may augment the compiler.

Stanley Chow        BitNet:  schow@BNR.CA
BNR                 UUCP:    ..!uunet!bnrgate!bcarh185!schow
(613) 763-2831               ..!psuvax1!BNR.CA.bitnet!schow
Me? Represent other people? Don't make them laugh so hard.
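
A postscript on the aliasing point above, as a rough Python sketch. The
function names and the list of (store address, load address) pairs are
hypothetical, chosen only to illustrate the argument. The assumption is
that a compile-time analysis, however perfect, must give one answer that
covers every iteration of a loop, while run-time detection can compare
the two addresses each time they occur. A compiler that generates
multiple versions of the loop could narrow the gap; that is not modeled.

```python
def static_reorderable(pairs):
    """Loads a compiler may move above the preceding store.

    One decision covers the whole loop: if any (store, load) pair can
    alias, the original order must be kept for all of them.
    """
    all_disjoint = all(store != load for store, load in pairs)
    return len(pairs) if all_disjoint else 0


def dynamic_reorderable(pairs):
    """Loads that run-time address comparison would allow to be moved.

    Each pair is checked on its own, using the actual addresses.
    """
    return sum(1 for store, load in pairs if store != load)


# Hypothetical addresses seen in one run: two of the five pairs collide.
observed = [(0, 1), (2, 2), (3, 5), (4, 4), (6, 7)]
print(static_reorderable(observed), dynamic_reorderable(observed))
```

With these made-up addresses the static rule reorders nothing, while the
per-instance check frees three of the five loads.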