Path: utzoo!utgpu!watserv1!watmath!att!att!emory!samsung!noose.ecn.purdue.edu!dynamo.ecn.purdue.edu!hankd From: hankd@dynamo.ecn.purdue.edu (Hank Dietz) Newsgroups: comp.arch Subject: Re: speculative execution Summary: Mostly a static problem, but... Message-ID: <1990Oct23.183709.29005@ecn.purdue.edu> Date: 23 Oct 90 18:37:09 GMT References: <162639@<1990Oct9> <3300194@m.cs.uiuc.edu> Sender: news@ecn.purdue.edu (USENET news) Organization: Purdue University Engineering Computer Network Lines: 46 Well, first off let me say that speculative execution consists of two components: getting the instructions started and making sure the program semantics are preserved. Only a static mechanism, e.g., a compiler, can look arbitrarily far ahead for instructions to start; hence, compiler technology beats hardware on this by a large margin. However, preserving program semantics can be a dynamic problem: for example, suppose that you get a floating-point exception in one of a bunch of speculatively-executed instructions... you really need a dynamic mechanism, e.g. hardware, to deal with this -- as in the Multiflow machines. Hence, the basic problem is static (compilers), but some secondary problems could use dynamic (hardware) support. In article <3300194@m.cs.uiuc.edu> gillies@m.cs.uiuc.edu writes: >Many enhancements to "increase parallelism" cannot be exploited by >compilers unless P = NP, or unless you are willing to wait {hours, >days, weeks} for your compiler to finish its job. So don't just >relegate EVERYTHING to the compiler! ... >I believe you are much better off spending the extra real estate to >implement identical functional units, or add vector hardware, since >the compiler technology for exploiting these features runs in >polynomial time. This sounds reasonable, but isn't really correct. The trick is to recognize that the comparison here is between spending lots of compile time to generate *OPTIMAL* solutions and spending no compile time to implement no solution (i.e., simply discarding the structure). The best compromise is generally to invest enough compile time to get most of the benefits... and that usually isn't excessive compile time. As for vector, etc., being polynomial time compiler problems, that simply isn't true for optimal solutions -- but nobody cares because you can get good solutions quickly enough. Remember that O() complexity measures don't tell you actual times -- i.e., high O() complexity doesn't necessarily imply long runtime for real cases. For example, who cares if compile time "blows up" when given input with 30-level nested loops? If it really bothers you in such a rare case, you just turn off the expensive optimization for that particular piece of code... no big deal. My rule is simply to figure out where each thing should be done (static vs. dynamic mechanisms) and then to do the best we can in implementing it that way. In this approach, I have yet to come across a compiler problem that defied reasonably quick, approximate, solution. -hankd@ecn.purdue.edu