Path: utzoo!utgpu!watserv1!watmath!att!att!emory!samsung!noose.ecn.purdue.edu!dynamo.ecn.purdue.edu!hankd
From: hankd@dynamo.ecn.purdue.edu (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: speculative execution
Summary: Mostly a static problem, but...
Message-ID: <1990Oct23.183709.29005@ecn.purdue.edu>
Date: 23 Oct 90 18:37:09 GMT
References: <162639@<1990Oct9> <3300194@m.cs.uiuc.edu>
Sender: news@ecn.purdue.edu (USENET news)
Organization: Purdue University Engineering Computer Network
Lines: 46

Well, first off let me say that speculative execution consists of two
components:  getting the instructions started and making sure the
program semantics are preserved.  Only a static mechanism, e.g., a
compiler, can look arbitrarily far ahead for instructions to start;
hence, compiler technology beats hardware on this by a large margin.
However, preserving program semantics can be a dynamic problem:  for
example, suppose that you get a floating-point exception in one of a
bunch of speculatively-executed instructions...  you really need a
dynamic mechanism, e.g. hardware, to deal with this -- as in the
Multiflow machines.  Hence, the basic problem is static (compilers),
but some secondary problems could use dynamic (hardware) support.

In article <3300194@m.cs.uiuc.edu> gillies@m.cs.uiuc.edu writes:
>Many enhancements to "increase parallelism" cannot be exploited by
>compilers unless P = NP, or unless you are willing to wait {hours,
>days, weeks} for your compiler to finish its job.  So don't just
>relegate EVERYTHING to the compiler!
...
>I believe you are much better off spending the extra real estate to
>implement identical functional units, or add vector hardware, since
>the compiler technology for exploiting these features runs in
>polynomial time.

This sounds reasonable, but isn't really correct.  The trick is to
recognize that the comparison here is between spending lots of compile
time to generate *OPTIMAL* solutions and spending no compile time to
implement no solution (i.e., simply discarding the structure).  The
best compromise is generally to invest enough compile time to get most
of the benefits...  and that usually isn't excessive compile time.  As
for vector, etc., being polynomial time compiler problems, that simply
isn't true for optimal solutions -- but nobody cares because you can
get good solutions quickly enough.

Remember that O() complexity measures don't tell you actual times --
i.e., high O() complexity doesn't necessarily imply long runtime for
real cases.  For example, who cares if compile time "blows up" when
given input with 30-level nested loops?  If it really bothers you in
such a rare case, you just turn off the expensive optimization for
that particular piece of code...  no big deal.

My rule is simply to figure out where each thing should be done
(static vs. dynamic mechanisms) and then to do the best we can in
implementing it that way.  In this approach, I have yet to come across
a compiler problem that defied reasonably quick, approximate, solution.

						-hankd@ecn.purdue.edu