Path: utzoo!attcan!uunet!lll-winken!sun-barr!olivea!mintaka!spdcc!esegue!compilers-sender
From: hankd@dynamo.ecn.purdue.edu (Hank Dietz)
Newsgroups: comp.compilers
Subject: Re: Compilers taking advantage of architectural enhancements
Summary: Wrong concept: compiler & architecture should be together
Keywords: design, optimize
Message-ID: <9010161400.AA26606@dynamo.ecn.purdue.edu>
Date: 16 Oct 90 14:00:36 GMT
References: <1990Oct9> <3300194@m.cs.uiuc.edu> <1990Oct12.024252.8361@esegue.segue.boston.ma.us>
Sender: compilers-sender@esegue.segue.boston.ma.us
Reply-To: hankd@dynamo.ecn.purdue.edu (Hank Dietz)
Organization: Purdue University Engineering Computer Network
Lines: 133
Approved: compilers@esegue.segue.boston.ma.us

In article <1990Oct12.024252.8361@esegue.segue.boston.ma.us> aglew@crhc.uiuc.edu (Andy Glew) writes:
>>[gillies@m.cs.uiuc.edu]
>>One of the problems in new CPU designs is that designers don't realize
>>which architecture enhancements are pointless, because we don't and
>>may never have the optimization technology to take advantage of them.
>
>Ah, a potentially interesting and useful topic.  Perhaps we can start
>a discussion that will lead to a list of possible hardware
>architectural enhancements that a compiler can/cannot take advantage
>of?  Maybe the real compiler guys (preston, David C.) will help us out?
...
>For example, many of the parallelizing FORTRAN loop optimizations can
>reasonably only be expected to be done by people from Rice or UIUC CSRD (note
>that I'm UIUC, but CRHC, not CSRD), so unless you are willing to pay for
>these guys (or their assorted front companies) you aren't likely to get them
>onto your machine.

While it is true that the group of researchers in automatic parallelization
is small, it certainly isn't limited to UIUC CSRD and Rice; there are also
substantial efforts at IBM, Intel, UC Irvine, CMU, etc.  For example, Purdue
faculty in this field include Jose Fortes and me -- both of us have been
publishing in this field for more than five years (well over 50
publications) and we have implemented working parallelizers for subsets of
the C language targeted to several different architectures.  The point is
that, although compiler experts are in demand, it simply isn't true that
there are only one or two places that know how to do things.

Further, at Purdue EE, I teach a graduate course on compiler code
generation, optimization, and parallelization.  In the course, *EVERY*
student implements an optimizing, parallelizing compiler for a small
language and targets it to a simple parallel abstract machine -- usually a
VLIW.  I'm not saying that one course makes them experts, but the students
from that course are virtually all compiler-literate to the point where at
least relatively mundane things like traditional dependence analysis and
vectorization are well within their grasp.  Students complete that course at
a rate of about 15/year.
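
As a concrete instance of the "traditional dependence analysis" mentioned
above, here is a minimal Python sketch of the textbook GCD test.  It is an
illustration only, not the course's actual material: the name gcd_test and
the (a, b, c, d) encoding of the subscripts A[a*i + b] and A[c*j + d] are
assumptions made for this example, and loop bounds are ignored, so a True
result only means that a dependence cannot be ruled out.

```python
from math import gcd


def gcd_test(a, b, c, d):
    """Can A[a*i + b] and A[c*j + d] ever name the same element?

    Hypothetical helper for illustration.  The equation
    a*i + b == c*j + d has integer solutions exactly when gcd(a, c)
    divides (d - b).  Loop bounds are ignored, so True means
    "maybe dependent" and False means "provably independent".
    """
    g = gcd(a, c)
    if g == 0:
        # Both subscripts are loop-invariant constants.
        return b == d
    return (d - b) % g == 0


# A[2*i] written, A[2*i + 1] read: even vs. odd elements never overlap.
independent = not gcd_test(2, 0, 2, 1)

# A[2*i] written, A[4*i + 2] read: the test cannot rule out a conflict.
maybe_dependent = gcd_test(2, 0, 4, 2)
```

When the test returns False for every pair of references in a loop body,
the iterations do not conflict through that array, which is the kind of fact
a vectorizer needs before it reorders them.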

>Cost of compiler development can be significant.  Sometimes a company might
>put a hardware feature in even though they know a compiler approach would be
>better, because they can't afford the compiler guys (or the compiler guys
>already have exclusive contracts with a competitor).

In my view, this is 99% false.  Companies tend to put the money into
hardware because it is more concrete and they also are used to putting money
into hardware.  For example, one of my former students works at Motorola as
a compiler person -- but he's one of a very few compared to *MANY*
architecture/hardware folk.  In fact, he also has an architecture background
and without it he probably wouldn't have been given the job.  Companies have
to learn that creating a compiler is about as difficult as creating an
architecture; the tendency is to give it less weight, resulting in
overworked compiler people and delays in completing the compilers.  A
secondary issue is that designing one without deeply considering the other
just plain doesn't work, and there are few people who are experts in BOTH
compiler and architecture to act as the interface between the two groups.

In contrast, consider a company like Burton Smith's Tera.  Burton knows what
he's doing -- he has tried very hard to make his company have a balance of
compiler, architecture/hardware, and OS people.  Did he have trouble getting
these people?  Perhaps a bit -- good OS folk are particularly hard to find
in these "well, let's just port unix" times -- but generally I'd say he had
less trouble than most companies would have because it is clear that he
values these people at least as much as he values architecture/hardware
types.

>Let me list a few things to start off the discussion.  I hope and expect to
>be shot down on a few of them.  It's easy to list a few of the hardware
>enhancements that we already know compilers can take advantage of.

Wrong viewpoint or, as a certain public figure used to say, "well, there you
go again."  You're trying to give architecture/hardware people a list of
"you can feel safe doing this without consulting a compiler person" things
-- the trick is to involve compiler people throughout rather than letting
the machine be built and then calling in the compiler folk (and giving them
H*ll because the compiler hasn't been able to achieve the machine's peak
performance and wasn't delivered on time).

For some years now, I've had a research group (about 2-3 faculty and
10-20 students) called CARP:  Compiler-oriented Architecture Research
at Purdue.  A one paragraph version of our manifesto:

"Research in compiler optimization/parallelization and hardware architecture
is, and should be, tightly interwoven.  CARP, the Compiler-oriented
Architecture Research group at Purdue, centers on the innovative use of the
interaction of compiler and architecture to increase system performance.  In
general, this is accomplished by blending STATIC (compile-time,
assemble-time, or link-time) and DYNAMIC (runtime hardware, firmware, or
operating system) analysis so that each computational atom is processed in
the most efficient and reliable way.  Statically, it is possible to
understand/transform the entire program, yet only probabilistic knowledge is
available (e.g., one can know branching probabilities, but not which way the
branch goes this time).  Dynamically, understanding/transformability is
limited to a few instructions around the current program counter, but
perfect knowledge within that range is common.  Very few problems can be
solved equally well using either kind of information -- the trick is simply
to solve each problem in the right place."
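
The static/dynamic split in the manifesto can be illustrated with branch
prediction.  The Python sketch below is an assumption-laden toy, not CARP
code: static_accuracy models a compile-time guess that knows only the
overall taken probability, and dynamic_accuracy models a runtime scheme
that knows only what the branch did last time.  Both function names and
both branch traces are made up for the example.

```python
def static_accuracy(outcomes):
    """Best fixed compile-time guess: always predict the majority direction.

    Assumes a non-empty list of booleans (True = branch taken).
    """
    taken = sum(outcomes)
    return max(taken, len(outcomes) - taken) / len(outcomes)


def dynamic_accuracy(outcomes, initial=True):
    """Runtime guess: predict whatever the branch did last time."""
    prediction = initial
    correct = 0
    for actual in outcomes:
        if prediction == actual:
            correct += 1
        prediction = actual  # remember only the most recent outcome
    return correct / len(outcomes)


# A branch whose behavior changes halfway through the run.
phased = [True] * 100 + [False] * 100

# A loop-closing branch: taken three times, then falls through, repeated.
loop_branch = [True, True, True, False] * 50

print(static_accuracy(phased), dynamic_accuracy(phased))            # 0.5 0.995
print(static_accuracy(loop_branch), dynamic_accuracy(loop_branch))  # 0.75 0.505
```

Neither source of information wins on both traces: the phased branch favors
the runtime scheme, while the loop branch favors the fixed compile-time
guess.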


>Branch Delay Slots - small number
>Branch Delay slots - large number
>Register file - moderate sized (up to 32 registers)
>Register file - large (around 128 registers, or more)
>Separate floating point register file
>Heterogenous register file
>Instruction cache
>Micro-scheduling parallelism (like CONVEX's ASAP)
>Vectorizing
>Multiple functional units - heterogenous - VLIW or superscalar
>Multiple functional units - homogenous - VLIW or superscalar

All old ideas with multiple viable approaches in the literature.  This
is not to say they are done perfectly, but that's not the issue in
making a product....  Unfortunately, a few are not easy to automate in
"generic" code generators (e.g., heterogeneous register file).

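For the "multiple functional units - homogeneous - VLIW" item, one of the
simplest approaches in the literature is greedy list scheduling.  The Python
sketch below is a toy under stated assumptions: every operation takes one
cycle, all functional units are identical, and the names schedule_vliw, deps,
and width are invented for this example rather than taken from any particular
compiler.

```python
def schedule_vliw(deps, width):
    """Greedily pack operations into instruction words of `width` slots.

    deps maps each operation name to the set of operations it depends on.
    Assumes unit latency and homogeneous functional units (illustrative
    simplifications).  Returns a list of instruction words, one per cycle.
    """
    remaining = {op: set(d) for op, d in deps.items()}
    done = set()
    words = []
    while remaining:
        # An operation is ready once everything it depends on has issued.
        ready = sorted(op for op, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependence cycle or unknown operation")
        word = ready[:width]
        words.append(word)
        done.update(word)
        for op in word:
            del remaining[op]
    return words


deps = {
    "a": set(),
    "b": set(),
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"c", "d"},
}
print(schedule_vliw(deps, 2))  # [['a', 'b'], ['c', 'd'], ['e']]
```

With width 1 the same five operations need five cycles; the two-wide
schedule finishes in three, which is the length of the longest dependence
chain (a, c, e).
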
>Multiple, hierarchical, registers sets
>Data cache - software managed consistency
>Parallelizing - fine grain, small numbers of processors
>Parallelizing, fine grain, large numbers of processors.
>Special hardware instructions - scalar

These are problems with no readily available "cookbook" solutions.
That doesn't necessarily mean they'd be hard for a compiler to deal
with, just that it will take a bit of head scratching....

Of course, I still contend that the above list is headed the wrong way -- we
should be looking for new ideas synthesized by viewing compiler and
architecture/hardware together.  For example, the Barrier MIMD work (see
papers in ICPP 90) could only have come from such a holistic view.

						-hankd@ecn.purdue.edu
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.