Path: utzoo!attcan!uunet!lll-winken!sun-barr!olivea!mintaka!spdcc!esegue!compilers-sender
From: hankd@dynamo.ecn.purdue.edu (Hank Dietz)
Newsgroups: comp.compilers
Subject: Re: Compilers taking advantage of architectural enhancements
Summary: Wrong concept: compiler & architecture should be together
Keywords: design, optimize
Message-ID: <9010161400.AA26606@dynamo.ecn.purdue.edu>
Date: 16 Oct 90 14:00:36 GMT
References: <1990Oct9> <3300194@m.cs.uiuc.edu> <1990Oct12.024252.8361@esegue.segue.boston.ma.us>
Sender: compilers-sender@esegue.segue.boston.ma.us
Reply-To: hankd@dynamo.ecn.purdue.edu (Hank Dietz)
Organization: Purdue University Engineering Computer Network
Lines: 133
Approved: compilers@esegue.segue.boston.ma.us

In article <1990Oct12.024252.8361@esegue.segue.boston.ma.us> aglew@crhc.uiuc.edu (Andy Glew) writes:
>>[gillies@m.cs.uiuc.edu]
>>One of the problems in new CPU designs is that designers don't realize
>>which architecture enhancements are pointless, because we don't and
>>may never have the optimization technology to take advantage of them.
>
>Ah, a potentially interesting and useful topic. Perhaps we can start
>a discussion that will lead to a list of possible hardware
>architectural enhancements that a compiler can/cannot take advantage
>of? Maybe the real compiler guys (preston, David C.) will help us out?
...
>For example, many of the parallelizing FORTRAN loop optimizations can
>reasonably only be expected to be done by people from Rice or UIUC CSRD (note
>that I'm UIUC, but CRHC, not CSRD), so unless you are willing to pay for
>these guys (or their assorted front companies) you aren't likely to get them
>onto your machine.

While it is true that the group of researchers in automatic parallelization is small, it certainly isn't limited to UIUC CSRD and Rice; there are also substantial efforts at IBM, Intel, UC Irvine, CMU, etc.
For example, Purdue faculty in this field include Jose Fortes and me -- both of us have been publishing in this field for more than five years (well over 50 publications) and we have implemented working parallelizers for subsets of the C language targeted to several different architectures. The point is that, although compiler experts are in demand, it simply isn't true that there are only one or two places that know how to do things.

Further, at Purdue EE, I teach a graduate course on compiler code generation, optimization, and parallelization. In the course, *EVERY* student implements an optimizing, parallelizing compiler for a small language and targets it to a simple parallel abstract machine -- usually a VLIW. I'm not saying that one course makes them experts, but the students from that course are virtually all compiler-literate to the point where at least relatively mundane things like traditional dependence analysis and vectorization are well within their grasp. Students complete that course at a rate of about 15/year.

>Cost of compiler development can be significant. Sometimes a company might
>put a hardware feature in even though they know a compiler approach would be
>better, because they can't afford the compiler guys (or the compiler guys
>already have exclusive contracts with a competitor).

In my view, this is 99% false. Companies tend to put the money into hardware because it is more concrete and because they are used to putting money into hardware. For example, one of my former students works at Motorola as a compiler person -- but he's one of a very few compared to *MANY* architecture/hardware folk. In fact, he also has an architecture background and without it he probably wouldn't have been given the job. Companies have to learn that creating a compiler is comparably difficult to creating an architecture; the tendency is to give it less weight, resulting in overworked compiler people and delays in completing the compilers.
A secondary issue is that designing one without deeply considering the other just plain doesn't work, and there are few people who are experts in BOTH compiler and architecture to act as the interface between the two groups.

In contrast, consider a company like Burton Smith's Tera. Burton knows what he's doing -- he has tried very hard to make his company have a balance of compiler, architecture/hardware, and OS people. Did he have trouble getting these people? Perhaps a bit -- good OS folk are particularly hard to find in these "well, let's just port unix" times -- but generally I'd say he had less trouble than most companies would have because it is clear that he values these people at least as much as he values architecture/hardware types.

>Let me list a few things to start off the discussion. I hope and expect to
>be shot down on a few of them. It's easy to list a few of the hardware
>enhancements that we already know compilers can take advantage of.

Wrong viewpoint or, as a certain public figure used to say, "well, there you go again." You're trying to give architecture/hardware people a list of "you can feel safe doing this without consulting a compiler person" things -- the trick is to involve compiler people throughout rather than letting the machine be built and then calling in the compiler folk (and giving them H*ll because the compiler hasn't been able to achieve the machine's peak performance and wasn't delivered on time).

For some years now, I've had a research group (about 2-3 faculty and 10-20 students) called CARP: Compiler-oriented Architecture Research at Purdue. A one paragraph version of our manifesto:

"Research in compiler optimization/parallelization and hardware architecture is, and should be, tightly interwoven. CARP, the Compiler-oriented Architecture Research group at Purdue, centers on the innovative use of the interaction of compiler and architecture to increase system performance.
In general, this is accomplished by blending STATIC (compile-time, assemble-time, or link-time) and DYNAMIC (runtime hardware, firmware, or operating system) analysis so that each computational atom is processed in the most efficient and reliable way. Statically, it is possible to understand/transform the entire program, yet only probabilistic knowledge is available (e.g., one can know branching probabilities, but not which way the branch goes this time). Dynamically, understanding/transformability is limited to a few instructions around the current program counter, but perfect knowledge within that range is common. Very few problems can be solved equally well using either kind of information -- the trick is simply to solve each problem in the right place."

>Branch Delay Slots - small number
>Branch Delay slots - large number
>Register file - moderate sized (up to 32 registers)
>Register file - large (around 128 registers, or more)
>Separate floating point register file
>Heterogenous register file
>Instruction cache
>Micro-scheduling parallelism (like CONVEX's ASAP)
>Vectorizing
>Multiple functional units - heterogenous - VLIW or superscalar
>Multiple functional units - homogenous - VLIW or superscalar

All old ideas with multiple viable approaches in the literature. This is not to say they are done perfectly, but that's not the issue in making a product.... Unfortunately, a few are not easy to automate in "generic" code generators (e.g., heterogeneous register file).

>Multiple, hierarchical, registers sets
>Data cache - software managed consistency
>Parallelizing - fine grain, small numbers of processors
>Parallelizing, fine grain, large numbers of processors.
>Special hardware instructions - scalar

These are problems with no readily available "cookbook" solutions. That doesn't necessarily mean they'd be hard for a compiler to deal with, just that it will take a bit of head scratching....
Of course, I still contend that the above list is headed the wrong way -- we should be looking for new ideas being synthesized by viewing both compiler and architecture/hardware. For example, the Barrier MIMD work (see papers in ICPP 90) could only have come from such a holistic view.

-hankd@ecn.purdue.edu
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.