Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ames!amdahl!chuck
From: chuck@amdahl.uts.amdahl.com (Charles Simmons)
Newsgroups: comp.arch
Subject: Re: missing Dhrystone 2.1 (1 of 3 &
Message-ID: <9a0K/cbluk1010IHSPc@amdahl.uts.amdahl.com>
Date: 20 Jul 88 06:00:31 GMT
References: <4232@cbmvax.UUCP> <76700035@p.cs.uiuc.edu> <1988Jul18.231331.19575@utzoo.uucp> <22406@amdcad.AMD.COM> <9amsvb52K11010cyawo@amdahl.uts.amdahl.com>
Reply-To: chuck@amdahl.uts.amdahl.com (Charles Simmons)
Organization: Amdahl Corporation, Sunnyvale CA
Lines: 94

In article <9amsvb52K11010cyawo@amdahl.uts.amdahl.com> littauer@amdahl.uts.amdahl.com (Tom Littauer) writes:
>In article <22406@amdcad.AMD.COM> tim@delirun.amd.com (Tim Olson) writes:
>>In article <1988Jul18.231331.19575@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>>| In article <76700035@p.cs.uiuc.edu> gillies@p.cs.uiuc.edu writes:
>>| >I certainly find it hard to believe that the top of the line Amdahl
>>| >machine achieves 90,000+ Dhrystones...
>>| > It nearly doubles the performance of the best Cray compiler
>>| >reported (admittedly, compiling C for a Cray is probably hard, but
>>| >Crays are very decent scaler machines! sheesh)...
>>|
>>| Ah, but Crays are *not* good character machines, and Dhrystone is known
>>| to be excessively string-intensive.
>>
>>And the Amdahl machine referenced is a dual-processor model -- I assume
>>that this was 45K Dhrystones per processor...
>
>No, 91K per each of 2 in a 5990-700 and 4 in a -1400... To be fair, that's
>not the released compiler (it was GNU cc). Does GNU do Dhrystone
>fakery? Anyway, the released compiler was 74K per each and it isn't
>optimized for Dhrystone. Our processor guys bitch that Dhrystone doesn't
>show our processors to be as fast as they really are, but that's an
>entirely different discussion.

Since there has been some discussion recently on how an Amdahl machine with the GNU cc compiler can achieve 91K Dhrystones per processor, I thought I'd describe just what the GNU cc compiler does to achieve this performance. I'll then let you all decide whether GNU does "Dhrystone fakery".

First, I like Henry Spencer's comment very much about Crays not being good character processing machines. Both the Cray and the Amdahl machines use very up-to-date technology, and there is no good reason to believe that a multi-million dollar Amdahl machine can't execute the Dhrystone benchmark faster than a multi-million dollar Cray. Cray machines are optimized for programs that require lots of memory (say 1 Gbyte or so), floating point computations, and vector computations. Amdahl machines are optimized for smaller amounts of memory (say 256 Mbytes), and for scalar processing that is not floating-point intensive. (These are my personal beliefs until I'm corrected by someone who knows more.)

The GNU compiler achieves extremely good Dhrystone results compared to the current pcc-based compiler primarily through two mechanisms. First, the GNU compiler performs reasonably good register allocation. Since Dhrystone spends much of its time in relatively short routines that use relatively few registers, the GNU compiler can frequently keep all of the values needed by a routine in registers that do not need to be saved on the stack. Second, I perform a special-case optimization: when a subroutine does not call any other subroutine, much of the procedure setup code is optimized away (e.g. I don't allocate a stack frame if I don't need one). Thus, for Dhrystone, GCC performs good register allocation, and it generates code that keeps the overhead of calling a subroutine to a minimum.

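To make the leaf-routine case concrete, here is a small C sketch of the two kinds of routine involved. The names same_char and count_matches are made up for illustration; they are not taken from Dhrystone or from the GCC sources, and whether a given compiler really omits the frame depends on the target and its calling convention.

```c
#include <stddef.h>

/* Hypothetical leaf routine: it calls nothing else, and its two
 * arguments and its result fit in a few registers. A compiler that
 * special-cases leaf routines may be able to skip the stack frame. */
int same_char(char a, char b)
{
    return a == b;
}

/* Hypothetical non-leaf routine: it calls same_char(), so the values
 * that stay live across the call (s, t, n) have to be preserved
 * somewhere, and the usual procedure setup code is still needed. */
size_t count_matches(const char *s, const char *t)
{
    size_t n = 0;

    /* Compare position by position until either string ends. */
    while (*s != '\0' && *t != '\0') {
        if (same_char(*s, *t))
            n++;
        s++;
        t++;
    }
    return n;
}
```

Compiling something like this to assembler (for example with a compiler's -S option) and comparing the entry and exit code of the two routines is one way to see how much setup code a particular compiler emits for each.
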
We do not use in-line subroutines (I tried, but GCC generated incorrect code), nor do we do anything remotely resembling link-time register allocation. (Rest assured that if I do figure out ways to do things like this, I will report this type of optimization with any results that I publish.)

I would only draw the following conclusions from these GCC results:

1) An Amdahl mainframe is lots faster than a Vax 750.
2) For some applications, an Amdahl mainframe may outperform a Cray.
3) The GNU C compiler does a fairly good job of register allocation (especially in small routines that use single word registers).
4) The GNU C compiler is easily modified to make special case optimizations (such as allocating stack frames only when they are needed).

In summary, I would not be offended if anyone decided that the 91K figure that I've published should be considered a research result, and not something that could be realistically attained using the production compiler that we supply. The 91K figure should be viewed as a theoretical upper bound (a goal to shoot for), and an indication of the types of performance levels that can be achieved in hand-coded assembler. Still, 74K Dhrystones per processor head isn't too shabby.

(To follow up on the last comment of Tom's... To really benchmark an Amdahl mainframe against other types of machines, we would prefer a benchmark that used over 100 Mbytes of memory and did lots of I/O. It would be fun to publish numbers that show a 68020-based machine thrashing on the benchmark for many hours (days?) while the mainframe completes the job in a couple of minutes.)

-- Chuck