Xref: utzoo comp.sys.m88k:447 comp.arch:18921
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.sys.m88k,comp.arch
Subject: Re: Fastest 88k
Message-ID: <42612@mips.mips.COM>
Date: 1 Nov 90 19:45:28 GMT
References: <1172@iceman.jcu.oz> <42586@mips.mips.COM> <42593@mips.mips.COM>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Followup-To: comp.sys.m88k
Organization: MIPS Computer Systems, Inc.
Lines: 40

This seemed worth reposting into comp.arch: the topic (is making your
compiler better for SPEC benchmarks SPEC-specific or not) has come up
now and then. Here's an additional opinion:

In article tom@ssd.csd.harris.com (Tom Horsley) writes:
>>>>>> Regarding Re: Fastest 88k; mash@mips.COM (John Mashey) adds:
....
>mash> c) Do you feel that tuneups done to improve SPEC numbers carry over
>mash> into improvements on other programs ... or not?
....
>c) I would say that all the improvements we made are generally useful. We
> look at a lot more benchmarks than just SPEC (some of them are rather
> large real customer applications, or benchmarks derived from those
> applications). We like to pick which optimizations to work on based on
> cost/benefit analysis - if we don't see the need for something in a lot
> of places, we generally don't work on it.
>
> Some of the SPEC benchmarks reacted fairly dramatically to some of our
> optimizations, but the optimizations were not designed specifically to
> get that reaction from SPEC. For example: the biggest single improvement
> came from a combination of loop-unrolling, teaching the instruction
> scheduler how to safely shuffle some loads past some stores (to keep the
> data unit pipeline going), and teaching the register allocator to pick
> registers in such a way as to allow the instruction scheduler maximum
> flexibility (to keep the floating point pipeline going). All of this is
> great stuff and is useful in almost any program.
>
> The SPEC matrix300 benchmark, however, spends 99.9% of its time in a
> single matrix multiply-and-add loop. When the above set of optimizations
> hit the matrix300 benchmark, the performance skyrocketed. This does not
> mean our optimizations are not generally useful, but it does mean that
> real programs which do actual work may not see a similar performance
> boost (but they certainly should get better).

Can anyone else add any more comments, or examples?
--
-john mashey DISCLAIMER:
UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086