Xref: utzoo comp.sys.m88k:447 comp.arch:18921
Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.sys.m88k,comp.arch
Subject: Re: Fastest 88k
Message-ID: <42612@mips.mips.COM>
Date: 1 Nov 90 19:45:28 GMT
References: <1172@iceman.jcu.oz> <42586@mips.mips.COM> <TOM.90Oct31160947@hcx2.ssd.csd.harris.com> <42593@mips.mips.COM> <TOM.90Nov1072249@hcx2.ssd.csd.harris.com>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Followup-To: comp.sys.m88k
Organization: MIPS Computer Systems, Inc.
Lines: 40

This seemed worth reposting into comp.arch: the topic (is making your
compiler better for SPEC benchmarks SPEC-specific or not) has come up
now and then.  Here's an additional opinion:

In article <TOM.90Nov1072249@hcx2.ssd.csd.harris.com> tom@ssd.csd.harris.com (Tom Horsley) writes:
>>>>>> Regarding Re: Fastest 88k; mash@mips.COM (John Mashey) adds:
....
>mash> 	c) Do you feel that tuneups done to improve SPEC numbers carry over
>mash> 	into improvements on other programs ... or not?
....
>c) I would say that all the improvements we made are generally useful.  We
>   look at a lot more benchmarks than just SPEC (some of them are rather
>   large real customer applications, or benchmarks derived from those
>   applications). We like to pick which optimizations to work on based on
>   cost/benefit analysis - if we don't see the need for something in a lot
>   of places, we generally don't work on it.
>
>   Some of the SPEC benchmarks reacted fairly dramatically to some of our
>   optimizations, but the optimizations were not designed specifically to
>   get that reaction from SPEC. For example: the biggest single improvement
>   came from a combination of loop-unrolling, teaching the instruction
>   scheduler how to safely shuffle some loads past some stores (to keep the
>   data unit pipeline going), and teaching the register allocator to pick
>   registers in such a way as to allow the instruction scheduler maximum
>   flexibility (to keep the floating point pipeline going). All of this is
>   great stuff and is useful in almost any program.
>
>   The SPEC matrix300 benchmark, however, spends 99.9% of its time in a
>   single matrix multiply-and-add loop. When the above set of optimizations
>   hit the matrix300 benchmark, the performance skyrocketed. This does not
>   mean our optimizations are not generally useful, but it does mean that
>   real programs which do actual work may not see a similar performance
>   boost (but they certainly should get better).
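
For readers who want something concrete, here is a rough hand-written
sketch in C of what that combination amounts to on a multiply-and-add
loop. It is an illustration only, not output from the Harris compiler:
the function names are made up, the unroll factor of 4 is arbitrary,
and `restrict` stands in for whatever analysis lets a scheduler prove
that the loads cannot alias the stores.

```c
#include <stddef.h>

/* Baseline multiply-and-add loop: c[i] += s * a[i]. */
void madd_simple(double *c, const double *a, double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] += s * a[i];
}

/*
 * Same computation, unrolled by 4 by hand (hypothetical name and factor).
 * All loads for a group of four elements are issued before any store,
 * which is the "loads past stores" reordering done manually.  That is
 * only safe because `restrict` promises c and a do not overlap.
 * Using separate temporaries (a0..a3, c0..c3) mirrors a register
 * allocator that avoids reusing one register, so the four
 * multiply-adds are independent and can overlap in the FP pipeline.
 */
void madd_unrolled4(double *restrict c, const double *restrict a,
                    double s, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        double a0 = a[i],     a1 = a[i + 1];
        double a2 = a[i + 2], a3 = a[i + 3];
        double c0 = c[i],     c1 = c[i + 1];
        double c2 = c[i + 2], c3 = c[i + 3];

        c0 += s * a0;
        c1 += s * a1;
        c2 += s * a2;
        c3 += s * a3;

        c[i]     = c0;
        c[i + 1] = c1;
        c[i + 2] = c2;
        c[i + 3] = c3;
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        c[i] += s * a[i];
}

/* Returns 1 if both versions produce identical results on a small case. */
int madd_versions_agree(void)
{
    enum { N = 10 };
    double x[N], y[N], a[N];

    for (size_t i = 0; i < N; i++) {
        a[i] = (double)i;
        x[i] = 1.0;
        y[i] = 1.0;
    }
    madd_simple(x, a, 2.0, N);
    madd_unrolled4(y, a, 2.0, N);

    for (size_t i = 0; i < N; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

A program that spends nearly all its time in a loop like madd_simple
gets the whole benefit of the transformation; one that spends only a
small fraction there sees a correspondingly small gain, which is the
point being made about matrix300 above.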

Can anyone else add comments or examples?
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086