Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!wuarchive!zaphod.mps.ohio-state.edu!magnus.acs.ohio-state.edu!csn!ncar!gatech!usenet.ins.cwru.edu!agate!stanford.edu!rutgers!modus!gear!cadlab!martelli
From: martelli@cadlab.sublink.ORG (Alex Martelli)
Newsgroups: comp.arch
Subject: Re: RISC vs. CISC -- SPECmarks
Message-ID: <820@cadlab.sublink.ORG>
Date: 5 May 91 10:08:56 GMT
References: <3423@charon.cwi.nl> <11602@mentor.cc.purdue.edu> 	<1991Apr30.163153.18568@midway.uchicago.edu> 	<1991May2.162909.9165@news.arc.nasa.gov> <MCCALPIN.91May3091530@pereland.cms.udel.edu>
Organization: CAD.LAB, Bologna, Italia
Lines: 18

mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
	...
:theory, but all too often I find that the various machine's
:optimizers require *slightly* different code --- there is no one piece
:of code (even a nice block-mode version) that optimizes well on a
:broad range of scalar platforms....  Matrix multiply is a good example

Yes, I agree, and that speaks well for Dan Bernstein's idea of a
language construct that tells the compiler: here are 2/3/N different
implementations of the SAME semantics, now please choose the one
that's fastest on THIS machine!  We would still have to do the
hand-tweaking initially, but once our code performs well on, say, half
a dozen platforms, we stand a far better chance of being able to just
compile and run fast on any new platform.  This holds not only for
numerical codes but for much bread-and-butter stuff as well.  A trivial
example: an explicit 'strcpy(a,b);' versus 'while(*a++=*b++);', where
some machines and compilers might be able to inline the call and others
might not.
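
For the curious, here is a rough sketch of how one might approximate the
idea in plain C today.  It is only a stand-in: the construct described
above would let the compiler choose at compile time, whereas this picks
at run time by timing each variant.  All the names (copy_fn, copy_libc,
copy_loop, time_variant, pick_fastest) are made up for illustration, and
clock() is a coarse timer, so treat the result as a hint, not a
benchmark.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical names throughout; none of this comes from a real library. */
typedef char *(*copy_fn)(char *dst, const char *src);

/* Variant 1: defer to the library routine. */
static char *copy_libc(char *dst, const char *src)
{
    return strcpy(dst, src);
}

/* Variant 2: the explicit loop, same semantics as strcpy. */
static char *copy_loop(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Time `reps` copies of a fixed sample string with one variant. */
static double time_variant(copy_fn f, int reps)
{
    static const char sample[] = "the quick brown fox jumps over the lazy dog";
    char buf[sizeof sample];
    volatile char sink = 0;   /* keeps the copies from being optimized away */
    clock_t start = clock();

    for (int i = 0; i < reps; i++) {
        f(buf, sample);
        sink ^= buf[(size_t)i % (sizeof sample - 1)];
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Return whichever of the n variants (n >= 1) measured fastest here. */
static copy_fn pick_fastest(const copy_fn *variants, size_t n, int reps)
{
    copy_fn best = variants[0];
    double best_time = time_variant(variants[0], reps);

    for (size_t i = 1; i < n; i++) {
        double t = time_variant(variants[i], reps);
        if (t < best_time) {
            best_time = t;
            best = variants[i];
        }
    }
    return best;
}
```

One would call pick_fastest() once at startup with an array such as
{ copy_libc, copy_loop } and use the returned pointer from then on.  The
indirect call itself defeats inlining, which is exactly why a real
compile-time construct would be preferable to this sort of workaround.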