Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!bbn!gatech!gitpyr!loligo!mccalpin
From: mccalpin@loligo.uucp (John McCalpin)
Newsgroups: comp.lang.fortran
Subject: Re: Are vendors implementing BLAS?
Message-ID: <7457@pyr.gatech.EDU>
Date: 3 Mar 89 14:48:51 GMT
References: <449@orange19.qtp.ufl.edu> <14500@admin.mips.COM>
Sender: news@pyr.gatech.EDU
Reply-To: mccalpin@loligo.cc.fsu.edu (John McCalpin)
Organization: Supercomputer Computations Research Institute
Lines: 28

In article <14500@admin.mips.COM> rogerk@mips.COM (Roger B.A. Klorese) writes:
>In article <449@orange19.qtp.ufl.edu> bernhold@qtp.ufl.edu (David E. Bernholdt) writes:
>>Is anyone out there aware of other vendors implementing the BLAS for
>>their machines? I would expect that most presently available could get
>>improved performance from special implementations of the BLAS (as
>>opposed to just compiling the FORTRAN version).
>
>If special implementations could provide dramatically better performance
>than a compiler, the compiler needs work. Our all-FORTRAN number is within
>about 10% of our coded rate.

Roger B.A. Klorese                      MIPS Computer Systems, Inc.

On some vector machines, the best improvements can be obtained by
inlining BLAS calls. This removes the subroutine call overhead and
the check for non-unit stride (which is never used in LINPACK).

On the Cyber 205, I got an immediate factor of 2 speedup on the order
100 LINPACK case by changing just the BLAS calls in SGEFA to in-line
vector instructions. The compiler on the ETA-10 can do the in-lining
now, but it is not so clever about removing the extraneous stride test
(which can be evaluated at compile time).

Many scalar machines show speedups of >20% with coded BLAS on the
LINPACK test. I consider this level of improvement sufficient for me
to want the coded BLAS -- though not sufficient for me to do it myself :-)

----------------------   John D. McCalpin   ------------------------
Dept of Oceanography & Supercomputer Computations Research Institute
mccalpin@masig1.ocean.fsu.edu           mccalpin@nu.cs.fsu.edu
--------------------------------------------------------------------
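
For readers who want to see the stride test discussed in the post, here is
a minimal sketch. It is written in C rather than the FORTRAN of the actual
routines, and the names saxpy_general, update_with_calls and update_inlined
are hypothetical, chosen only for illustration. The first routine is modeled
on the reference SAXPY (y := a*x + y with arbitrary increments); the other
two show the column update of an SGEFA-style elimination step, once through
the call and once with the unit-stride loop written in place. It is not the
Cyber 205 code described in the post, and it omits pivoting.

```c
/* General-stride SAXPY, modeled on the reference BLAS routine:
   y := a*x + y, stepping through x by incx and through y by incy. */
void saxpy_general(int n, float a, const float *x, int incx,
                   float *y, int incy)
{
    if (n <= 0 || a == 0.0f)
        return;

    /* The run-time stride test: unit stride gets the simple loop. */
    if (incx == 1 && incy == 1) {
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
        return;
    }

    /* Negative increments start from the far end of the vector. */
    int ix = incx < 0 ? (1 - n) * incx : 0;
    int iy = incy < 0 ? (1 - n) * incy : 0;
    for (int i = 0; i < n; i++, ix += incx, iy += incy)
        y[iy] += a * x[ix];
}

/* Update of the trailing columns for pivot column k (0-based).
   a is n-by-n, column-major, leading dimension lda; the multipliers
   are assumed to be stored already in rows k+1..n-1 of column k. */
void update_with_calls(float *a, int lda, int n, int k)
{
    for (int j = k + 1; j < n; j++) {
        float t = a[k + j * lda];
        /* Both increments are the constant 1, yet the callee still
           pays for the call and for the stride test every time. */
        saxpy_general(n - k - 1, t, &a[k + 1 + k * lda], 1,
                      &a[k + 1 + j * lda], 1);
    }
}

/* Same update with the unit-stride loop written in place:
   no call, no stride test. */
void update_inlined(float *a, int lda, int n, int k)
{
    for (int j = k + 1; j < n; j++) {
        float t = a[k + j * lda];
        const float *x = &a[k + 1 + k * lda];
        float *y = &a[k + 1 + j * lda];
        for (int i = 0; i < n - k - 1; i++)
            y[i] += t * x[i];
    }
}
```

Both update routines produce the same matrix; the difference is only in the
per-call overhead, which matters most when the vectors are short, as in the
order 100 case.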