Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!wuarchive!zaphod.mps.ohio-state.edu!sdd.hp.com!elroy.jpl.nasa.gov!ames!sgi!shinobu!odin!pipo.corp.sgi.com!jpp
From: jpp@pipo.corp.sgi.com (Jean-Pierre Panziera)
Newsgroups: comp.sys.sgi
Subject: Re: Basic Linear Algebra Subroutines (BLAS)
Message-ID: <10365@odin.corp.sgi.com>
Date: 14 Jul 90 00:05:56 GMT
References: <90Jul13.100737edt.8304@ephemeral.ai.toronto.edu>
Sender: news@odin.corp.sgi.com
Reply-To: jpp@corp.sgi.com
Organization: Silicon Graphics, Applications Product Division
Lines: 33

In article <90Jul13.100737edt.8304@ephemeral.ai.toronto.edu>, tff@na.toronto.edu (Tom Fairgrieve) writes:
> From: tff@na.toronto.edu (Tom Fairgrieve)
> Subject: Basic Linear Algebra Subroutines (BLAS)
> Date: 13 Jul 90 14:08:02 GMT
> Organization: Department of Computer Science, University of Toronto
>
> Does SGI have an optimized version of the BLAS (Basic Linear Algebra
> Subroutines) available for the 4d/240? If so, how does the performance
> of this version compare to a version produced by the f77 compiler with
> -O3 optimization level set? I'm interested in all 3 levels of the BLAS.
>
> Thanks for any information,
> Tom Fairgrieve
> tff@na.utoronto.ca

As far as I know, SGI does not have an official version of BLAS3, though
I may be wrong. However, I have optimized and parallelized a Fortran
version of the BLAS3 matrix multiplication routines. I get pretty good
results on a 220-GTX:

	dgemm	 5-11 Mflops
	zgemm	10-14 Mflops
	sgemm	10-16 Mflops
	cgemm	12-17 Mflops

The lowest performance is for A * trans(B), the highest for trans(A) * B.

I am sure it can be improved, and I do not warrant that it is bug free.
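
For anyone comparing against these Mflops figures, here is a minimal sketch of how such a rate is commonly derived from a timed gemm call. It assumes the conventional operation counts (2*m*n*k for the real routines, 8*m*n*k for the complex ones). The post does not say how its numbers were counted, so this is an assumption, and the function names and the sample timing are hypothetical.

```python
def gemm_flops(m, n, k, complex_arith=False):
    """Conventional flop count for C = A*B, with A m-by-k and B k-by-n.

    Assumption: each of the m*n entries of C takes k multiply-add pairs.
    A real pair is 2 flops; a complex pair is 6 (multiply) + 2 (add) = 8.
    """
    flops_per_pair = 8 if complex_arith else 2
    return flops_per_pair * m * n * k


def mflops(m, n, k, seconds, complex_arith=False):
    """Rate in millions of floating-point operations per second."""
    return gemm_flops(m, n, k, complex_arith) / seconds / 1.0e6


if __name__ == "__main__":
    # Hypothetical timing: a 200x200x200 real multiply that took 2 seconds.
    print(mflops(200, 200, 200, 2.0))
```

Under this convention a complex routine is credited with four times the flops of a real one for the same matrix dimensions, which is worth keeping in mind when comparing a zgemm or cgemm rate with a dgemm or sgemm rate.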