Path: utzoo!utgpu!news-server.csri.toronto.edu!bonnie.concordia.ca!nstn.ns.ca!news.cs.indiana.edu!mips!zaphod.mps.ohio-state.edu!wuarchive!udel!nigel.ee.udel.edu!mccalpin From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin) Newsgroups: comp.arch Subject: Re: Compilers & SPECmarks... Message-ID: Date: 10 Apr 91 12:47:55 GMT References: <32097@shamash.cdc.com> <1991Apr9.052607.12055@news.iastate.edu> <2525@urbana.mcd.mot.com> Sender: usenet@ee.udel.edu Organization: College of Marine Studies, U. Del. Lines: 32 Nntp-Posting-Host: perelandra.cms.udel.edu In-reply-to: dfields@radium.urbana.mcd.mot.com's message of 9 Apr 91 17:42:38 GMT > On 9 Apr 91 17:42:38 GMT, dfields@radium.urbana.mcd.mot.com > (David Fields) said: David> While I realize at least some of the limitations of the David> matrix300 test, I have gotten the impression the some limited David> number of real codes do have at least portions which are David> vectorizable (sic). If that's true then I don't see why any David> one who doesn't believe in the "One True Number" has a problem David> with it. The problem is not that the code is vectorizable, but that it consists entirely of one simple algorithm whose performance characteristics have been studied exhaustively over the years. For a matrix of order N, there are O(N^3) operations to be done on the N^2 data elements. The naive algorithm --- in fact, the algorithm *used by the code* --- requires O(N^3) loads, so that the cache is not useful and the code runs at main memory speed. Block-mode algorithms are available which reduce the number of loads to O(N^2). In this case, there is lots of operand re-use and main memory transfers are no longer the bottleneck. I believe that this switch in algorithms is essentially what the Kuck and Associates front-end to the Fortran compiler does with this test. There is, of course, nothing wrong with this, and it clearly speeds up the code. What is wrong is that there are *very few* algorithms which are as well understood as Gaussian elimination, so the probability that the compiler will be able to do something similar to your application is small --- especially since many applications do O(N^2) operations on O(N^2) data, so that no "block-mode" algorithms can exist. -- John D. McCalpin mccalpin@perelandra.cms.udel.edu Assistant Professor mccalpin@brahms.udel.edu College of Marine Studies, U. Del. J.MCCALPIN/OMNET