Path: utzoo!attcan!uunet!lll-winken!gauss.llnl.gov!casey
From: casey@gauss.llnl.gov (Casey Leedom)
Newsgroups: comp.arch
Subject: Re: RISC vs CISC simple load benchmark; amazing ! [Not really]
Message-ID: <61780@lll-winken.LLNL.GOV>
Date: 15 Jun 90 19:20:32 GMT
References: <8019@mirsa.inria.fr> <39319@mips.mips.COM> <675@sibyl.eleceng.ua.OZ> <39397@mips.mips.COM>
Sender: usenet@lll-winken.LLNL.GOV
Reply-To: casey@gauss.llnl.gov (Casey Leedom)
Organization: Lawrence Livermore National Laboratory
Lines: 36

| From: mash@mips.COM (John Mashey)
|
| The worst case performance is not all that interesting: for two cached
| machines with different cache organization, you can usually "prove"
| different ratios of relative performance by careful selection of the
| most relevant cache-busting code.
|
| [A good example] on a direct-mapped, virtual cache machine, is
| to copy, 1 byte at a time, between two areas that collide in the cache.
|
| (i.e., if you want to artificially show off a SPARC 490 at its worst,
| you can probably prove it's slower than a 68020 with such a benchmark).
| Of course, any given machine can be done in this way.

  While I agree with you that one can always come up with cache-busting
code, I think you picked a particularly bad example, because the cache
design in question is simply brain dead.  If you design a cache, it
should have at least two buckets for each cache line index (i.e., it
should be at least two-way set associative).  Working linearly through
two different arrays is so common that a direct-mapped cache is bound to
run into the problem you mention.

  (As proof, I was called in on a problem with a Sun 3/280 that was
bought for image processing.  Part of the processing involved,
essentially, copying a 1/4 Mb array 30 times a second.  The group had
justified buying the 280 on the grounds that a 180 just wouldn't be fast
enough.  Imagine their horror when they ran their code on their brand
new 280 and found that it ran 3 times slower than on a 180!  The problem
turned out to be that the two arrays were an exact multiple of 64 Kb
apart -- the size of the 280's cache.  Eventually I was able to bring
the 280 up to the speed of a 180 by offsetting the arrays by 24 bytes
beyond that multiple of 64 Kb.  (There were actually a number of offsets
that worked well, but you get the idea.))

  A cache shouldn't break on common operations.

Casey
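
P.S.  For anyone who wants to play with this, here's a rough sketch in C
(mine, not anything the image processing group actually ran) of the
failure mode.  The 64 Kb cache size and the 24 byte fudge come straight
from the 3/280 anecdote above; on a machine with a set-associative or
physically-indexed cache you probably won't see any difference between
the two timings, but on a direct-mapped virtual cache the "colliding"
copy thrashes on every byte.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define CACHE_SIZE  (64 * 1024)     /* 3/280 cache size */
    #define ARRAY_SIZE  (256 * 1024)    /* the ~1/4 Mb image buffer */

    /* 1 byte at a time, like the cache-busting benchmark described above */
    static void copy_bytes(char *dst, const char *src, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }

    /* time `reps` copies of ARRAY_SIZE bytes from src to dst */
    static double time_copy(char *dst, char *src, int reps)
    {
        clock_t t0 = clock();
        for (int r = 0; r < reps; r++)
            copy_bytes(dst, src, ARRAY_SIZE);
        return (double)(clock() - t0) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        /* One big allocation so we control the distance between the arrays. */
        char *pool = malloc(2 * ARRAY_SIZE + CACHE_SIZE);
        if (pool == NULL)
            return 1;

        char *src = pool;

        /* Bad case: dst is an exact multiple of CACHE_SIZE away from src,
           so every source line and its destination line map to the same
           slot of a direct-mapped cache and evict each other. */
        char *dst_bad = src + ARRAY_SIZE;        /* 256 Kb = 4 * 64 Kb */

        /* Fixed case: same distance plus a 24 byte offset, so the two
           arrays no longer collide line for line. */
        char *dst_ok = src + ARRAY_SIZE + 24;

        printf("colliding arrays: %.3f s\n", time_copy(dst_bad, src, 30));
        printf("offset arrays:    %.3f s\n", time_copy(dst_ok,  src, 30));

        free(pool);
        return 0;
    }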