Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!bonnie.concordia.ca!uunet!samsung!rex!uflorida!gatech!udel!nigel.ee.udel.edu!mccalpin
From: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
Newsgroups: comp.arch
Subject: Re: IBM RS6000
Message-ID:
Date: 15 Jan 91 14:17:19 GMT
References: <1991Jan10.214122.9506@news.arc.nasa.gov> <1991Jan14.055922.7546@zeno.mn.org>
Sender: usenet@ee.udel.edu
Organization: College of Marine Studies, U. Del.
Lines: 28
Nntp-Posting-Host: perelandra.cms.udel.edu
In-reply-to: gene@zeno.mn.org's message of 14 Jan 91 05:59:22 GMT

>>>>> On 14 Jan 91 05:59:22 GMT, gene@zeno.mn.org (Gene H. Olson) said:

Gene> lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) writes:

[about the IBM RS/6000]
>1) The machines are as fast as other micros on scalar code, and a lot
>   faster on vector code (other things being equal: clock speed, cache,
>   etc. etc).  Many of the codes here *are* vectorizable.

Gene> [....] I wrote a text compression program (compact) recently
Gene> posted to comp.sources.misc. [....] However the data (default
Gene> working space is about 1 meg) is accessed like a hash table, so
Gene> any size cache is hit very hard by data accesses. [....] I
Gene> found it ran dead even (+/- 10%) with a SparcStation 1 (not 1+)
Gene> and a 25 MHz 486 machine with a good memory subsystem.

The memory access pattern is the clue.  The IBM RS/6000 architecture
is clearly optimized for sequential access patterns.

The cache line size on the Model 320 that Gene used is 64 bytes.  The
memory interface to cache delivers 8 bytes per clock with a latency of
about 8 clocks, so each cache miss is going to cost you about 16
cycles.  That is a fairly large penalty if you are going to only use
one byte!

Machines with smaller cache line sizes will retrieve a lot less unused
information on each cache miss, and hence will run relatively more
efficiently.
--
John D. McCalpin                        mccalpin@perelandra.cms.udel.edu
Assistant Professor                     mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.      J.MCCALPIN/OMNET
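
The miss-cost arithmetic in the post can be sketched in a few lines of Python. This is only a rough model under stated assumptions: the whole line must arrive before the processor continues (no critical-word-first or early restart), and the miss cost is simply latency plus transfer time. The function names are made up for illustration, and the 16-byte-line machine is hypothetical, given the same bus width and latency purely for comparison; only the 64-byte, 8 bytes/clock, 8-clock figures come from the post.

```python
def miss_penalty_cycles(line_bytes, bus_bytes_per_clock, latency_clocks):
    """Approximate cycles to fill one cache line: latency plus transfer time.

    Assumes the full line is fetched before execution resumes.
    """
    # Round up so a partial final bus transfer still costs a full clock.
    transfer_clocks = (line_bytes + bus_bytes_per_clock - 1) // bus_bytes_per_clock
    return latency_clocks + transfer_clocks


def unused_fraction(line_bytes, bytes_used):
    """Fraction of a fetched cache line that is never touched."""
    return 1 - bytes_used / line_bytes


# Figures quoted in the post for the RS/6000 Model 320.
rs6000_penalty = miss_penalty_cycles(line_bytes=64, bus_bytes_per_clock=8,
                                     latency_clocks=8)

# Hypothetical machine: 16-byte lines, same bus and latency (an assumption).
small_line_penalty = miss_penalty_cycles(line_bytes=16, bus_bytes_per_clock=8,
                                         latency_clocks=8)

print("RS/6000 320 miss penalty:", rs6000_penalty, "cycles")
print("16-byte-line miss penalty:", small_line_penalty, "cycles")
print("Unused share of a 64-byte line when 1 byte is used:",
      unused_fraction(64, 1))
```

With hash-table-like access, where roughly one byte per line is used, nearly all of each 64-byte fill is wasted under this model, which is the effect the post describes.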