Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!think.com!spool.mu.edu!agate!darkstar!saturn.ucsc.edu!fritz
From: fritz@saturn.ucsc.edu (Frederick Staats)
Newsgroups: comp.arch
Subject: Re: N11 - i860 XP - some details, who knows more?
Keywords: i860 XP
Message-ID: <16837@darkstar.ucsc.edu>
Date: 8 Jun 91 20:06:59 GMT
Article-I.D.: darkstar.16837
References: <1991Jun7.193507.3733@beaver.cs.washington.edu>
Sender: usenet@darkstar.ucsc.edu
Organization: University of California, Santa Cruz
Lines: 43

In article <1991Jun7.193507.3733@beaver.cs.washington.edu> noah@cs.washington.edu writes:
>
> I just heard a few things about the i860 XP (formerly known as
>the N11) from an Intel person.
>Among the many complaints that have surfaced about the i860, a major one
>is the memory bandwidth being inadequate, especially for keeping the
>floating point pipeline fed. Is this really going to help? They have
>doubled the D-cache size, but 16K is still small. What about this burst
>mode? I am not familiar with it in the i860. Is increasing its speed
>2.5 times going to make a difference in trying to get near peak floating
>point performance?
>
> Rick N. Zucker

There are two key differences in the i860 XP that should increase
the performance of real machines built on the i860 family architecture.

1) Updated caches (I-cache 4K --> 16K, D-cache 8K --> 16K):
   These caches now have both virtual tags and physical tags and use
   the MESI snooping protocol to support shared memory multiprocessing
   and DMA. Hooks for large external snooping caches are also included.
   The i860 XP appears to have solved the major caching performance
   problems that made it hard to use in multiprocessor computers.

2) Quad word pipelined load/store:
   The double word pipelined load/store provided insufficient memory
   bandwidth for many algorithms. The pipelined loads and stores bypass
   the cache and use the full memory bus bandwidth (burst mode) for
   maximum performance on large external datasets.
   More than doubling the memory bandwidth was required to balance
   floating-point throughput with memory.

Other minor changes (i.e., registers for operating system support,
hardware for parallel loop execution on multiple CPUs) are nifty
features but do not appear to significantly affect the performance of
the architecture.

The one thing I would like to see in the future is more registers. The
current number is cramped and, I am told, makes it a real pain to write
a good compiler for the architecture.

Frederick Staats          University of California, Santa Cruz
fritz@saturn.ucsc.edu     Supercomputer Research Group
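
A rough sketch of the bandwidth argument in point 2, added for illustration and not taken from the post. It assumes a double word pipelined load moves 8 bytes and a quad word load moves 16 bytes (a 32-bit i860 word). The function names and the example kernel are hypothetical. It counts loads only and ignores any burst-mode speedup on the bus.

```python
# Assumed transfer sizes for one pipelined load (32-bit i860 word).
DOUBLE_WORD_BYTES = 8    # assumption: 64-bit pipelined load
QUAD_WORD_BYTES = 16     # assumption: 128-bit pipelined load


def operands_per_load(load_bytes, operand_bytes=8):
    """How many floating-point operands one pipelined load delivers."""
    return load_bytes // operand_bytes


def loads_per_iteration(operands_needed, load_bytes, operand_bytes=8):
    """Pipelined loads needed to fetch one iteration's operands (rounded up)."""
    total_bytes = operands_needed * operand_bytes
    return -(-total_bytes // load_bytes)  # ceiling division


if __name__ == "__main__":
    # Hypothetical kernel: y[i] += a * x[i] in double precision,
    # unrolled by two, so each iteration reads 2 x's and 2 y's.
    operands = 4
    for name, width in (("double word", DOUBLE_WORD_BYTES),
                        ("quad word", QUAD_WORD_BYTES)):
        n = loads_per_iteration(operands, width)
        print(f"{name}: {n} pipelined loads per iteration")
```

Under these assumptions the same loop issues half as many pipelined loads with quad word transfers, which frees instruction slots and bus cycles for keeping the floating-point pipeline fed.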