From: fritz@saturn.ucsc.edu (Frederick Staats)
Newsgroups: comp.arch
Subject: Re: N11 - i860 XP - some details, who knows more?
Keywords: i860 XP
Message-ID: <16837@darkstar.ucsc.edu>
Date: 8 Jun 91 20:06:59 GMT
Article-I.D.: darkstar.16837
References: <1991Jun7.193507.3733@beaver.cs.washington.edu>
Sender: usenet@darkstar.ucsc.edu
Organization: University of California, Santa Cruz
Lines: 43

In article <1991Jun7.193507.3733@beaver.cs.washington.edu> noah@cs.washington.edu writes:
>
>	I just heard a few things about the i860 XP (formerly known as
>the N11) from an Intel person.  

>Among the many complaints that have surfaced about the i860, a major one
>is the memory bandwidth being inadequate, especially for keeping the
>floating point pipeline fed.  Is this really going to help?  They have
>doubled the D-cache size, but 16K is still small.  What about this burst
>mode?  I am not familiar with it in the i860.  Is increasing its speed
>2.5 times going to make a difference in trying to get near peak floating
>point performance?
>
>						Rick N. Zucker

   There are two key differences in the i860 XP that should increase
the performance of real machines built on the i860 family architecture.

	1) Updated caches (I-cache 4K --> 16K, D-cache 8K --> 16K):
	   These caches now have both virtual tags and physical tags
	   and use the MESI snooping protocol to support shared memory
	   multiprocessing and DMA.  Hooks for large external snooping
	   caches are also included.  The i860 XP appears to have
	   solved the major caching problems that made the earlier
	   i860 hard to use in multiprocessor computers.

	2) Quad-word pipelined load/store.  The double-word pipelined
	   load/store of the earlier part provided insufficient memory
	   bandwidth for many algorithms.  The pipelined loads and stores
	   bypass the cache and use the full memory bus bandwidth (burst
	   mode) for maximum performance on large datasets held in
	   external memory.  More than doubling the memory bandwidth was
	   necessary to balance floating-point throughput with memory.
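Item 1 above mentions the MESI snooping protocol.  As a rough illustration,
here is the textbook MESI state machine for a single cache line, written in
Python.  It is a sketch of the generic protocol under that assumption, not the
i860 XP's actual cache controller; the function names (local_read,
snoop_read, and so on) are hypothetical, and real implementations differ in
details such as whether a snooped read of a Modified line leaves it Shared
or Invalid.

```python
from enum import Enum


class State(Enum):
    # Generic textbook MESI states (not taken from i860 XP documentation).
    MODIFIED = "M"   # dirty, only cached copy
    EXCLUSIVE = "E"  # clean, only cached copy
    SHARED = "S"     # clean, other caches may also hold it
    INVALID = "I"    # not present in this cache


def local_read(state, others_have_copy):
    """This CPU reads the line. On a miss the line is filled as EXCLUSIVE
    if no other cache holds it, otherwise as SHARED."""
    if state is State.INVALID:
        return State.SHARED if others_have_copy else State.EXCLUSIVE
    return state  # read hits do not change the state


def local_write(state):
    """This CPU writes the line. From SHARED or INVALID the cache must first
    broadcast an invalidate (or read-for-ownership) on the bus; from
    EXCLUSIVE the upgrade is silent. The result is always MODIFIED."""
    return State.MODIFIED


def snoop_read(state):
    """Another bus master (CPU or DMA device) reads the line.
    Returns (new_state, must_write_back)."""
    if state is State.MODIFIED:
        return State.SHARED, True   # dirty data must be written back first
    if state is State.EXCLUSIVE:
        return State.SHARED, False
    return state, False             # SHARED stays SHARED, INVALID stays INVALID


def snoop_write(state):
    """Another bus master writes (or invalidates) the line, so any local copy
    is stale. A MODIFIED line is written back before it is invalidated."""
    return State.INVALID, state is State.MODIFIED


# Two CPUs, A and B, touching the same line.
a = b = State.INVALID
a = local_read(a, others_have_copy=False)   # A fills the line: E
b = local_read(b, others_have_copy=True)    # B fills the line: S
a, _ = snoop_read(a)                        # A snoops B's read: E -> S
a = local_write(a)                          # A writes: S -> M, invalidate on bus
b, _ = snoop_write(b)                       # B snoops the invalidate: S -> I
print(a.value, b.value)                     # M I
```

In this model a DMA device reading memory appears as a snooped read, so a
dirty line is written back before the device sees it, and a DMA write
appears as a snooped write that invalidates the stale cached copy.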
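Item 2's point about balancing floating-point throughput with memory can be
made concrete with a back-of-the-envelope calculation.  The Python sketch
below assumes a streaming DAXPY loop (y[i] = a*x[i] + y[i] on 8-byte doubles)
with no cache reuse, so each element costs two flops and 24 bytes of memory
traffic.  The flops-per-cycle and bytes-per-cycle figures used at the bottom
are illustrative placeholders, not i860 XR or XP specifications, and the
function name attainable_fraction is hypothetical.

```python
from fractions import Fraction

# Streaming DAXPY, y[i] = a*x[i] + y[i], on 8-byte doubles with no cache reuse:
# each element needs one multiply and one add, plus load x, load y, store y.
FLOPS_PER_ELEMENT = 2
BYTES_PER_ELEMENT = 3 * 8


def attainable_fraction(flops_per_cycle, bus_bytes_per_cycle):
    """Fraction of peak FP rate the loop can sustain when memory bandwidth,
    not the FP pipeline, is the limit (capped at 1)."""
    bytes_per_flop = Fraction(BYTES_PER_ELEMENT, FLOPS_PER_ELEMENT)  # 12
    bytes_needed_per_cycle = bytes_per_flop * flops_per_cycle
    return min(Fraction(1), Fraction(bus_bytes_per_cycle) / bytes_needed_per_cycle)


# Hypothetical machine: 2 flops/cycle (multiply and add in parallel),
# moving 8 vs. 16 bytes per cycle (double-word vs. quad-word transfers).
for width in (8, 16):
    print(width, attainable_fraction(2, width))
# 8 1/3
# 16 2/3
```

In this hypothetical, doubling the bytes moved per cycle doubles the
attainable fraction of peak, but it still falls short of peak because the
loop needs 24 bytes per cycle to keep both FP units busy.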

   Other minor changes (e.g., registers for operating system support,
hardware for parallel loop execution on multiple CPUs) are nifty
features but do not appear to significantly affect the performance
of the architecture.  The one thing I would like to see in the future
is more registers.  The current register set is cramped, and I am told
this makes it a real pain to write a good compiler for the architecture.

Frederick Staats		University of California, Santa Cruz
fritz@saturn.ucsc.edu		Supercomputer Research Group