Xref: utzoo comp.arch:8815 comp.sys.intel:766
Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!cornell!uw-beaver!uw-june!robertb
From: robertb@june.cs.washington.edu (Robert Bedichek)
Newsgroups: comp.arch,comp.sys.intel
Subject: Re: i860 Multiprocessing
Keywords: i860, MC88000, cache, coherency, multiprocessor
Message-ID: <7618@june.cs.washington.edu>
Date: 17 Mar 89 05:22:35 GMT
References: <494@ircam.UUCP> <3032@alliant.Alliant.COM>
Reply-To: robertb@uw-june.UUCP (Robert Bedichek)
Organization: U of Washington, Computer Science, Seattle
Lines: 78

In article <3032@alliant.Alliant.COM> jeff@alliant.Alliant.COM (Jeff Collins) writes:
> The performance issues associated with bus timing will be interesting
>to know, but the i860 performance in a multiprocessor will also be affected by
>the cache coherency schemes that are employed. If the part does not have
>external bus watchers (I haven't heard this stated yet, only implied), then
>it seems to me that the performance will be severely hampered.

It depends on the multiprocessor's workload. If you build a
Sequent-style machine and are doing, say, parallel makes, often a
useful thing to do, then there is little performance impact in
maintaining cache/TLB coherency. The OS can be difficult to write,
verify, and debug, of course, but this is the case with any general
purpose multiprocessor OS.

If the workload has a lot of shared data, you might be right. But you
might also be surprised at the performance of a software-intensive
solution on a machine like the MC88K, and perhaps the i860, where you
can interrupt the CPU in just a few cycles.

I've thought about this problem a little for 88K's, where one CPU can
tell any other to do something specific, like flush a cache line or
TLB. It takes a single store on the sending CPU, some interrupt
control logic, and a hand-coded interrupt handler on the receiving CPU
that can do the flush without saving more than one or two registers.
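
To make the flow concrete, here is a toy model of that scheme. It is
only a sketch: the class names, the "mailbox" store, the line size and
the direct-mapped cache are all assumptions made up for illustration,
not 88K or i860 details, and it models the sequence of events, not the
timing.

```python
# Toy model of software-driven cache invalidation between CPUs.
# All names and sizes here are hypothetical; this is not 88K or i860 code.

LINE_SIZE = 16      # bytes per cache line (assumed)
NUM_LINES = 256     # lines in the direct-mapped cache (assumed)


class Cpu:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.tags = [None] * NUM_LINES   # one tag per line, None = invalid
        self.flushes_handled = 0

    def _index_and_tag(self, addr):
        line = addr // LINE_SIZE
        return line % NUM_LINES, line // NUM_LINES

    def load(self, addr):
        """Bring the line holding addr into this CPU's cache."""
        index, tag = self._index_and_tag(addr)
        self.tags[index] = tag

    def is_cached(self, addr):
        index, tag = self._index_and_tag(addr)
        return self.tags[index] == tag

    def flush_interrupt(self, addr):
        """Stand-in for the short hand-coded handler: drop one line."""
        index, tag = self._index_and_tag(addr)
        if self.tags[index] == tag:
            self.tags[index] = None
        self.flushes_handled += 1


class InterruptController:
    """Stand-in for the interrupt control logic between the CPUs."""

    def __init__(self, cpus):
        self.cpus = {cpu.cpu_id: cpu for cpu in cpus}

    def store_to_mailbox(self, target_id, addr):
        # Models the sending CPU's single store: the controller turns it
        # into an interrupt on the target, carrying the address to flush.
        self.cpus[target_id].flush_interrupt(addr)


def write_shared(writer, others, controller, addr):
    """Writer takes the line, then tells every other CPU to drop its copy."""
    writer.load(addr)
    for cpu in others:
        controller.store_to_mailbox(cpu.cpu_id, addr)
```

In this model, a write by one CPU to a line that another CPU holds
costs the writer one store per remote CPU, and each remote CPU runs
one short handler; lines that are not shared are never touched.
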
The whole operation might take only 20 or 30 clock cycles. (On the 88K
you don't have to resort to this because a CMMU can be flushed by its
own CPU or by anything on the MBUS (memory bus), which all other CPU's
have access to in most designs. Why I was thinking of this other
scheme for the 88K is a longer story.)

Btw, the 88K *does* have bus snooping, just as you would like, and it
might be faster *not* to use it. Bus snooping slows the system down
because for every snoop, the CMMU must do a cache lookup. This will
cause the CPU to stall sometimes when it goes to the CMMU (which it
does quite frequently). Build a dual-ported tag RAM? Very expensive.
There's no such thing as free cache coherency!

> I may be missing something, but I have looked at a number of
>microprocessors with an aim to putting them in a multiprocessor. The
>conclusion that I, and my hardware friends, came to was that if a
>microprocessor has an internal cache and no external invalidate logic, then
>the only way to use the part in a symmetric multiprocessor is to disable the
>internal data cache. Internal I-caches have the same problems when you start
>to consider debuggers, but there are workarounds and performance isn't
>critical in these cases.

I think you are making assumptions about how quickly it could be done
with a "RISC approach"; see above.

> What really confuses me is all of the activity aimed at putting these
>parts in a multiprocessor. I admit that the part is amazingly fast, but is it
>really an appropriate part for a multiprocessor - or even for a general
>purpose processor? (I had to make some controversial statement :^)

Oh, it makes a lot of sense! It's designed to be a graphics processor,
where the problems often contain a high degree of parallelism that is
relatively easily exploited. Many graphics processors are
multiprocessors. Also, people like Sequent have shown that shared
memory multiprocessors can work well in general computing environments.
Why not use the fastest, cheapest (overall) chip around as the element
of a multiprocessor? (I'm not claiming that the i860 *is* these
things, just that it is a possibility and therefore the discussion is
reasonable.)

Btw, on all this Dhrystone stuff on the i860 and on Intel in general:
Listen to what they say about what speed part they will have, when,
and for what price, but don't even bother looking at their performance
analysis of their CPU's. Read what MIPS has to say about their raw CPU
performance, or measure it yourself, or flip a dime. Generally you
have to guess/simulate/extrapolate performance for the system as a
whole, so you have to do it yourself anyway.

	Rob Bedichek  (robertb@cs.washington.edu)

Disclaimer: I used to work for Intel, I think the MC 88K is great, the
i860 might be really fine in many common applications, I've only read
what MIPS has to say on the net.