Path: utzoo!attcan!uunet!aplcen!uakari.primate.wisc.edu!ames!ncar!mephisto!udel!princeton!idunno!taylor!ssr
From: ssr@taylor.Princeton.EDU (Steve S. Roy)
Newsgroups: comp.arch
Subject: Re: i860 registers, follow up
Summary: Comments on the i860
Keywords: i860, registers, compiler
Message-ID: <910@idunno.Princeton.EDU>
Date: 15 Jun 90 18:21:49 GMT
References: <501@tau.megatek.uucp>
Sender: news@idunno.Princeton.EDU
Reply-To: ssr@acm.Princeton.EDU (Steve S. Roy)
Organization: Princeton University
Lines: 73

In article <501@tau.megatek.uucp> rstewart@megatek.UUCP (Rich Stewart) writes:
>I guess I should add a bit more info to my previous posting.
>I am interested in the chip from the point of view of
>the best software performance I can get in critical routines.
>I still have not found a compiler that does a good job of this so,
>this is why I feel separating the register sets leads to slower
>software:
>
>On the i860 you have to do an integer to float register move in order
>to do an integer multiply, and then you have to move it back to do
>any other integer ops on the result. (Integer multiplies take place in
>the fpu)
>
>Also the floating point stores and loads can work on 64 bit aligned
>words in the same amount of time as 32 bits, but you have got to
>move all of those integer results back into the floating point registers
>to take advantage of this.

Well, I think that separating the integer and floating point registers
isn't what makes integer multiplies slow, and it isn't exactly what
makes the i860 difficult to write a compiler for.

The main reason there are different integer and floating point
registers is that the integer and floating point units are very
separate on this chip. They run almost independently, and one can
freeze and let the other continue. You can have them run completely
in parallel. This is a major source of the speed of the chip, since
you can overlap reading or writing to memory with computation.

There's no real intrinsic reason they couldn't have put in an integer
multiply instruction; they just didn't feel it was worth the chip space
to duplicate what was already in the floating point section. As I
understand it, when people do instruction frequency analysis, they find
that integer multiply isn't used that often, so a few extra clocks
don't hurt too much.

But given that you aren't going to have an integer-register to
integer-register multiply, it does make compiler writing a bit tougher.
And the fact that you cannot have a floating-register to
floating-register truncate also hurts.

But writing a compiler that just works and produces correct code is no
more difficult on the i860 than on anything else; what is really
difficult is to produce one whose code runs as fast as possible. After
all, part of the point of this chip is that it's supposed to be really
fast. Peak speeds of 60 double precision MFLOPS and 80 single precision
MFLOPS are approaching Cray 1 speeds. I've written code that does that.

What makes it difficult to write a compiler for this chip that actually
gets that sort of speed is:

1: The processor is faster than standard memory, the on-chip cache is
   microscopic, and there are no real provisions for an off-chip cache.

2: The fast multiplies and adds are pipelined, meaning that you can
   have several going at once. Current compiler technology doesn't
   seem to know how to deal with that. There are some isolated groups
   that do, but their knowledge hasn't diffused out.

3: The multiply-accumulate instructions are arcane and don't even begin
   to think about being orthogonal.

4: There are only 15 double precision registers. That may sound like
   a lot to standard microprocessor folks, but with the cache,
   pipelining, and non-orthogonality stuff you need more than that.

I don't think it's impossible to write a compiler that gets a
significant fraction of the peak speed of this machine, but it is
difficult. As a matter of fact it's more difficult than it had to be.
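
To make point 2 a bit more concrete, here is a rough C sketch of what
"several going at once" asks of the code. This is plain portable C, not
i860 code and not the output of any particular compiler; the function
names and the unroll factor of 4 are made up for illustration and are
not matched to the real pipeline depth. The first loop has one long
chain of dependent adds, so a pipelined adder can only work on one add
at a time. The second keeps four independent partial sums, so up to
four adds can be in flight.

```c
#include <stddef.h>

/* One accumulator: every add has to wait for the previous add to finish,
   so the add pipeline never holds more than one useful operation. */
double dot_serial(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Four accumulators (the factor 4 is an assumption for illustration):
   the four adds in one iteration do not depend on each other, so a
   pipelined adder can overlap them. */
double dot_unrolled(const double *x, const double *y, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    /* Clean up the 0 to 3 elements left over when n is not a multiple of 4. */
    for (; i < n; i++)
        s0 += x[i] * y[i];

    return (s0 + s1) + (s2 + s3);
}
```

Note that the second version adds things up in a different order, so on
general floating point data it can round differently from the first.
It also ties up four registers for the partial sums, which is where
point 4 starts to bite.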

It seems like the people designing the chip never talked to compiler
writers, because there are several things that needlessly make it
difficult to write compilers.

Steve Roy