Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!apple!amdcad!mozart.amd.com!nucleus!davec
From: davec@nucleus.amd.com (Dave Christie)
Newsgroups: comp.arch
Subject: Re: Is handling off-alignment important?
Message-ID: <1990Jul25.223437.15301@mozart.amd.com>
Date: 25 Jul 90 22:34:37 GMT
References: <104037@convex.convex.com> <8840016@hpfcso.HP.COM> <2370@crdos1.crd.ge.COM>
Sender: usenet@mozart.amd.com (Usenet News)
Reply-To: davec@nucleus.amd.com (Dave Christie)
Organization: Advanced Micro Devices, Inc., Austin, Texas
Lines: 81

In article <2370@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>
>  Alternatively the hardware can support unaligned fetch. It doesn't
>have to be efficient, because you would have to make an effort to make
>the fetch logic slower than software, it just has to work. This makes
>the program a bit smaller, and assuming that the chip logic is right, it
>prevents everyone from implementing their own try at access code. . .
>
>  Note that this is not a RISC issue, in that the bus interface unit
>already may be doing things like cache interface, multiplexing lines,
>controlling status lines, etc. The BIU is not really RISC in that sense,
>it functions like a coprocessor if you draw a logic diagram, whose
>function is to provide data, which can go in the pipeline or into the
>CPU.

Note that there are two degrees of misalignment:

  1) within a word, and
  2) crossing a word (and possibly a page) boundary.

For 1): If the realignment hardware is not in your main fetch path because it would impact your cycle time, then it will likely mean an extra stage of processing for instructions that use it, which can add various bits of complexity. Considering that, plus (a) a 4-way mux isn't a serious time sink, and (b) how much, or even whether, it influences the cycle time is technology- and implementation-dependent, you are likely just going to stick it in the main fetch path and do it efficiently w.r.t. layout, etc.
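The two degrees of misalignment can be made concrete with a small model. This is only a sketch under assumed parameters (32-bit words, 4096-byte pages, little-endian byte lanes; none of these come from the post), and all function names are hypothetical. `classify()` sorts an access into the cases above, `extract_within_word()` models the 4-way byte-lane selection that handles degree 1), and `load_unaligned_word()` shows the two-aligned-loads-and-merge sequence that software ends up using for degree 2) on a machine that traps misaligned loads, i.e. the kind of "access code" the quoted article would rather not see everyone write.

```python
import struct

# Illustrative parameters (assumptions, not from the post):
# 32-bit words, 4096-byte pages, little-endian byte lanes.
WORD_BYTES = 4
PAGE_BYTES = 4096


def classify(addr: int, size: int) -> str:
    """Classify an access of `size` bytes at byte address `addr`."""
    last = addr + size - 1
    if addr % size == 0:
        return "aligned"
    if addr // WORD_BYTES == last // WORD_BYTES:
        return "within word"   # degree 1): one word fetch plus byte steering
    if addr // PAGE_BYTES == last // PAGE_BYTES:
        return "crosses word"  # degree 2): two word fetches
    return "crosses page"      # degree 2) plus two translations, two possible faults


def extract_within_word(word: int, offset: int, size: int) -> int:
    """Degree 1): select `size` bytes starting at byte lane `offset`.

    The shift is 0, 8, 16 or 24 bits, so each result byte comes from one of
    four source lanes: the job a 4-way mux does in the fetch path.
    """
    assert offset + size <= WORD_BYTES, "not a within-word access"
    return (word >> (8 * offset)) & ((1 << (8 * size)) - 1)


def load_aligned_word(mem: bytes, addr: int) -> int:
    """The only load a hypothetical strict-alignment machine provides."""
    if addr % WORD_BYTES:
        raise ValueError(f"alignment trap at {addr:#x}")
    return struct.unpack_from("<I", mem, addr)[0]


def load_unaligned_word(mem: bytes, addr: int) -> int:
    """Degree 2) done in software: two aligned loads, then shift and merge."""
    base = addr - addr % WORD_BYTES
    shift = 8 * (addr - base)
    lo = load_aligned_word(mem, base)
    if shift == 0:
        return lo                                   # already aligned: one load
    hi = load_aligned_word(mem, base + WORD_BYTES)  # may touch the next page
    return ((lo >> shift) | (hi << (32 - shift))) & 0xFFFFFFFF


mem = bytes(range(16))
print(classify(0x1001, 2))                          # within word
print(classify(0x1FFE, 4))                          # crosses page
print(hex(extract_within_word(0xAABBCCDD, 1, 2)))   # 0xbbcc
print(hex(load_unaligned_word(mem, 3)))             # 0x6050403
```

When the access straddles a page boundary, the second aligned load in `load_unaligned_word()` touches a different page from the first and can fault on its own, which is one reason the page-crossing case is the awkward one whether hardware or software does the work.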
Now, if the end user does pay for this, it isn't likely to be in performance: even though the mux might influence the achievable cycle time, it won't change the clock rate the part is sold at. Chips come in "standard" operating frequencies these days (e.g. 16, 20, 25, 30, 40, 50 MHz); the difference that a 4-way mux might make would tend to be absorbed by the process tweaking that's done to reach the desired frequency. In this case, the realignment hardware influences yield rather than cycle time, hence cost rather than performance. I can't think of any processor that doesn't support this degree of realignment (some do it better than others).

For 2): This, IMHO, is one of the more significant things that differentiates "RISC" from "CISC". The notion of one instruction making multiple references to memory tends to make RISC designers get red in the face and jump up and down. (Yes, I'm well aware of the 29K's load and store multiple instructions, and while I'm not fond of them, there are some significant differences between those and handling unaligned accesses.) The extra control complexity this introduces is a significant increment, especially considering all the nightmarish endcases that have already been described in this thread. The added complexity depends on architecture and implementation, and tends to be worse for stores, but at any rate it tends to increase design/debug time and, more importantly, can cause much hair pulling and resume writing when one attempts really high-performance implementations. (I've known people who thrive on such complexity, for complexity's sake - they should be removed from the gene pool (0.5 :-). With the real estate one has to play with these days, you can find room for the complexity to keep the performance up, but it still influences the cost (and the number of errata after release). I don't know of any "new" architecture chips with decent performance that support realignment across words in one instruction.

Why do the common CISC chips support it?
  1) it's not as big an increment in complexity for them (no smiley), and
  2) backwards compatibility (i.e. they have no choice).

In summary, the cost you will tend to see will be in dollars more than in performance, although at the high end of the performance spectrum you might pay in performance as well. That's hard to say, since processors that support word-crossing accesses tend to have a lot of other complexities that influence cost/performance as well.

What makes sense depends on the intended applications, of course. It may indeed make some network software run significantly faster, for instance. But if that network software consumed 5% of all the cycles of all the processors I had sold, and such hardware support would *double* the network software's performance, I still wouldn't risk screwing up everything else to go for an aggregate 2.5% performance improvement.

----------------------------
Dave Christie          My humble opinions only.
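The closing 2.5% estimate is Amdahl's law at work. A quick check, using only the hypothetical numbers from the final paragraph (5% of cycles, a 2x speedup on that portion); `overall_speedup()` is just an illustrative helper, not anything from the post:

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of the run time is made
    `local_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


f, s = 0.05, 2.0
remaining = (1.0 - f) + f / s   # fraction of the original cycles still spent
print(f"cycles saved:    {1.0 - remaining:.1%}")                     # 2.5%
print(f"overall speedup: {overall_speedup(f, s):.4f}x")              # 1.0256x
print(f"upper bound:     {overall_speedup(f, float('inf')):.4f}x")  # 1.0526x
```

Even an infinitely fast network path would buy only about 5% overall under these assumptions, which is the scale of gain being weighed against the design risk.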