Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!ames!vsi1!wyse!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Not-so RISCy
Keywords: risc
Message-ID: <13259@winchester.mips.COM>
Date: 14 Feb 89 17:53:43 GMT
References: <732@wpi.WPI.EDU>
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 69

In article <732@wpi.WPI.EDU> jhallen@wpi.wpi.edu (Joseph H Allen) writes:
>
>Reduction of instruction set size/complexity is the main area of design which
>enhances speed in RISC processors.  Another area which I'm wondering about is
>data size handling.  Modern RISC processors handle 8, 16, 32 and 64 bit words.
>Some even handle data which crosses "word" boundaries (and on some (well one)
>the byte order can be changed).  The logic that must be dedicated to this must
>be incredible, plus this logic is in the memory data path and therefore might
>be a speed constraint (especially if the data goes through the ALU before being
>presented to the registers).  Would it be a terrible hardship to only have two
>data sizes (perhaps character and word) and not allow words to cross word
>boundaries?  Certainly it would require that people don't use "bad"
>programming techniques similar to what has to be done on 68000 or IBM 360.
>But would not the improvement in speed (by freeing up chip space to allow for
>more registers or to simply reduce data path delay time) be worth it?

1) Automatic handling of unaligned data is indeed expensive, which is why
RISC machines generally omit it.

2) You certainly need word & character operations [to match the statistics
of user programs.]  If you have to materialize halfword ops, UNIX kernel
code will suffer, for three reasons:
   a) There are many densely-encoded structures.
   Some of those might convert shorts to ints, but that doesn't do
   anything about:
   b) Networking code has 16-bit things all over the place, and you have
   NO CHOICE about the sizes, and
   c) When dealing with arbitrary devices, across things like VME buses,
   you'd better be able to generate indivisible 16-bit loads/stores,
   or your choice of peripheral controllers will be impacted.
   Some accesses must be exactly 16 bits wide to match the semantics of
   the devices.
Although MOST user programs don't use 16-bit quantities a lot, some do,
a lot.

3) Once you have load word, load byte [signed|unsigned], and load half
[signed|unsigned], all of which you really want to have, it doesn't take
much more logic to do the unaligned operations (as separate instructions,
NOT as an automatic thing that happens for unaligned operations).

4) Once you have all of that, it actually takes very little logic to do
the byte-order swapping: in fact, what really happened was that the
alignment network that shuffles bytes around anyway just got more complete.
Oddly enough, I don't think it ended up taking any more silicon space,
as the width was the same (32 bits), and the height was already forced
by other constraints.

5) As usual, most of this has to be determined scientifically, by simulation
of the impact of omitting the partial-word instructions.  It is interesting
that at least {HP, MIPS, Sun, Motorola} all came to the same conclusion
on this (include the partial-word loads/stores).  In our case, we had some
heritage of word+byte only (Stanford MIPS); I wouldn't put UNIX on a
machine that didn't have 16-bit operations, even though many user-level
statistics wouldn't justify their presence.

6) The unaligned load/store operations have proved absolutely invaluable.
People may be able to clean up their act on new code, but sometimes they
have huge databases that have alignment problems.
The unaligned operations turn out to be useful for C strings, COBOL+PL/1,
and for porting large FORTRAN programs that have COMMON+EQUIVALENCE
combinations that effectively prohibit "correct" alignment, especially
if these came from the IBM or DEC worlds...which a few programs do.
If you own a 2-million line CAD program, which you didn't write, and
which contains code thru which the armies have marched thru the years,
you do NOT want to be told that you must rework the program before you can
get it to work the very first time.  It's a lot easier to turn on a
compiler switch that uses the unaligned instructions, typically losing
10-15% of performance, and either tune it later, or not bother at all,
but at least get the application working....

Anyway, it's a good question: it's always good to question why features
are included.  In this case, there are good reasons.
-- 
-john mashey	DISCLAIMER:
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
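
For readers who want to see what points 2, 3 and 6 look like from the C
side, here is a minimal sketch.  It is not code from MIPS or from any
particular compiler: the helper names (load_u32_unaligned,
store_u32_unaligned, load_u16_be, load_s16_be) are hypothetical, invented
for this illustration.  The idea is only that an unaligned access is
requested explicitly, as a separate operation, rather than happening
automatically on every load, and that a 16-bit network-order field can be
read either zero-extended or sign-extended.  What instruction sequence a
compiler emits for the memcpy calls is target-dependent.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch a 32-bit word from an address that may not
 * be 4-byte aligned.  memcpy leaves the choice of access sequence to the
 * compiler; the result is in host byte order. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: store counterpart of the load above. */
static void store_u32_unaligned(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Hypothetical helper: read a 16-bit big-endian (network order) field,
 * zero-extended.  Assembling it from bytes works for any alignment and
 * any host byte order. */
static uint16_t load_u16_be(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Hypothetical helper: the same field, sign-extended to 32 bits.  The
 * xor/subtract trick does the sign extension without relying on
 * implementation-defined narrowing conversions. */
static int32_t load_s16_be(const unsigned char *p)
{
    int32_t v = load_u16_be(p);
    return (v ^ 0x8000) - 0x8000;
}
```

Reading a word back from buf + 1 after storing it there with
store_u32_unaligned returns the same value regardless of host byte order,
and the bytes 0xFF 0xFE read as 65534 through load_u16_be but as -2
through load_s16_be.  That is the signed/unsigned distinction of point 3.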