Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!uunet!ogicse!cs.uoregon.edu!mips!mash
From: mash@mips.com (John Mashey)
Newsgroups: comp.arch
Subject: Re: 64-bits, How many years?
Message-ID: <660@spim.mips.COM>
Date: 2 Mar 91 22:43:10 GMT
References: <3209@crdos1.crd.ge.COM> <1991Feb27.000601.1508@batcomputer.tn.cornell.edu> <+MR97B7@xds13.ferranti.com>
Sender: news@mips.COM
Organization: MIPS Computer Systems, Inc.
Lines: 106
Nntp-Posting-Host: winchester.mips.com

In article <+MR97B7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>> Then why not design it for arbitrary length and just implement 48 bits
>> for now?

>How do you plan to allow for this in the instruction stream? What's the
>word size? How big are the registers? How do you implement this?

Actually, the question comes up fairly often, so why don't we just
take care of it once and for all, i.e.:
	why 64-bits?  why not 48 bits?

The answer is simple:
	It's really AWKWARD to build a byte-oriented machine with
	word sizes that are not a power-of-two multiple of the
	character size, especially if you'd like C to be reasonable on it.

Let's assume that you get over the awkwardnesses of what you think
ints, shorts, and such are.  Let us also assume that you care whether or
not C works on this machine (that is NOT a given, just an assumption;
after all, 48-bit machines have been designed and sold, although not
recently).

However, consider this: memory is usually physically organized as words
or multiples of words.  Consider what happens when you do a load or store
of (for example) 32 bits on a byte-addressed machine:

1) Compute the address.
	T = tag, high-order bits
	I = index, middle bunch of bits
	xx = throw away low-order 2 bits

	T...TI...Ixx

2) Index the cache with I, check the resulting tags.
   OR, send T..TI..I to the memory system, with some extra specifier
   to provide the access size and alignment.

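As a minimal sketch of step 1 above, the tag/index/offset split on a
power-of-two machine is nothing but shifts and masks.  The field widths
(2 offset bits for a 4-byte word, 10 index bits) and the names
split_address and join_address are assumptions made up for this
illustration, not any particular machine's cache geometry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry, for illustration only:
 * 4-byte words (2 offset bits) and a 1024-entry cache (10 index bits). */
enum { OFFSET_BITS = 2, INDEX_BITS = 10 };

typedef struct {
    uint32_t tag;     /* T..T : high-order bits                 */
    uint32_t index;   /* I..I : middle bits, selects cache line */
    uint32_t offset;  /* xx   : byte within the 4-byte word     */
} addr_fields;

/* Split a byte address into T..T, I..I and xx using only shift and mask. */
static addr_fields split_address(uint32_t byte_addr)
{
    addr_fields f;
    f.offset = byte_addr & ((1u << OFFSET_BITS) - 1u);
    f.index  = (byte_addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
    f.tag    = byte_addr >> (OFFSET_BITS + INDEX_BITS);
    return f;
}

/* Reassemble the byte address; the fields must be within their widths. */
static uint32_t join_address(addr_fields f)
{
    assert(f.offset < (1u << OFFSET_BITS));
    assert(f.index < (1u << INDEX_BITS));
    return (f.tag << (OFFSET_BITS + INDEX_BITS))
         | (f.index << OFFSET_BITS)
         | f.offset;
}
```

Under these assumed widths, byte address 0x1007 splits into tag 1,
index 1, offset 3, and no divide is needed anywhere.
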
Now, consider the way byte-addressing works:
	T..TI..I00
	T..TI..I01
	T..TI..I10
	T..TI..I11
	(TI+1..)00
	....
access the 4 bytes, in order, i.e., you can access the word using a
character pointer; if you keep incrementing it, you get the first byte
of the next word, and all of this works perfectly fine.

Now, suppose you have 48-bit words, with 8-bit chars?
Consider word 0.  It has bytes numbered from 0..5, or 000..101.
Now, in what word is byte 6 (110)?  Well, it's in word 1.
So, how do you compute the index of the word that contains the byte,
from the byte address?
	YOU DIVIDE BY 6, which is not a power of 2.
Given the recent discussions of speed of division, it should be clear
why computer designers do not wish to include a divide (that is NOT just
a right-shift) in every partial word access....

So, maybe what you do is have 12-bit bytes, which at least gives you
4 of them ... but has other problems.

Or, maybe, you punt on thinking there is a straightforward incrementation
that maps words into bytes.  This has been done.  Many word-addressed
machines, like DEC PDP-10s, and (gasp!) Stanford MIPS use word-addressing,
but have special byte-pointers and instructions for dealing with them.
(Note that MIPS Computer Systems MIPS use byte addresses - I wouldn't
have come here if we'd stuck with word addresses :-) life is too short.)
Note that such things usually have a word address, then steal a couple
of bits to give the byte number within the word, and when you do p++ in
one of these, the hardware increments the byte offset, and if it exceeds
the maximum, it resets the offset to 0 and increments the word address.
I.e., this sneaks around the division problem.

It IS possible to port C to such things (as, for example, people at BTL
did with Honeywell mainframes, Sperry/Unisys 1100s, XDS Sigma machines,
etc), but it has never once been very pleasant.  (Various BTL friends
did ports to some of these; the debriefing memos on the efforts were
interesting.)

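To make the contrast concrete, here is a rough sketch of the two options
for 48-bit words with 8-bit chars: a flat byte address, which needs a
genuine divide by 6 to find the word, and a word-address-plus-offset
byte pointer, whose increment needs only a compare and a reset.  The
type byte_ptr and the function names are invented for this illustration;
they do not model the PDP-10's or any other real machine's byte-pointer
format.

```c
#include <assert.h>
#include <stdint.h>

enum { BYTES_PER_WORD = 6 };  /* 48-bit word, 8-bit chars */

/* Flat byte address -> word index.  Because 6 is not a power of two,
 * this is a real divide, not a right shift. */
static uint32_t word_of_flat_byte(uint32_t byte_addr)
{
    return byte_addr / BYTES_PER_WORD;
}

/* Hypothetical byte pointer: word address plus byte-within-word offset. */
typedef struct {
    uint32_t word;  /* word address                     */
    uint32_t byte;  /* offset, 0 .. BYTES_PER_WORD - 1  */
} byte_ptr;

/* The p++ case: bump the offset; if it runs past the last byte,
 * reset it to 0 and step to the next word.  No divide involved. */
static byte_ptr byte_ptr_next(byte_ptr p)
{
    assert(p.byte < BYTES_PER_WORD);
    if (++p.byte == BYTES_PER_WORD) {
        p.byte = 0;
        p.word++;
    }
    return p;
}
```

The price of the second form is that a byte pointer is no longer a plain
integer: comparing, subtracting, or adding an arbitrary amount to one
takes extra work that a flat power-of-two byte address gets for free.
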
I felt especially sorry for the folks who did the string routines for
the Univac 1100 series machines a long time ago.  I don't know if it is
still true, but at that time, the byte-within-word offset was actually
stored as part of the opcode, i.e., you had things like "load 1st byte",
"load 2nd byte" (different nomenclature, but that's the idea).  Hence,
an efficient strcpy was at least 100s of lines long, as you had to
decode the pointers to figure out which permutation of alignments was
needed for the basic loop, i.e.:
1:	load 1st byte, store 1st byte
	load 2nd byte, store 2nd byte...
2:	load 1st byte, store 2nd byte
	load 2nd byte, store 3rd byte
3:....

Worse, there are now huge numbers of application programs that are
very portable amongst power-of-two-byte-addressed machines, but which
will be miserable to make run otherwise.  This fact WAS NOT TRUE at the
time people were doing C ports to such machines in the early 1970s,
but it's true now.

So... 48 is actually a pretty good number, as it is divisible by
2,3,4,6,8,12,16, and 24, but.....
-- 
-john mashey	DISCLAIMER:
UUCP: 	mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086