Path: utzoo!censor!geac!torsqnt!news-server.csri.toronto.edu!cs.utexas.edu!uunet!ogicse!cs.uoregon.edu!mips!mash
From: mash@mips.com (John Mashey)
Newsgroups: comp.arch
Subject: Re: 64-bits, How many years?
Message-ID: <660@spim.mips.COM>
Date: 2 Mar 91 22:43:10 GMT
References: <3209@crdos1.crd.ge.COM> <1991Feb27.000601.1508@batcomputer.tn.cornell.edu> <+MR97B7@xds13.ferranti.com>
Sender: news@mips.COM
Organization: MIPS Computer Systems, Inc.
Lines: 106
Nntp-Posting-Host: winchester.mips.com

In article <+MR97B7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>> Then why not design it for arbitrary length and just implement 48 bits
>> for now?
>How do you plan to allow for this in the instruction stream? What's the
>word size? How big are the registers? How do you implement this?

Actually, the question comes up fairly often, so why don't we just
take care of it once and for all, i.e.:
	why 64-bits?  why not 48 bits?

The answer is simple:
	It's really AWKWARD to build a byte-oriented machine with
	word sizes that are not a power-of-two multiple of the character
	size, especially if you'd like C to be reasonable on it.

Let's assume that you get past the awkwardness of deciding what
ints, shorts, and such should be.  Let us also assume that you care whether
or not C works on this machine (that is NOT a given, just an assumption;
after all, 48-bit machines have been designed and sold, although
not recently).

However, consider this: memory is usually physically organized
as words or multiples of words.  Consider what happens when you do
a load or store of (for example) 32-bits on a byte addressed machine:
1) Compute the address.
	T = tag, high-order bits
	I = index, middle bunch of bits
	xx = throw away low-order 2 bits
	T...TI...Ixx
2) Index the cache with I, check the resulting tags.
OR, send T..TI..I to the memory system, with some extra specifier to
provide the access size and alignment.
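A minimal C sketch of step 1. It assumes (hypothetically) 32-bit addresses, 4-byte words, and a direct-mapped cache with 1024 word-sized lines, so I is 10 bits and xx is 2 bits. The point is that splitting the address is nothing but shifts and masks, i.e., free wiring in hardware. The names addr_tag, addr_index, addr_offset, and addr_join are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry: 4-byte words, 1024-line direct-mapped cache. */
#define OFFSET_BITS 2   /* "xx": byte within the word, ignored for a word access */
#define INDEX_BITS  10  /* "I..I": selects the cache line */

/* Low-order "xx" bits: byte offset within the 32-bit word. */
static uint32_t addr_offset(uint32_t addr) {
    return addr & ((1u << OFFSET_BITS) - 1);
}

/* Middle "I" bits: the cache index. */
static uint32_t addr_index(uint32_t addr) {
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
}

/* High-order "T" bits: compared against the tag stored in the line. */
static uint32_t addr_tag(uint32_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);
}

/* Reassemble T...TI...Ixx; the fields must fit in their bit widths. */
static uint32_t addr_join(uint32_t tag, uint32_t index, uint32_t offset) {
    assert(index < (1u << INDEX_BITS));
    assert(offset < (1u << OFFSET_BITS));
    return (tag << (OFFSET_BITS + INDEX_BITS)) | (index << OFFSET_BITS) | offset;
}
```

For example, address 0x1234567B splits into tag 0x12345, index 0x19E, and byte offset 3.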

Now, consider the way byte-addressing works:
	T..TI..I00
	T..TI..I01
	T..TI..I10
	T..TI..I11
	(TI+1..)00
	....
access the 4 bytes in order; i.e., you can access the word using
a character pointer, and if you keep incrementing it past the last byte,
you get the first byte of the next word, and all of this works perfectly fine.
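The same property in C, as a sketch assuming 4 bytes per word: the word number and the byte within the word fall out of a shift and a mask, and the carry from byte 3 into the next word happens for free when the byte address is incremented. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 4 bytes per word: word = addr / 4 and byte = addr % 4, but since 4 is
   a power of two these are just a right shift and a mask. */
static uint32_t word_of(uint32_t byte_addr)      { return byte_addr >> 2; }
static uint32_t byte_in_word(uint32_t byte_addr) { return byte_addr & 3u; }

/* Step a "character pointer" through n consecutive byte addresses and
   check that each lands at word * 4 + byte, including the carry from
   byte 3 of one word to byte 0 of the next. */
static void check_walk(uint32_t start, uint32_t n) {
    for (uint32_t b = start; b < start + n; b++)
        assert(word_of(b) * 4 + byte_in_word(b) == b);
}
```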

Now, suppose you have 48-bit words, with 8-bit chars?
Consider word 0.  It has bytes numbered from 0..5, or 000..101.
Now, in what word is byte 6 (110)?
Well, it's in word 1.
So, how do you compute the index of the word that contains the byte,
from the byte address?
	YOU DIVIDE BY 6, which is not a power of 2.
	Given the recent discussions of the speed of division, it should be
	clear why computer designers do not wish to include a divide
	(that is NOT just a right-shift) in every partial word access....
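For contrast, a sketch of the same mapping on a hypothetical machine with 48-bit words and 8-bit bytes. Only the constant changes, but because 6 is not a power of two, finding the word needs a real divide and finding the byte needs a real remainder. The helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: 48-bit words, each holding six 8-bit bytes. */
#define BYTES_PER_WORD 6

/* 6 is not a power of two, so these need a genuine divide and
   remainder rather than a shift and a mask. */
static uint64_t word_of48(uint64_t byte_addr)      { return byte_addr / BYTES_PER_WORD; }
static uint64_t byte_in_word48(uint64_t byte_addr) { return byte_addr % BYTES_PER_WORD; }

/* Going the other way is just a multiply and add. */
static uint64_t byte_addr48(uint64_t word, uint64_t byte) {
    assert(byte < BYTES_PER_WORD);
    return word * BYTES_PER_WORD + byte;
}
```

Byte 6 (110) comes out in word 1 at offset 0, matching the text, and byte 12 lands in word 2.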

So, maybe what you do is have 12-bit bytes, which at least gives you
4 of them ...  but has other problems.
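A sketch of why 12-bit bytes help the address arithmetic (and only that; it does nothing about the other problems). It assumes a hypothetical layout in which each 48-bit word, kept in the low bits of a uint64_t, holds four 12-bit bytes with byte 0 in the low-order bits. With 4 bytes per word, locating a byte is a shift and mask again. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: four 12-bit bytes per 48-bit word, byte 0 in the
   low-order 12 bits; each word sits in the low 48 bits of a uint64_t. */
#define BYTE_BITS 12
#define BYTE_MASK 0xFFFu

/* 4 bytes per word, so the word index is a shift again. */
static uint64_t word_of12(uint64_t byte_addr) { return byte_addr >> 2; }

/* Load the 12-bit byte at byte_addr from a word-organized memory. */
static unsigned load_byte12(const uint64_t *mem, uint64_t byte_addr) {
    uint64_t w = mem[word_of12(byte_addr)];
    unsigned off = (unsigned)(byte_addr & 3u);
    assert((w >> 48) == 0);  /* only 48 bits of each word are meaningful */
    return (unsigned)((w >> (off * BYTE_BITS)) & BYTE_MASK);
}
```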

Or, maybe, you punt on having a straightforward incrementing byte
address that maps bytes into words.
	This has been done.  Many machines, like DEC PDP-10s and
	(gasp!) Stanford MIPS, use word-addressing,
	but have special byte-pointers and instructions for dealing
	with them. (Note that MIPS Computer Systems' MIPS uses byte
	addresses - I wouldn't have come here if we'd stuck with word
	addresses :-) life is too short.)
	Note that such things usually have a word address, then steal
	a couple of bits to give the byte number within the word, and when you
	do p++ in one of these, the hardware increments the byte offset,
	and if it exceeds the maximum, it resets the offset to 0, and
	increments the word address.  I.e., this sneaks around the
	division problem.
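The byte-pointer scheme described above can be sketched in C as follows. It assumes a hypothetical pointer made of a word address plus a separate byte-offset field, with 6 bytes per word (the 48-bit, 8-bit-byte case from earlier). The increment only compares and resets the offset, so no divide is ever needed. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: six 8-bit bytes per 48-bit word. */
#define BYTES_PER_WORD 6

typedef struct {
    uint64_t word;    /* word address */
    unsigned offset;  /* byte number within the word, 0..BYTES_PER_WORD-1 */
} byte_ptr;

/* p++ as the hardware would do it: bump the offset, and on overflow
   reset it to 0 and advance the word address. */
static byte_ptr bp_inc(byte_ptr p) {
    assert(p.offset < BYTES_PER_WORD);
    if (++p.offset == BYTES_PER_WORD) {
        p.offset = 0;
        p.word++;
    }
    return p;
}
```

Incrementing from word 0, offset 5 yields word 1, offset 0, which is exactly the "byte 6 is in word 1" case, reached without dividing by 6.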

	It IS possible to port C to such things (as, for example,
	people at BTL did with Honeywell mainframes, Sperry/Unisys 1100s,
	XDS Sigma machines, etc), but it has never been very pleasant.
	(Various BTL friends did ports to some of these; the debriefing
	memos on the efforts were interesting.  I felt especially sorry
	for the folks who did the string routines for the Univac 1100
	series machines a long time ago.  I don't know if it is still
	true, but at that time, the byte-within-word offset was actually
	stored as part of the opcode, i.e., you had things like
	"load 1st byte", "load 2nd byte" (different nomenclature,
	but that's the idea).  Hence, an efficient strcpy was at least
	hundreds of lines long, as you had to decode the pointers to
	figure out which permutation of alignments was needed for
	the basic loop, i.e.:
		1: load 1st byte, store 1st byte
		   load 2nd byte, store 2nd byte...
		2: load 1st byte, store 2nd byte
		   load 2nd byte, store 3rd byte
		3: ....)
		   
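To make the permutation problem concrete, here is a hypothetical sketch assuming 4 bytes per word (the real 1100-series layouts differ) and a byte offset fixed in each load/store opcode. For a given shift between the destination and source offsets, it computes the pairing that one unrolled loop body has to hard-code; shifts 0 and 1 reproduce loops 1 and 2 above. Each of the BPW shifts needs its own body, and the pointers must be decoded at run time to choose among them. The names BPW, copy_step, and loop_body are made up.

```c
#include <assert.h>

/* Assumed: 4 bytes per word. */
#define BPW 4

/* One step of an unrolled copy loop: which source byte is loaded and
   which destination byte is stored, both fixed in the opcodes. */
typedef struct {
    unsigned load_byte;   /* 0-based source byte number */
    unsigned store_byte;  /* 0-based destination byte number */
    int next_dst_word;    /* 1 if the store lands in the following word */
} copy_step;

/* Fill in the loop body for a given relative shift: source byte i pairs
   with destination byte (i + shift) % BPW, spilling into the next
   destination word once i + shift reaches BPW. */
static void loop_body(unsigned shift, copy_step body[BPW]) {
    assert(shift < BPW);
    for (unsigned i = 0; i < BPW; i++) {
        body[i].load_byte = i;
        body[i].store_byte = (i + shift) % BPW;
        body[i].next_dst_word = (i + shift) >= BPW;
    }
}
```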
	Worse, there are now huge numbers of application programs that
	are very portable amongst power-of-two-byte-addressed machines,
	but which will be miserable to make run otherwise.  This fact
	WAS NOT TRUE at the time people were doing C ports to such
	machines in the early 1970s, but it's true now.

So...  48 is actually a pretty good number, as it is divisible by
2,3,4,6,8,12,16, and 24, but.....
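A trivial C check of that arithmetic (the helper names are made up): 48 has more divisors than 64, which is what makes it tempting, yet it is not a power of two, and a power-of-two number of bytes per word is what keeps byte addressing down to shifts and masks.

```c
#include <assert.h>
#include <stdbool.h>

/* True if n is a power of two (exactly one bit set). */
static bool is_pow2(unsigned n) { return n != 0 && (n & (n - 1)) == 0; }

/* Count the divisors d of n with 1 < d < n. */
static int proper_divisors(unsigned n) {
    assert(n > 0);
    int count = 0;
    for (unsigned d = 2; d < n; d++)
        if (n % d == 0)
            count++;
    return count;
}
```

For 48 this counts 2, 3, 4, 6, 8, 12, 16, and 24 (eight divisors); for 64 only 2, 4, 8, 16, and 32.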
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086