Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!mips!zalman
From: zalman@mips.com (Zalman Stern)
Newsgroups: comp.arch
Subject: Re: endian etc
Keywords: endianness??, cache
Message-ID: <3407@spim.mips.COM>
Date: 11 May 91 11:13:29 GMT
References: <2496@cybaswan.UUCP>
Sender: news@mips.COM
Organization: MIPS Computer Systems, Sunnyvale, California
Lines: 61
Nntp-Posting-Host: dish.mips.com

In article <2496@cybaswan.UUCP> ex2mike@cybaswan.UUCP ( m overton) writes:
>
>A simple answer to all the problems with byte order etc, to the
>authors mind, is to have a duplicate set of load and store instructions.
>Since most RISC machines have very few of them, adding a set for the
>opposite sense would surely be very easy. It wouldn't work with 
>instructions, of course, but I assume that is not a problem. 
>

No, it doesn't help at all. Current RISC chips which support bi-endian
operation simply xor a constant with the low order address bits on non-word
operations. The constant changes depending on the byte-order bit. Storing a
word, changing the byte order bit, and loading the same word will get
exactly the same value (it will not be byte swapped).  Words have a
constant format in memory and byte addresses are modified appropriately.
Hence a buffer full of bytes cannot be accessed by processes of different
byte sex without byte swapping each word of the buffer. Moving the byte
order bit into the opcode accomplishes nothing. (Except wasting a lot of
opcode space. See below.)
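
To make the address-xor scheme concrete, here is a toy model in C. It is a
sketch under my own assumptions, not a description of any particular chip:
the names (xor_mem, xor_big_endian, xor_lane and friends) are made up for
illustration, memory is just an array of 32-bit words, and only byte and
word accesses are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not any real chip: memory is an array of 32-bit
 * words whose format never changes with the byte-order bit. */
static uint32_t xor_mem[4];
static int xor_big_endian;   /* stands in for the byte-order bit */

static void xor_store_word(uint32_t addr, uint32_t v) { xor_mem[addr >> 2] = v; }
static uint32_t xor_load_word(uint32_t addr) { return xor_mem[addr >> 2]; }

/* Byte accesses XOR the low address bits with a mode-dependent
 * constant: 3 in one mode, 0 in the other. */
static unsigned xor_lane(uint32_t addr)
{
    return (addr ^ (xor_big_endian ? 3u : 0u)) & 3u;
}

static void xor_store_byte(uint32_t addr, uint8_t v)
{
    unsigned shift = 8 * xor_lane(addr);
    uint32_t *w = &xor_mem[addr >> 2];
    *w = (*w & ~((uint32_t)0xFF << shift)) | ((uint32_t)v << shift);
}

static uint8_t xor_load_byte(uint32_t addr)
{
    return (uint8_t)(xor_mem[addr >> 2] >> (8 * xor_lane(addr)));
}

/* Walks through the two claims in the text. */
static void xor_model_demo(void)
{
    /* A word survives a change of the byte-order bit unchanged. */
    xor_big_endian = 0;
    xor_store_word(4, 0x11223344u);
    xor_big_endian = 1;
    assert(xor_load_word(4) == 0x11223344u);

    /* A byte buffer written in one mode reads back reversed within
     * the word in the other mode. */
    xor_big_endian = 0;
    for (uint32_t i = 0; i < 4; i++)
        xor_store_byte(i, (uint8_t)('a' + i));
    xor_big_endian = 1;
    assert(xor_load_byte(0) == 'd');
    assert(xor_load_byte(3) == 'a');
}
```

Calling xor_model_demo() from a main() exercises both cases: the word
comes back untouched, while the buffer "abcd" reads as "dcba" once the bit
is flipped, which is why the buffer has to be swapped in software.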

A real solution would be to add byte lane swapping hardware to the chip.
This hardware would actually swap the bytes on every load and store
depending on the byte order bit. (It can stay in the status register; it
doesn't matter.) That way, memory is laid out so that bytes are always in
the same place and word operations swap them around as necessary. If this
were the case, buffers would not need to be byte swapped at all.
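
For contrast, a toy model of the byte lane swapping approach might look
like the following. Again this is only a sketch under assumed names
(lane_bytes, lane_big_endian, lane_load_word and so on are hypothetical):
memory is a plain byte array, byte accesses ignore the byte-order bit, and
word accesses assemble the four bytes in whichever order the bit selects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: every byte lives at its own address, always. */
static uint8_t lane_bytes[16];
static int lane_big_endian;   /* stands in for the byte-order bit */

/* Byte accesses do not depend on the byte-order bit at all. */
static void lane_store_byte(uint32_t addr, uint8_t v) { lane_bytes[addr] = v; }
static uint8_t lane_load_byte(uint32_t addr) { return lane_bytes[addr]; }

/* Bit position of the byte at offset i within a word, per mode. */
static unsigned lane_shift(unsigned i)
{
    return lane_big_endian ? 8 * (3 - i) : 8 * i;
}

/* Word accesses route each byte through the selected lane. */
static void lane_store_word(uint32_t addr, uint32_t v)
{
    for (unsigned i = 0; i < 4; i++)
        lane_bytes[addr + i] = (uint8_t)(v >> lane_shift(i));
}

static uint32_t lane_load_word(uint32_t addr)
{
    uint32_t v = 0;
    for (unsigned i = 0; i < 4; i++)
        v |= (uint32_t)lane_bytes[addr + i] << lane_shift(i);
    return v;
}

static void lane_model_demo(void)
{
    /* A byte buffer reads the same in both modes. */
    lane_big_endian = 0;
    for (uint32_t i = 0; i < 4; i++)
        lane_store_byte(i, (uint8_t)('a' + i));
    lane_big_endian = 1;
    assert(lane_load_byte(0) == 'a');
    assert(lane_load_byte(3) == 'd');

    /* A word stored in one mode loads byte swapped in the other. */
    lane_big_endian = 0;
    lane_store_word(4, 0x11223344u);
    lane_big_endian = 1;
    assert(lane_load_word(4) == 0x44332211u);
}
```

The trade is the reverse of the xor scheme: byte buffers are shared for
free, and it is the word values that differ between the two modes. The
per-byte routing in lane_load_word and lane_store_word is the part that
would have to sit in the load/store path in hardware.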

So why don't we do this? The word over here in software is that the
hardware is expensive in terms of space and time. It's very likely to end up
on the critical path for loads and stores. Any impact on cycle time to
support bi-endianness is a lose. (This makes sense to me, but I write code
for a living. The guys on the other side of the building are probably
falling out of their chairs laughing as they read this.)

Another point about architecture: I wouldn't say RISC machines have
relatively few load and store instructions, but either way, these
instructions tend to have large (~16 bit) immediate offsets. Every such
instruction will take one major opcode.  A quick glance at an opcode table
for MIPS-I indicates that putting the byte order bit into the opcode field
would add 17 new opcodes. There are 24 free opcodes in MIPS-I. In MIPS-II,
the 17 goes up and the 24 goes down such that there would not be enough
opcodes. (And believe me, the opcode space got used for something a hell of
a lot more useful than making the instruction set byte-wise bisexual.) It's
even worse for a machine like the RS/6000 which has update and indexed
variants of its load/store instructions.
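
To see why each load or store with a 16-bit offset eats a major opcode,
here is a small C sketch of the MIPS immediate instruction format (6-bit
opcode, two 5-bit register fields, 16-bit immediate). The helper name
mips_encode_itype and the constant MIPS_MAJOR_OPCODES are mine, made up for
illustration; the field layout and the LW/SW opcode values are the standard
MIPS ones.

```c
#include <assert.h>
#include <stdint.h>

/* 6 opcode bits leave room for only 64 major opcodes in total. */
enum { MIPS_MAJOR_OPCODES = 1 << 6 };

/* Pack an immediate-format instruction: op(6) rs(5) rt(5) imm(16).
 * Hypothetical helper, for illustration only. */
static uint32_t mips_encode_itype(unsigned op, unsigned rs, unsigned rt,
                                  int16_t imm)
{
    return ((uint32_t)(op & 0x3F) << 26) |
           ((uint32_t)(rs & 0x1F) << 21) |
           ((uint32_t)(rt & 0x1F) << 16) |
           (uint16_t)imm;
}

static void mips_encode_demo(void)
{
    /* lw $8, 4($29): major opcode 0x23 */
    assert(mips_encode_itype(0x23, 29, 8, 4) == 0x8FA80004u);
    /* sw $8, -4($29): major opcode 0x2B */
    assert(mips_encode_itype(0x2B, 29, 8, -4) == 0xAFA8FFFCu);
    assert(MIPS_MAJOR_OPCODES == 64);
}
```

With the base register, target register and offset taking 26 of the 32
bits, nothing is left over to carry a byte-order flag, so an opposite-sense
twin of each load and store would need a major opcode of its own.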

>	Wouldn't it be very easy on a machine with a write back cache
>to copy words simply by changing the internal cached address ( a little
>like a form of cache aliasing). A lot of time is spent in most code
>just copying things around. Would this not improve things ( you gain
>immediately on cache occupancy).

This is one of many cache tricks you can play to make memory copies (bcopy)
and memory fills (bzero) go fast. Mostly these sorts of things are only
used inside the operating system because they aren't safe for unprivileged
code to use. One can imagine hardware designed to provide this
functionality for user processes.
-- 
Zalman Stern, MIPS Computer Systems, 928 E. Arques 1-03, Sunnyvale, CA 94088
zalman@mips.com OR {ames,decwrl,prls,pyramid}!mips!zalman     (408) 524 8395
  "Never rub another man's rhubarb" -- the Joker via Pop Will Eat Itself