Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!husc6!think!ames!amdcad!bcase
From: bcase@amdcad.UUCP
Newsgroups: comp.arch
Subject: Re: Update to word addressing
Message-ID: <16164@amdcad.AMD.COM>
Date: Wed, 15-Apr-87 13:28:33 EST
Article-I.D.: amdcad.16164
Posted: Wed Apr 15 13:28:33 1987
Date-Received: Fri, 17-Apr-87 03:04:38 EST
References: <16163@amdcad.AMD.COM>
Organization: Advanced Micro Devices, Inc., Sunnyvale, Ca.
Lines: 46
Summary: updated update

Ok, here is an update to the update.  This information is from our "main
man" circuit designer Dave Witt (the boss of the other circuit designer
whose comments I earlier recounted).  There isn't major disagreement
here, just some clarification.

---------------------------
     Hi, brian

       Anyway, if what you are talking about is being able to mux
an arbitrary byte from one byte position to another via a
load, my guess is that this would have an impact on performance
for us, but that this would be very dependent on the architecture
and speed paths.  On the 29000, because we do direct forwarding of loads,
the impact of allowing arbitrary multiplexing of a byte location to
any byte location of a register would cause an extra 1.5-2.0 nanoseconds
of setup on the delay from address to data valid.  (This is ignoring
the increased hardware associated with providing access to any byte
at each byte location in the data-input latch, and the selective byte
drive/tristate on our internal buses in the data-input latch and the
register file).  This is because we use dynamic buses, which require
the data to be stable before the drive clock, and also because, in the
picket [that is, a half clock cycle, ED] in which we transfer the
data-input latch to the alu/shifter, we currently use all of that time
for data transfer and for setting up the control signals for the
funnel shift/alu/prioritizer.
     I'll say that there is obviously increased complexity
associated with allowing byte loads, that whether there is a net
effect on the performance of the processor is very dependent
on the internal pipe and associated internal architecture/speed paths,
and that in the case of the 29000, if we had implemented this feature,
it would have affected our address/data valid setup time.  This may
not be a major problem on other chips, but when
you are trying for 25-40 MHz with associated external memory systems
and caches, there is no more critical item to a processor
than giving the channel as much of the cycle time as possible.

                 David Witt
---------------------------------
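To put Dave's numbers in rough perspective, here is a back-of-the-envelope
calculation of how much of a clock cycle 1.5-2.0 ns represents at 25 and
40 MHz.  The delays and clock rates are the ones quoted above; the function
names are made up for illustration, and comparing against the full cycle is
only a crude gauge, since the real cost depends on how much of the cycle the
channel actually gets.

```python
# Rough sketch: extra address-to-data-valid setup as a share of the cycle.
# The 1.5-2.0 ns and 25-40 MHz figures come from the quoted message;
# the helper names below are illustrative, not from any real tool.

def cycle_time_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

def fraction_of_cycle(extra_ns, freq_mhz):
    """Extra delay expressed as a fraction of one clock period."""
    return extra_ns / cycle_time_ns(freq_mhz)

for freq_mhz in (25, 40):
    for extra_ns in (1.5, 2.0):
        pct = 100.0 * fraction_of_cycle(extra_ns, freq_mhz)
        print(f"{freq_mhz} MHz ({cycle_time_ns(freq_mhz):.0f} ns cycle): "
              f"+{extra_ns} ns is {pct:.2f}% of the cycle")
```

So the penalty is somewhere between about 4% and 8% of the whole cycle,
and it lands on the path where the external memory system needs every
nanosecond it can get.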
Well, just thought that the net might find this interesting.  I guess
the thing to realize is that it is difficult to consider the effects
of a single feature separately.  If we had specified the alignment
network from the beginning, maybe our circuit guys would have found
a zero-time solution (they are pretty clever).  Over the phone, Dave
also worried that this circuitry might not scale, timewise, as well
as other stuff.  These are tough issues!
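
For anyone who wants the job of the alignment network spelled out, here is
a rough software model of what it would do on a byte load: fetch the whole
word, then steer the addressed byte into the low byte of the register.
This is only a sketch under my own assumptions (32-bit words, zero
extension, byte 0 as the most significant byte unless told otherwise); the
function names are hypothetical, and it says nothing about how the
circuitry would really be built.

```python
# Software model of a byte load on a word-addressed machine.
# Assumptions (not from the discussion above): 32-bit words, zero extension,
# and big-endian byte numbering by default.

WORD_BYTES = 4  # assumed word size in bytes

def load_word(memory, byte_addr):
    """Word load: the low two address bits are ignored."""
    return memory[byte_addr // WORD_BYTES]

def extract_byte(word, byte_addr, big_endian=True):
    """Select the addressed byte and right-justify it, zero-extended."""
    index = byte_addr % WORD_BYTES
    if big_endian:
        shift = 8 * (WORD_BYTES - 1 - index)  # byte 0 is most significant
    else:
        shift = 8 * index                     # byte 0 is least significant
    return (word >> shift) & 0xFF

def load_byte(memory, byte_addr, big_endian=True):
    """What an alignment network would deliver in a single load."""
    return extract_byte(load_word(memory, byte_addr), byte_addr, big_endian)

memory = [0x11223344, 0xAABBCCDD]   # two example words
print(hex(load_byte(memory, 5)))    # byte 1 of the second word
```

Without the network, the extract step is a separate operation after the
word load; with it, the mux sits in the load path, which is where the extra
setup time comes from.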

    bcase