Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!husc6!think!ames!amdcad!bcase
From: bcase@amdcad.UUCP
Newsgroups: comp.arch
Subject: Re: Update to word addressing
Message-ID: <16164@amdcad.AMD.COM>
Date: Wed, 15-Apr-87 13:28:33 EST
Article-I.D.: amdcad.16164
Posted: Wed Apr 15 13:28:33 1987
Date-Received: Fri, 17-Apr-87 03:04:38 EST
References: <16163@amdcad.AMD.COM>
Organization: Advanced Micro Devices, Inc., Sunnyvale, Ca.
Lines: 46
Summary: updated update

Ok, here is an update to the update. This information is from our "main man" circuit designer, Dave Witt (the boss of the other circuit designer whose comments I recounted earlier). There isn't major disagreement here, just some clarification.

---------------------------
Hi, Brian,

Anyway, if what you are talking about is being able to mux any byte position of the loaded data to any byte position of a register on a load, my guess is that this would have an impact on performance for us, but that the size of the impact would depend heavily on the architecture and its speed paths.

On the 29000, because we do direct forwarding of loads, allowing arbitrary multiplexing of any byte location to any byte location of a register would add an extra 1.5-2.0 nanoseconds of setup to the address-to-data-valid delay. (This ignores the extra hardware needed to provide access to any byte at each byte location in the data-input latch, and the selective byte drive/tristate on our internal buses in the data-input latch and the register file.) This is because we use dynamic buses, which require the data to be stable before the drive clock, and because during the picket [that is a half clock cycle, ED] in which we transfer the data-input latch to the ALU/shifter, we currently use all of that time for the data transfer and for setting up the control signals for the funnel shifter/ALU/prioritizer.
I'll say that there is obviously increased complexity associated with allowing byte loads, that whether there is a net effect on the performance of the processor depends heavily on the internal pipeline and the associated internal architecture and speed paths, and that in the case of the 29000, if we had implemented this feature, it would have affected our address/data-valid setup time. This may not be a major problem on other chips, but when you are aiming for 25-40 MHz with the associated external memory systems and caches, nothing is more critical to a processor than giving the channel as much of the cycle time as possible.

David Witt
---------------------------------

Well, I just thought that the net might find this interesting. I guess the thing to realize is that it is difficult to consider the effects of a single feature in isolation. If we had specified the alignment network from the beginning, maybe our circuit guys would have found a zero-time solution (they are pretty clever). Over the phone, Dave also worried that this circuitry might not scale, timewise, as well as the rest of the logic. These are tough issues!

bcase
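For readers who want the operation being debated spelled out, here is a rough behavioral sketch. It is not a model of the 29000's actual datapath. It contrasts a byte load done in software on a word-addressed machine (fetch the whole word, then shift and mask) with the "any byte to any byte position of a register" multiplexing Dave describes. The function names and the big-endian lane numbering (lane 0 = most significant byte) are hypothetical choices made for illustration only; the post specifies neither.

```python
# Hypothetical sketch. Assumption: byte lane 0 is the most significant
# byte of a 32-bit word (big-endian numbering); the post does not say.
WORD_MASK = 0xFFFF_FFFF


def lane_shift(lane: int) -> int:
    """Bit offset of byte lane 0..3 within a 32-bit word."""
    return (3 - (lane & 3)) * 8


def load_byte_sw(mem: list[int], byte_addr: int) -> int:
    """Byte load on a word-addressed machine: fetch the whole word,
    then shift and mask in software to isolate the requested byte."""
    word = mem[byte_addr >> 2]  # word-aligned fetch
    return (word >> lane_shift(byte_addr & 3)) & 0xFF


def mux_byte(reg: int, data: int, src_lane: int, dst_lane: int) -> int:
    """Model of an 'arbitrary' alignment network: take byte src_lane of
    the incoming data word and drive it onto byte dst_lane of the
    register, leaving the other three register bytes undriven
    (i.e. unchanged), like the selective byte drive in the post."""
    b = (data >> lane_shift(src_lane)) & 0xFF
    mask = 0xFF << lane_shift(dst_lane)
    return ((reg & ~mask) | (b << lane_shift(dst_lane))) & WORD_MASK


mem = [0x11223344, 0xAABBCCDD]
print(hex(load_byte_sw(mem, 5)))       # byte 5 = lane 1 of word 1 -> 0xbb
print(hex(mux_byte(0, mem[0], 3, 0)))  # lane 3 (0x44) -> lane 0: 0x44000000
```

In hardware, each of the four register byte lanes in `mux_byte` needs its own 4-to-1 selector plus a drive enable, and that selection sits directly in the load path. That per-lane selection is the feature Dave estimates would have cost the 29000 an extra 1.5-2.0 ns of setup, whereas the software version moves the shift and mask into ordinary instructions after the load.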