Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!ames!necntc!auspyr!sci!kenm
From: kenm@sci.UUCP
Newsgroups: comp.arch,comp.sys.nsc.32k
Subject: Re: NS32532 Patents
Message-ID: <4042@sci.UUCP>
Date: Thu, 16-Apr-87 14:58:36 EST
Article-I.D.: sci.4042
Posted: Thu Apr 16 14:58:36 1987
Date-Received: Sat, 18-Apr-87 09:38:45 EST
References: <4206@nsc.nsc.com>
Organization: Silicon Compilers Inc., San Jose, Calif.
Lines: 100
Xref: utgpu comp.arch:906 comp.sys.nsc.32k:77
Summary: been done before

In article <4206@nsc.nsc.com>, roger@nsc.nsc.com (Roger Thompson) writes:
> To stimulate some more valued discussions, let me lift some
> of what is discussed in the 32532 overview brochure;
>
> " At least a dozen manufacturers have brought 32-bit solutions to the
> marketplace. While each design is similar in the broad view, the specifics
> of each implementation can vary greatly. And it is those specifics
> that determine which is best for your needs.
>
> The specifics of the NS32532, however are unprecedented in 32-bit
> microprocessor architectures. In fact National has applied for
> eight separate patents on the NS32532:

Introduction: When I say "we" below I mean a group of CPU designers
I was part of at HP for about 4 years (80-84).

>
> 1.) The method of detecting and handling memory-mapped I/O
>     by a pipelined microprocessor. ----- Think about
>     that for a while. The 32532 has a 1024 byte 2 way set
>     associative data cache. Without the special method
>     of handling I/O, writing I/O drivers is somewhat problematic.

Not clear just what the problem is. Presumably the I/O addresses
can identify themselves, so the cache just has to pay attention.

>
> 2.) Maintaining coherence between a microprocessor's integrated cache
>     and the external memory. ------ Since both the Instruction
>     and Data caches are physical caches, we were able to devise
>     a means to provide "hardware" cache coherence hooks.
>     Coherency can be maintained without cumbersome software overhead and at
>     cost in performance.

An extra tag set for the instruction cache, so it can monitor all writes
to the data cache. A simpler solution is to make it illegal architecturally
to write into your own instruction stream and to provide a mechanism
for flushing cache blocks.

>
> 3.) Monitoring control flow in a microprocessor ----- in other
>     words, branch prediction.

We used a small special-purpose cache for this. The way it worked was
that the address of the conditional branch was hashed down to 9 bits,
which were used to index a 512x2-bit RAM. The two bits were used to
implement a "slow learner" state machine that predicted which way the
branch would go. We saw a 95% prediction rate if programs were allowed
to run long enough without a context switch. With context switch effects
this dropped into the 80-85% range for our test cases. Being a slow
learner means that it only makes one mistake on the execution of a loop,
on the very last pass. We also tried various other 1-, 2-, and 3-bit
state machines, but none of them worked as well. Credit for this goes to
Mike Manlove at HP. There is also quite a bit of literature on the subject.

>
> 4.) The concept of a fully integrated cache, Memory Management Unit,
>     and Instruction pipeline.

Pretty vague. I have heard lots of "concepts" in this area.

>
> 5.) Method of simultaneous references to the cache and Bus Interface unit.

Ditto.

>
> 6.) Method for completing instructions without waiting for writes. ----
>     Yes that's right. Reads have priority over writes. Writes are
>     buffered in a 2 entry FIFO. There is one exception to this
>     rule ----- memory mapped I/O as in patent # 1 above.

I remember reading about CDC machines back in the dark ages doing this.
Essentially the output FIFO contained both addresses and data, and each
read did a partial comparison (about 8 bits) of the read address against
all the write addresses in the FIFO. If a match was found, then the data
was grabbed out of the FIFO and the writes had priority. Virtual
addressing might complicate this if aliasing is allowed.

>
> 7.) Method of optimizing instruction fetches.

Instruction buffers. Instruction caches. Fetching multiple paths
simultaneously. Using branch prediction to fetch the probable path.
Putting the instruction decoder on the other side of the instruction
cache (this takes the next-address and branch-target calculation out
of the critical path). ...

>
> 8.) MMU that is accessible by the instruction unit, address unit
>     and the execution unit.

If it wasn't, how would the processor work?

>
> These unique and innovative architectural refinements give the
> NS32532 key performance advantages in a variety of 32-bit applications."
>
>
> I'm open to discussion on any of these unique attributes.
>
> ------- Roger

I'm going to be interested in how many of these National manages to patent.
I'm also sure a lot of good engineering work went into the 32532, but
most new ideas in this area aren't.

Ken McElvain
decwrl!sci!kenm
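
Appended sketch of the branch predictor described under item 3: a
512-entry table of 2-bit state machines indexed by a 9-bit hash of the
branch address. Only the table size, the 9-bit index and the "one
mistake per loop" behavior come from the post. The post gives neither
the hash nor the state diagram, so the XOR-fold hash, the saturating
up/down counter and the initial counter value are all assumptions, and
the class and function names are hypothetical.

```python
class SlowLearnerPredictor:
    """512 two-bit counters indexed by a 9-bit hash of the branch address."""

    INDEX_BITS = 9  # 2**9 = 512 entries, as in the post

    def __init__(self):
        # Counter values 0..3: 0-1 predict not taken, 2-3 predict taken.
        # Starting every entry at 1 is an arbitrary assumption.
        self.table = [1] * (1 << self.INDEX_BITS)

    def _index(self, branch_addr):
        # Assumed hash: XOR-fold a 32-bit address into 9 bits.
        mask = (1 << self.INDEX_BITS) - 1
        addr = branch_addr & 0xFFFFFFFF
        h = 0
        while addr:
            h ^= addr & mask
            addr >>= self.INDEX_BITS
        return h

    def predict(self, branch_addr):
        """Return True if the branch is predicted taken."""
        return self.table[self._index(branch_addr)] >= 2

    def update(self, branch_addr, taken):
        """Move the counter one step toward the actual outcome, saturating."""
        i = self._index(branch_addr)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor, branch_addr, outcomes):
    """Run one branch through a sequence of outcomes and count the misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(branch_addr) != taken:
            misses += 1
        predictor.update(branch_addr, taken)
    return misses
```

With a loop-closing branch that is taken nine times and then falls
through, a warmed-up counter under these assumptions misses only on the
final pass: the single not-taken outcome moves it from 3 to 2, which
still predicts taken on the next entry to the loop. A 1-bit scheme would
miss twice per loop execution instead.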