Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!ames!necntc!auspyr!sci!kenm
From: kenm@sci.UUCP
Newsgroups: comp.arch,comp.sys.nsc.32k
Subject: Re: NS32532 Patents
Message-ID: <4042@sci.UUCP>
Date: Thu, 16-Apr-87 14:58:36 EST
Article-I.D.: sci.4042
Posted: Thu Apr 16 14:58:36 1987
Date-Received: Sat, 18-Apr-87 09:38:45 EST
References: <4206@nsc.nsc.com>
Organization: Silicon Compilers Inc., San Jose, Calif.
Lines: 100
Xref: utgpu comp.arch:906 comp.sys.nsc.32k:77
Summary: been done before

In article <4206@nsc.nsc.com>, roger@nsc.nsc.com (Roger Thompson) writes:
> To stimulate some more valued discussion, let me lift some
> of what is discussed in the 32532 overview brochure:
> 
> "  At least a dozen manufacturers have brought 32-bit solutions to the
> marketplace.  While each design is similar in the broad view, the specifics
> of each implementation can vary greatly.  And it is those specifics
> that determine which is best for your needs.
> 
>    The specifics of the NS32532, however, are unprecedented in 32-bit
> microprocessor architectures.  In fact National has applied for
> eight separate patents on the NS32532:

Introduction:
When I say "we" below I mean a group of CPU designers I was part of
at HP for about 4 years (80-84).

> 
> 1.)  The method of detecting and handling memory-mapped I/O
>      by a pipelined microprocessor.  ----- Think about
>      that for a while.  The 32532 has a 1024-byte, 2-way set
>      associative data cache.  Without the special method
>      of handling I/O, writing I/O drivers is somewhat problematic.
Not clear just what the problem is.  Presumably the I/O addresses
can identify themselves, so the cache just has to recognize them and
pass those accesses straight through to the bus.
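A minimal sketch of what recognizing I/O addresses could look like, assuming a hypothetical address map in which everything at or above IO_BASE is device space.  The names IO_BASE, is_io and DataCache are made up for illustration; the only point is that an address decode lets device accesses bypass the cache so every read of a status register really reaches the bus.

```python
# Hypothetical address map: the top 16 MB of a 32-bit space is I/O.
IO_BASE = 0xFF000000


def is_io(addr):
    """I/O locations identify themselves by address range (assumed map)."""
    return addr >= IO_BASE


class DataCache:
    """Toy cache with one-word lines and no replacement, for illustration."""

    def __init__(self, memory):
        self.memory = memory  # dict standing in for the external bus
        self.lines = {}       # address -> cached word

    def read(self, addr):
        if is_io(addr):
            # Device registers are never cached: every read goes to the bus.
            return self.memory.get(addr, 0)
        if addr not in self.lines:
            self.lines[addr] = self.memory.get(addr, 0)  # miss: fill from memory
        return self.lines[addr]

    def write(self, addr, data):
        if not is_io(addr) and addr in self.lines:
            self.lines[addr] = data  # keep a cached copy current
        self.memory[addr] = data     # write-through; I/O writes go straight out
```

A device that changes its register between two reads is seen correctly, while an ordinary memory word is served from the cache.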
> 
> 2.)  Maintaining coherence between a microprocessor's integrated cache
>      and the external memory.  ------ Since both the Instruction
>      and Data caches are physical caches, we were able to devise
>      a means to provide "hardware" cache coherence hooks.  Coherency
>      can be maintained without cumbersome software overhead and at
>      cost in performance.
One option is an extra tag set for the instruction cache so it can monitor
all writes to the data cache.  A simpler solution is to make it architecturally
illegal to write into your own instruction stream and to provide a mechanism
for flushing cache blocks.
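Both approaches can be sketched on a toy direct-mapped instruction cache.  The 16-byte line and 64-line size are hypothetical, as are all the names.  The snoop_tags array plays the role of the extra tag set: every data write is checked against it, and a store into a cached instruction line invalidates that line.  flush_line shows the simpler alternative, where software is responsible for flushing after it writes code.

```python
LINE_BYTES = 16  # hypothetical line size
NUM_LINES = 64   # hypothetical: 1 KB direct-mapped instruction cache


class InstructionCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES        # used by the fetch unit
        self.snoop_tags = [None] * NUM_LINES  # extra tag set, checked on writes

    @staticmethod
    def _index_tag(addr):
        line = addr // LINE_BYTES
        return line % NUM_LINES, line // NUM_LINES

    def fill(self, addr):
        idx, tag = self._index_tag(addr)
        self.tags[idx] = tag
        self.snoop_tags[idx] = tag

    def hit(self, addr):
        idx, tag = self._index_tag(addr)
        return self.tags[idx] == tag

    def observe_write(self, addr):
        """Hardware scheme: called on every data write, kills stale lines."""
        idx, tag = self._index_tag(addr)
        if self.snoop_tags[idx] == tag:
            self.tags[idx] = None
            self.snoop_tags[idx] = None

    def flush_line(self, addr):
        """Software scheme: explicit flush issued after writing code."""
        idx, tag = self._index_tag(addr)
        if self.tags[idx] == tag:
            self.tags[idx] = None
            self.snoop_tags[idx] = None
```

Keeping the snoop tags separate lets the write check proceed without stealing cycles from instruction fetch; the flush approach needs no extra hardware at all.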
> 
> 3.)  Monitoring control flow in a microprocessor ----- in other 
>      words, branch prediction.

We used a small special-purpose cache for this.  The address of the
conditional branch was hashed down to 9 bits, which were used to index
a 512 x 2-bit RAM.  The two bits implemented a "slow learner" state
machine that predicted which way the branch would go.  We saw a 95%
correct-prediction rate if programs were allowed to run long enough
without a context switch; with context-switch effects this dropped to
80-85% for our test cases.  Being a slow learner means that it makes
only one mistake per execution of a loop, on the very last pass.  We
also tried various 1-, 2-, and 3-bit state machines, but none of them
worked as well.  Credit for this goes to Mike Manlove at HP.  There is
also quite a bit of literature on the subject.
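A sketch of a 512-entry, 2-bit predictor table.  Two things here are assumptions, not the HP design: the hash (below, the PC's word-offset bits folded together with XOR) and the state machine (below, a plain 2-bit saturating counter, which is not necessarily Mike Manlove's "slow learner" but shares its one-miss-per-loop property).

```python
TABLE_BITS = 9                 # 9-bit index -> 512 entries
TABLE_SIZE = 1 << TABLE_BITS


def hash_pc(pc):
    # Assumed hash: drop the 2 alignment bits, XOR-fold the rest to 9 bits.
    x = pc >> 2
    h = 0
    while x:
        h ^= x & (TABLE_SIZE - 1)
        x >>= TABLE_BITS
    return h


class TwoBitPredictor:
    """Saturating counters: 0,1 predict not taken; 2,3 predict taken."""

    def __init__(self):
        self.table = [2] * TABLE_SIZE  # start weakly taken (assumption)

    def predict(self, pc):
        return self.table[hash_pc(pc)] >= 2

    def update(self, pc, taken):
        i = hash_pc(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
```

Running a 10-iteration loop three times through this predictor gives three mispredictions, one on the last iteration of each run, because a single not-taken outcome only moves the counter from 3 to 2 and it still predicts taken on the next entry to the loop.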
> 
> 4.)  The concept of a fully integrated cache, Memory Management Unit,
>      and Instruction pipeline.
Pretty vague.  I have heard lots of "concepts" in this area.
> 
> 5.)  Method of simultaneous references to the cache and Bus Interface unit.
Ditto.  
> 
> 6.)  Method for completing instructions without waiting for writes. ----
>      Yes, that's right.  Reads have priority over writes.  Writes are
>      buffered in a 2 entry FIFO.  There is one exception to this
>      rule ----- memory mapped I/O as in patent # 1 above.

I remember reading about CDC machines back in the dark ages doing this.
Essentially, the output FIFO contained both addresses and data, and
each read did a partial comparison (about 8 bits) of the read address
against all the write addresses in the FIFO.  If a match was found,
the data was taken from the FIFO and the writes had priority.
Virtual addressing might complicate this if aliasing is allowed.
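A sketch of a write FIFO with a partial address compare.  The description above mentions both taking data from the FIFO and giving the writes priority; this sketch takes the conservative reading, where any partial match simply lets the buffered writes drain before the read proceeds.  That stays correct even when the 8-bit compare produces a false match.  The depth of 2 is borrowed from the 32532 figure quoted above; the class and method names are hypothetical.

```python
from collections import deque

FIFO_DEPTH = 2     # assumed depth (the 32532 figure quoted above)
MATCH_MASK = 0xFF  # compare only the low 8 address bits


class WriteBufferedMemory:
    def __init__(self):
        self.memory = {}
        self.fifo = deque()  # pending (address, data) writes, oldest first

    def _drain_one(self):
        addr, data = self.fifo.popleft()
        self.memory[addr] = data

    def write(self, addr, data):
        if len(self.fifo) == FIFO_DEPTH:
            self._drain_one()           # buffer full: this write has to wait
        self.fifo.append((addr, data))  # otherwise the CPU moves on at once

    def read(self, addr):
        # Cheap partial compare against every buffered write address.
        if any((a & MATCH_MASK) == (addr & MATCH_MASK) for a, _ in self.fifo):
            # Possible conflict (maybe a false match): writes go first.
            while self.fifo:
                self._drain_one()
        # Otherwise the read goes ahead of the pending writes.
        return self.memory.get(addr, 0)
```

With physical addresses this is safe; with virtual addresses and aliasing, two different addresses could name the same location and slip past the compare.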

> 
> 7.)  Method of optimizing instruction fetches.
Instruction buffers.
Instruction caches.
Fetching multiple paths simultaneously.
Using branch prediction to fetch the probable path.
Putting the instruction decoder on the other side of the instruction
	cache.  (This takes the next-address and branch-target calculations
	out of the critical path.)
...
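A rough sketch of the last item, assuming a made-up toy instruction format where memory holds (opcode, length, displacement) tuples.  The decoder runs when a line is filled, and the cache stores the already-computed fall-through address and branch target, so a fetch hit just reads them out.  PredecodedICache and the tuple format are hypothetical.

```python
class PredecodedICache:
    """Toy cache that stores decode results instead of raw instruction bytes."""

    def __init__(self, memory):
        self.memory = memory  # pc -> (opcode, length, displacement), toy ISA
        self.lines = {}       # pc -> (opcode, next_pc, branch_target)

    def _predecode(self, pc):
        # Runs only on a miss, on the memory side of the cache.
        opcode, length, disp = self.memory[pc]
        next_pc = pc + length                           # fall-through address
        target = pc + disp if opcode == "br" else None  # branch target, if any
        return opcode, next_pc, target

    def fetch(self, pc):
        # On a hit, next address and target come straight out of the cache,
        # so no address arithmetic sits in the fetch path.
        if pc not in self.lines:
            self.lines[pc] = self._predecode(pc)
        return self.lines[pc]
```

The cost is wider cache entries; the gain is that the next fetch address is available as soon as the cache is read.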
> 
> 8.)  MMU that is accessible by the instruction unit, address unit
>      and the execution unit.
If it wasn't, how would the processor work?
> 
>    These unique and innovative architectural refinements give the
> NS32532 key performance advantages in a variety of 32-bit applications."
> 
> 
> I'm open to discussion on any of these unique attributes.
> 
> ------- Roger 

I'll be interested to see how many of these National manages to patent.
I'm also sure a lot of good engineering work went into the 32532, but most
"new" ideas in this area aren't.

Ken McElvain
decwrl!sci!kenm