Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!cmcl2!rutgers!ames!oliveb!sun!joe!petolino
From: petolino%joe@Sun.COM (Joe Petolino)
Newsgroups: comp.arch
Subject: Re: The 360 was a design landmark  (360 vs vax)
Message-ID: <26623@sun.uucp>
Date: Wed, 26-Aug-87 17:20:36 EDT
Article-I.D.: sun.26623
Posted: Wed Aug 26 17:20:36 1987
Date-Received: Sat, 29-Aug-87 03:59:10 EDT
References: <855@tjalk.cs.vu.nl> <2683@hoptoad.uucp> <916@haddock.ISC.COM> <418@astroatc.UUCP> <26444@sun.uucp> <422@astroatc.UUCP>
Sender: news@sun.uucp
Reply-To: petolino@sun.UUCP (Joe Petolino)
Organization: Sun Microsystems, Mountain View
Lines: 84

>The 8600 overlaps operand-decode with operand-fetch, and uses
>multiple functional (execution) units, but **UNLIKE** IBM and any
>other true pipe-line design, can *NOT* have multiple instructions
>in the decode phase simultaneously!

This is certainly a novel criterion for calling a design 'pipelined'!
All of the CPU designs I know of (this includes machines by IBM, Amdahl,
MIPS, and Sun) have at most one instruction in each pipeline stage at any one
time.  This is almost by definition of the word pipeline - each instruction
flows from one stage to the next so that it can execute in parallel with
the instructions which are in the OTHER stages of the pipe.  Maybe the 
above poster is thinking of an instruction buffer which can hold several
already-fetched instructions waiting to go into the pipeline.  Maybe he's
thinking of some other form of parallelism altogether.
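
To make the one-instruction-per-stage point concrete, here is a minimal toy
model of an in-order pipeline.  The four stage names and the instruction
labels are assumptions chosen for illustration; they do not describe any
particular machine mentioned above.

```python
# Toy in-order pipeline: each stage holds at most one instruction per cycle.
# STAGES and the instruction labels are illustrative assumptions only.
STAGES = ["fetch", "decode", "execute", "writeback"]


def simulate(instructions):
    """Return one snapshot per cycle, mapping stage name -> instruction or None."""
    pipe = [None] * len(STAGES)
    pending = list(instructions)
    history = []
    while pending or any(slot is not None for slot in pipe):
        # Everything moves down one stage; the last stage retires its
        # instruction, and the first stage takes the next pending one.
        pipe = [pending.pop(0) if pending else None] + pipe[:-1]
        if all(slot is None for slot in pipe):
            break  # pipeline has fully drained
        history.append(dict(zip(STAGES, pipe)))
    return history


if __name__ == "__main__":
    for cycle, snapshot in enumerate(simulate(["A", "B", "C"]), start=1):
        print(cycle, snapshot)
```

In this model the parallelism comes entirely from different instructions
occupying different stages in the same cycle; no stage ever holds two.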

Anyway, so much for quibbling about names.  Here are a few cents' worth of
my opinions on the 360 debate.

The 360 was certainly a landmark design for its time.  But times have
changed, and the 360 hasn't much (except maybe for the worse).  There's a
very good reason for this - a huge amount of non-portable software which runs
only on that architecture.  This is the reason that the 360/370 is still with
us: enough captive customers with enough money to make new implementations
profitable.  I don't think it's any inherent superiority of the architecture
that accounts for the high performance of the current top-of-the-line
incarnations - it's just that no one else has enough dollars worth of
customer base to justify the huge design effort that one of these beasts
requires.

I spent seven years designing caches for 370-compatibles, so I can give some
memory-related reasons why this architecture is difficult to implement:

* The architecture does not acknowledge the existence of caches.  There are
  no restrictions on storing into instruction words, no restrictions on
  virtual address mappings, no separation of code and data pages.  All these
  things conspire to make cache consistency a true headache.

* The normal instruction format specifies an operand address as the sum
  of two registers plus an offset.  This requires that three things be
  added together in the critical operand cache addressing path.

* Operand fetches must work on any alignment.  In addition to requiring
  shift networks in the data paths (not a big deal), this requires
  that the hardware be able to concatenate bytes from two different
  cache lines into a single operand.  Either of these two cache accesses
  may miss the cache or get an exception.

* There is no concept of an Address Space Identifier.  Instead, most 
  implementations use the address of the root of the translation tables,
  plus some control bits, to identify the Virtual Space that a virtual
  address belongs to.  This makes for some very long Tag words in TLBs
  and/or caches.

* Memory protection based on 'keys' which are attached to physical, not
  virtual, pages.  Since most cache implementations are virtually-addressed,
  finding and updating cached copies of these keys requires some
  sophisticated state machines which search through all entries of all
  caches and/or TLBs in the system.  The architecture requires that this
  be done by hardware.

* Several different translation table formats.  Virtual-to-physical
  translations are done in hardware, and the data paths needed to accommodate
  umpteen different operating systems' table formats are really messy.
  The older of these formats translates a 24-bit VA to a 24-bit PA.
  In a stroke of genius a few years ago, some new formats were introduced
  which expanded this to 31 (not 32) bits.
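
Two of the items above, the three-input address add and the unaligned
operand that straddles cache lines, can be sketched in a few lines.  This is
only an illustration: the 32-byte line size, the function names, and the
register values are assumptions, and wrap-around at the top of the address
space is ignored in the line calculation.

```python
# Sketch of base + index + displacement addressing and cache-line crossing.
LINE_SIZE = 32              # assumed cache line size in bytes (hypothetical)
ADDR_MASK = (1 << 24) - 1   # 24-bit addressing


def effective_address(regs, base, index, disp):
    """Sum base register, index register, and displacement (three inputs).

    A register field of 0 means "no register", so it contributes zero
    rather than the contents of register 0.
    """
    b = regs[base] if base != 0 else 0
    x = regs[index] if index != 0 else 0
    return (b + x + disp) & ADDR_MASK


def lines_touched(addr, length):
    """Return the cache line numbers an operand of `length` bytes occupies."""
    first = addr // LINE_SIZE
    last = (addr + length - 1) // LINE_SIZE
    return list(range(first, last + 1))


if __name__ == "__main__":
    regs = [0] * 16
    regs[1], regs[2] = 0x1000, 0x1C
    addr = effective_address(regs, base=1, index=2, disp=2)
    # A 4-byte operand at this address spans two lines, so the cache must
    # do two lookups, either of which may miss or take an exception.
    print(hex(addr), lines_touched(addr, 4))
```

The hardware has to do the equivalent of effective_address() before the
cache lookup can even begin, and lines_touched() returning two entries is
the case that forces the byte-concatenation logic.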
  
These are just a few of the things that I remember as being particularly
ill-suited to high-performance implementations.  Many of these are
characteristics of the 370, not the 360.  The last item is just a
special case of my biggest complaint about the 370: it's just too damn
complicated!  What started out as a reasonably clean and coherent 
architecture has been distorted by decades of added 'features' intended to
patch up the mismatch of old concepts to new technologies.

One final word about the ASCII vs EBCDIC debate.  You can enter ANY of the
128 ASCII codes from a standard ASCII keyboard.  I don't know of any
EBCDIC keyboard that can make a similar claim.  Part of the reason might
be that there is no agreed-upon standard for the graphic representation
of each character - seems to be more a matter of what's on the 'print chain'
at the time.  And part of the reason might be "We don't want just ANYONE
to enter THAT code!"

-Joe