Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!cmcl2!rutgers!ames!oliveb!sun!joe!petolino
From: petolino%joe@Sun.COM (Joe Petolino)
Newsgroups: comp.arch
Subject: Re: The 360 was a design landmark (360 vs vax)
Message-ID: <26623@sun.uucp>
Date: Wed, 26-Aug-87 17:20:36 EDT
Article-I.D.: sun.26623
Posted: Wed Aug 26 17:20:36 1987
Date-Received: Sat, 29-Aug-87 03:59:10 EDT
References: <855@tjalk.cs.vu.nl> <2683@hoptoad.uucp> <916@haddock.ISC.COM> <418@astroatc.UUCP> <26444@sun.uucp> <422@astroatc.UUCP>
Sender: news@sun.uucp
Reply-To: petolino@sun.UUCP (Joe Petolino)
Organization: Sun Microsystems, Mountain View
Lines: 84

>The 8600 overlaps operand-decode with operand-fetch, and uses
>multiple functional (execution) units, but **UNLIKE** IBM and any
>other true pipe-line design, can *NOT* have multiple instructions
>in the decode phase simultaneously!

This is certainly a novel criterion for calling a design 'pipelined'!
All of the CPU designs I know of (this includes machines by IBM, Amdahl,
MIPS, and Sun) have at most one instruction in each pipeline stage at
any one time. This is almost the definition of the word 'pipeline': each
instruction flows from one stage to the next so that it can execute in
parallel with the instructions which are in the OTHER stages of the
pipe. Maybe the above poster is thinking of an instruction buffer which
can hold several already-fetched instructions waiting to go into the
pipeline. Maybe he's thinking of some other form of parallelism
altogether.

Anyway, so much for quibbling about names. Here are a few cents' worth
of my opinions on the 360 debate.

The 360 was certainly a landmark design for its time. But times have
changed, and the 360 hasn't much (except maybe for the worse). There's
a very good reason for this - a huge amount of non-portable software
which runs only on that architecture.
This is the reason that the 360/370 is still with us: enough captive
customers with enough money to make new implementations profitable. I
don't think it's any inherent superiority of the architecture that
accounts for the high performance of the current top-of-the-line
incarnations - it's just that no one else has enough dollars' worth of
customer base to justify the huge design effort that one of these
beasts requires.

I spent seven years designing caches for 370-compatibles, so I can give
some memory-related reasons why this architecture is difficult to
implement:

* The architecture does not acknowledge the existence of caches. There
  are no restrictions on storing into instruction words, no
  restrictions on virtual address mappings, no separation of code and
  data pages. All these things conspire to make cache consistency a
  true headache.

* The normal instruction format specifies an operand address as the sum
  of two registers plus an offset. This requires that three things be
  added together in the critical operand cache addressing path.

* Operand fetches must work on any alignment. In addition to requiring
  shift networks in the data paths (not a big deal), this requires that
  the hardware be able to concatenate bytes from two different cache
  lines into a single operand. Either of these two cache accesses may
  miss the cache or get an exception.

* There is no concept of an Address Space Identifier. Instead, most
  implementations use the address of the root of the translation
  tables, plus some control bits, to identify the Virtual Space that a
  virtual address belongs to. This makes for some very long Tag words
  in TLBs and/or caches.

* Memory protection is based on 'keys' which are attached to physical,
  not virtual, pages. Since most cache implementations are
  virtually-addressed, finding and updating cached copies of these keys
  requires some sophisticated state machines which search through all
  entries of all caches and/or TLBs in the system.
  The architecture requires that this be done by hardware.

* There are several different translation table formats.
  Virtual-to-physical translations are done in hardware, and the data
  paths needed to accommodate umpteen different operating systems'
  table formats are really messy. The older of these formats translates
  a 24-bit VA to a 24-bit PA. In a stroke of genius a few years ago,
  some new formats were introduced which expanded this to 31 (not 32)
  bits.

These are just a few of the things that I remember as being
particularly ill-suited to high-performance implementations. Many of
these are characteristics of the 370, not the 360. The last item is
just a special case of my biggest complaint about the 370: it's just
too damn complicated! What started out as a reasonably clean and
coherent architecture has been distorted by decades of added 'features'
intended to patch up the mismatch of old concepts to new technologies.

One final word about the ASCII vs EBCDIC debate. You can enter ANY of
the 128 ASCII codes from a standard ASCII keyboard. I don't know of any
EBCDIC keyboard that can make a similar claim. Part of the reason might
be that there is no agreed-upon standard for the graphic representation
of each character - it seems to be more a matter of what's on the
'print chain' at the time. And part of the reason might be "We don't
want just ANYONE to enter THAT code!"

-Joe
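
As a rough illustration of two of the cache-related points in the list
above (the three-input address add, and operands that straddle two
cache lines), here is a minimal Python sketch. The 64-byte line size,
the function names, and the sample register values are assumptions
chosen for the example; they do not describe any particular
370-compatible machine.

```python
# Illustrative sketch only. LINE_SIZE and the helper names are
# assumptions, not details of any real 370-compatible implementation.

LINE_SIZE = 64                    # hypothetical cache line size, in bytes
ADDR_MASK_24 = (1 << 24) - 1      # 24-bit addressing
ADDR_MASK_31 = (1 << 31) - 1      # 31-bit addressing


def effective_address(base, index, displacement, mask=ADDR_MASK_24):
    """Add base + index + displacement and truncate to the address width."""
    return (base + index + displacement) & mask


def cache_lines_touched(address, length, line_size=LINE_SIZE):
    """Return the cache line numbers covered by an operand of `length` bytes."""
    first = address // line_size
    last = (address + length - 1) // line_size
    return list(range(first, last + 1))


# An 8-byte operand whose address is not aligned to anything useful.
ea = effective_address(base=0x012340, index=0x00003A, displacement=0x004)
lines = cache_lines_touched(ea, 8)
print(hex(ea), lines)
```

With these sample values the operand starts 62 bytes into a 64-byte
line, so the fetch covers two lines, and either access could miss or
fault on its own.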