Path: utzoo!utgpu!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!nosc!ucsd!ucbvax!hplabs!hpda!hpcupt1!hartman
From: hartman@hpcupt1.HP.COM (Doug Hartman)
Newsgroups: comp.arch
Subject: Re: HP2100 (was: Re: Self Modifying Code)
Message-ID: <6310011@hpcupt1.HP.COM>
Date: 28 Jul 88 01:59:56 GMT
References: <33895@yale-celray.yale.UUCP>
Organization: Hewlett Packard, Cupertino
Lines: 89


Re: 2100 architecture.

While recent descendants of this machine have stacks, reentrant
code, virtual memory, etc., there are some interesting trivia notes
about this machine.

Most people used Fortran or assembly (the Algol compiler notwithstanding;
the machine has acquired Pascal and Ada (!?) by now...), so they didn't
notice the lack of a stack too much.  The Fortran compiler used to rely
on the fact that self-modifying code was "cool".  For example, a
typical two-word instruction to load a 32-bit floating point
number looked like so:

       octal address (in words)     instruction word
            2000                       DLD  (=105400 or something...)
            2001                       <fifteen bit address + indirect>

As a compiler, you have a couple of choices.  You can compute the
address of an array element in a register, then use the fact that
the A register is address zero and the B register is address 1 to
get, in assembly:

             DLD A,I

Or, if you want to reuse the result of the address computation,
you can put the address in a variable:

             STA PTR
             DLD PTR,I

But this takes one more memory cycle than if the instruction
contained the direct address.  The solution is obvious:

             STA *+2
             DLD <anything>

This puts the address to load right in the instruction stream.
Makes pipelining fun...not that the machine was pipelined.
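
For anyone who wants to see the difference in memory traffic, here is
a toy model in Python.  It is a sketch, not a 2100 emulator: the
Machine class, the way references are counted, and the addresses are
all made up for illustration, the DLD opcode is the "or something"
value from above, and the A/B registers living at addresses 0 and 1
are not modeled.

```python
# Toy model only: word-addressed memory, a 15-bit address plus an
# indirect flag in bit 15, and a counter of memory references.
# The opcode value and the reference accounting are assumptions.
INDIRECT = 0o100000
ADDR_MASK = 0o077777
DLD = 0o105400  # "or something", per the post


class Machine:
    def __init__(self):
        self.mem = [0] * 0o10000
        self.a = 0
        self.b = 0
        self.refs = 0  # memory references made so far

    def read(self, addr):
        self.refs += 1
        return self.mem[addr]

    def sta(self, target):
        """STA target: store A into memory (one reference)."""
        self.refs += 1
        self.mem[target] = self.a

    def dld(self, pc):
        """Execute the two-word DLD at pc: load A, then B."""
        assert self.read(pc) == DLD
        operand = self.read(pc + 1)
        while operand & INDIRECT:          # chase each indirect level
            operand = self.read(operand & ADDR_MASK)
        addr = operand & ADDR_MASK
        self.a = self.read(addr)
        self.b = self.read(addr + 1)


def load_via_pointer(data_addr):
    """STA PTR / DLD PTR,I"""
    m = Machine()
    m.mem[data_addr:data_addr + 2] = [0o040100, 0o000002]
    ptr = 0o3000                           # hypothetical location of PTR
    m.mem[0o2001] = DLD
    m.mem[0o2002] = ptr | INDIRECT
    m.a = data_addr                        # address computed in A
    m.sta(ptr)
    m.dld(0o2001)
    return m.a, m.b, m.refs


def load_via_patch(data_addr):
    """STA *+2 / DLD <anything>, with the STA sitting at 2000 octal"""
    m = Machine()
    m.mem[data_addr:data_addr + 2] = [0o040100, 0o000002]
    m.mem[0o2001] = DLD
    m.mem[0o2002] = 0                      # <anything>; overwritten below
    m.a = data_addr                        # address computed in A
    m.sta(0o2000 + 2)                      # patch the DLD operand word
    m.dld(0o2001)
    return m.a, m.b, m.refs
```

Both routines load the same two words; in this model the patched
version makes one fewer memory reference because the DLD never has
to fetch PTR.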

Other trivia notes:  having only two registers tended to simplify the
procedure call convention and interrupt state save...:-).  The register
set was not symmetric--you could only use the A register for the
boolean operations.  No subtract, but you could XOR to anywhere on
the current page or page zero, possibly indirect, in one instruction...
The Fortran compiler didn't use the B register at all for the first
ten years or so.  Teaching it about B improved performance only about
3%, mostly because the compiler didn't save any register allocation
information across statements.  To set X and Y to zero you
would get

              CLA
              STA X
              CLA      <just to be sure...>
              STA Y

It would only get to B in certain "complex" expressions.
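
What the compiler was missing is easy to sketch.  The Python below is
a hypothetical peephole pass, not anything the real compiler did: it
remembers across statements whether A is known to be zero and drops a
CLA that cannot change anything.  It assumes straight-line code with
no labels or branch targets, and it treats every mnemonic other than
CLA and STA as clobbering A.

```python
def drop_redundant_cla(code):
    """Remove CLA instructions issued while A is already known to be zero."""
    out = []
    a_is_zero = False
    for op in code:
        mnemonic = op.split()[0]
        if mnemonic == "CLA":
            if a_is_zero:
                continue            # A is still zero; skip the repeat
            a_is_zero = True
        elif mnemonic != "STA":
            a_is_zero = False       # conservatively assume A was changed
        out.append(op)
    return out


if __name__ == "__main__":
    for line in drop_redundant_cla(["CLA", "STA X", "CLA", "STA Y"]):
        print(line)
```

On the X and Y example this keeps the first CLA and both stores and
drops the "just to be sure" CLA.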

Later versions of the machine were heavily microcoded.  They added things
like move words (which saved the count of words moved in the instruction
stream in order to restart after interrupt), compare bytes, and load
from 32-bit virtual address, up through extreme CISC-isms such as
DPOLY, which computed the Chebyshev polynomial expansion given a
table of coefficients, a double precision vector instruction set
including "one instruction" pivot operations, and support for all of
the transcendental functions used by the Whetstone benchmark.
I wonder if anybody ever coded the hyperbolic tangent instruction
in assembly?
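
For reference, evaluating a Chebyshev expansion from a table of
coefficients is only a few lines in a high-level language.  The
sketch below uses the textbook Clenshaw recurrence in Python; it is
an assumption that this resembles what DPOLY did, since the exact
operand format and semantics of the instruction are not given here.
The function name is made up and the table must be non-empty.

```python
def chebyshev_eval(coeffs, x):
    """Return sum(coeffs[k] * T_k(x)) using the Clenshaw recurrence."""
    b1 = 0.0
    b2 = 0.0
    # Work down from the highest-order coefficient to coeffs[1].
    for c in reversed(coeffs[1:]):
        b1, b2 = c + 2.0 * x * b1 - b2, b1
    return coeffs[0] + x * b1 - b2


if __name__ == "__main__":
    # T_2(x) = 2x^2 - 1, so the table [0, 0, 1] evaluates T_2 alone.
    print(chebyshev_eval([0.0, 0.0, 1.0], 0.5))
```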

The machine held off interrupts for one cycle following a JMP indirect
to let you get out of the operating system "cleanly".  Of course this
also let you loop forever with the right two JMP instructions.  But 
since the machine had infinite levels of indirect resolution, you could
get the same effect by doing

               JMP *+1,I
               DEF *,I
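
To see why that pair never finishes, here is a small Python sketch of
indirect resolution.  The word layout (15-bit address, indirect flag
in bit 15) follows the description above; the resolve function and
its max_levels cap are my own additions so the demonstration
terminates, which the real machine had no way of doing.

```python
INDIRECT = 0o100000
ADDR_MASK = 0o077777


def resolve(mem, word, max_levels=64):
    """Follow indirect words until a direct address turns up.

    Returns the direct address, or None if max_levels indirections
    were followed without finding one (the cap is an assumption).
    """
    for _ in range(max_levels):
        if not word & INDIRECT:
            return word & ADDR_MASK
        word = mem[word & ADDR_MASK]
    return None


if __name__ == "__main__":
    mem = {
        0o2001: 0o2001 | INDIRECT,   # DEF *,I  (points at itself)
        0o3000: 0o3001 | INDIRECT,   # an ordinary two-level chain
        0o3001: 0o4000,
    }
    print(resolve(mem, 0o3000 | INDIRECT))   # well-behaved chain
    print(resolve(mem, 0o2001 | INDIRECT))   # JMP *+1,I with * = 2000
```

The ordinary chain bottoms out at a direct address; the
self-referencing DEF never does, so the sketch gives up and returns
None.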

The multiply instructions assumed the B register held the most significant
bits.  The 32-bit integer instructions assumed the A register did.  The
code generators produced a lot of SWP instructions, which were of course
implemented as a double rotate right (or was it left?) 16 bits.  Not
super fast.
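
As it happens, the direction doesn't matter for the result: rotating
a 32-bit pair by 16 bits either way exchanges the two halves.  A
quick Python check, treating B as the high half and A as the low
half of the pair (the helper names are made up for this sketch):

```python
MASK32 = 0xFFFFFFFF


def rotate_right32(value, n):
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK32


def rotate_left32(value, n):
    n %= 32
    return ((value << n) | (value >> (32 - n))) & MASK32


def swp(a, b):
    """Swap A and B by rotating the 32-bit pair B:A right 16 bits."""
    pair = ((b & 0xFFFF) << 16) | (a & 0xFFFF)
    rotated = rotate_right32(pair, 16)
    return rotated & 0xFFFF, rotated >> 16   # new A, new B


if __name__ == "__main__":
    a, b = swp(0x1234, 0xABCD)
    print(hex(a), hex(b))
```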

Try this at home: the "multiply+16" opcode (101370, if memory serves)
jumped into a microcode jump table at some point.  Unfortunately, the
microcoder was seriously out of room at that point, so he or she had
decided to use part of the jump table as code to handle the machine's
front panel.  This was on the (obsolete) F series machines.  Executing a
multiply+16 opcode put you into the front panel, where you were of
course uninterruptible.

Those were the days, back when instruction sets were designed by people
who were learning by doing.  Fortunately HP's more recent products
such as the Precision Architecture (notice I didn't say Spectrum :-)
machines are much, MUCH better designs.  One of the most effective
instruction sets I've run across from any manufacturer, and I didn't
even work on it, so I'm a little more objective than it would seem.
Lots of good stuff to help make things go fast for both individual
programs and overall system performance, without using much hardware.
Joe Bob says check it out.

Doug "those were the days" Hartman
at "Spectrums R Us" in scenic Cupertino