Path: utzoo!utgpu!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!nosc!ucsd!ucbvax!hplabs!hpda!hpcupt1!hartman
From: hartman@hpcupt1.HP.COM (Doug Hartman)
Newsgroups: comp.arch
Subject: Re: HP2100 (was: Re: Self Modifying Code)
Message-ID: <6310011@hpcupt1.HP.COM>
Date: 28 Jul 88 01:59:56 GMT
References: <33895@yale-celray.yale.UUCP>
Organization: Hewlett Packard, Cupertino
Lines: 89

Re: 2100 architecture. While recent descendants of this machine have
stacks, reentrant code, virtual memory, etc., there are some interesting
trivia notes about this machine.

Most people used Fortran or assembly (the Algol compiler notwithstanding;
it has acquired Pascal and ADA (!?) by now...), so they didn't notice the
lack of a stack too much.

The Fortran compiler used to rely on the fact that self modifying code was
"cool". For example, a typical two word instruction to load a 32 bit
floating point number looked like so:

    octal address (in words)    instruction word
    2000                        DLD (=105400 or something...)
    2001

As a compiler, you have a couple of choices. You can compute the address
of an array element in a register, then use the fact that the A register
is address zero and the B register is address 1 to get

    DLD A,I

in assembly. Or, you can put the address in a variable:

    STA PTR
    DLD PTR,I

if you want to re-use the result of the address computation. But this
takes one more memory cycle than if the instruction contained the direct
address. The solution is obvious:

    STA *+2
    DLD

This puts the address to load right in the instruction stream. Makes
pipelining fun...not that the machine was pipelined.

Other trivia notes: having only two registers tended to simplify the
procedure call convention and interrupt state save...:-). The register
set was not symmetric--you could only use the A register for the boolean
operations. No subtract, but you could XOR to anywhere on the current
page or page zero, possibly indirect, in one instruction...

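For readers who have not met this idiom, the "STA *+2 / DLD" trick above
can be sketched as a toy simulation. This is not an HP 2100 emulator: the
symbolic opcodes, the tuple encoding of STA, the addresses, and the data
values are all assumptions made up for illustration. The only point is
that the store overwrites the address word of the load that follows it.

```python
# Toy model of the "STA *+2 / DLD" idiom. NOT an HP 2100 emulator: the
# symbolic opcodes and the memory layout are assumptions for illustration.

def run(mem, pc, a):
    """Run from pc with register A preset; return (A, B) at HLT."""
    b = 0
    while True:
        word = mem[pc]
        if isinstance(word, tuple) and word[0] == "STA":
            mem[word[1]] = a              # one-word store of A into memory
            pc += 1
        elif word == "DLD":
            addr = mem[pc + 1]            # second word holds operand address
            a, b = mem[addr], mem[addr + 1]
            pc += 2
        elif word == "HLT":
            return a, b
        else:
            raise ValueError(f"unexpected word at {pc:o}: {word!r}")


def demo():
    mem = {
        0o2000: ("STA", 0o2002),  # STA *+2: patch the word after DLD
        0o2001: "DLD",            # two-word load...
        0o2002: 0,                # ...whose address word is filled in later
        0o2003: "HLT",
        0o3000: 0o040000,         # the two data words of the array element
        0o3001: 0o000002,
    }
    regs = run(mem, pc=0o2000, a=0o3000)  # A holds the computed address
    return regs, mem[0o2002]
```

Calling demo() returns the register pair loaded by DLD together with the
patched address word, which now holds the element address that was in A.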

The Fortran compiler didn't use the B register at all for the first ten
years or so. Teaching it about B improved performance about 3%, mostly
because the compiler didn't save any register allocation information
across statements. To set X and Y to zero you would get

    CLA
    STA X
    CLA
    STA Y

It would only get to B in certain "complex" expressions.

Later versions of the machine were heavily microcoded. There were things
like move words (which saved the count of words moved in the instruction
stream in order to restart after interrupt), compare bytes, load from 32
bit virtual address, up through extreme CISC-isms such as DPOLY, which
computed the Chebyshev polynomial expansion given a table of
coefficients, a double precision vector instruction set including "one
instruction" pivot operations, and support for all of the transcendental
instructions used by the Whetstone benchmark. I wonder if anybody ever
coded the hyperbolic tangent instruction in assembly?

The machine held off interrupts for one cycle following a JMP indirect to
let you get out of the operating system "cleanly". Of course this also
let you loop forever with the right two JMP instructions. But since the
machine had infinite levels of indirect resolution, you could get the
same effect by doing a

    JMP *+1,I
    DEF *,I

The multiply instructions assumed the B register held the most
significant bits. The 32 bit integer instructions assumed the A register
did. The code generators produced a lot of SWP instructions; SWP was of
course implemented as a double rotate right (or was it left?) 16 bits.
Not super fast.

Try this at home: the "multiply+16" opcode (101370, if memory serves)
jumped into a microcode jump table at some point. Unfortunately, the
microcoder was seriously out of room at that point, so he or she had
decided to use part of the jump table as code to handle the machine's
front panel. This on the (obsolete) F series machines.
Executing a multiply+16 opcode put you into the front panel, where you
were of course uninterruptible.

Those were the days, back when instruction sets were designed by people
who were learning by doing. Fortunately HP's more recent products such
as the Precision Architecture (notice I didn't say Spectrum :-) machines
are much, MUCH better designs. One of the most effective instruction
sets I've run across from any manufacturer, and I didn't even work on it,
so I'm a little more objective than it would seem. Lots of good stuff to
help make things go fast for both individual programs and overall system
performance, without using much hardware. Joe Bob says check it out.

Doug "those were the days" Hartman
at "Spectrums R Us" in scenic Cupertino