Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sdd.hp.com!hplabs!hpcc05!hpcuhb!hpcuhe!edwardm
From: edwardm@hpcuhe.cup.hp.com (Edward McClanahan)
Newsgroups: comp.arch
Subject: Re: HP-PA and CISC emulation (was Re: Will NeXT survive?)
Message-ID: <32580031@hpcuhe.cup.hp.com>
Date: 13 May 91 22:20:28 GMT
References:
Organization: Hewlett Packard, Cupertino
Lines: 91

Anton Rang writes:

> In article <8324@uceng.UC.EDU> dmocsny@minerva.che.uc.edu (Daniel Mocsny) writes:
> >(Does HP-PA do this right now? If so, I am very impressed. I would be
> >much more impressed if it could also run the large existing libraries
> >of CISC binaries at full speed, but that would be asking quite a bit
> >:-)
>
> I seem to recall that the high-end HP-PA machines run HP/3000
> binaries (under MPE) faster than the HP/3000 series itself ever did.
> But I could be wrong. I don't know if this is done with a full
> software emulator, or with a binary->binary translator, etc.

Mike Santangelo replies:

> The HP-PA based HP3000 systems use a very sophisticated emulation
> system which makes use of something HP calls "millicode".

Actually, classic-3000 emulation and "millicode" are two completely
different concepts in MPE XL. Mike has a lot of interesting information
in his posting, but let me clarify three points:

1 - Millicode is really just a faster calling sequence for short
    assembly routines. The best examples are routines which move a
    block of memory (e.g. copying structures) and string functions.
    The compilers are told which registers (and which globals) are
    modified during execution. For normal procedure calls, the
    optimizer must flush variables/fields held in registers and
    start over after the call. Millicode calls do not have such a
    detrimental effect on optimization. In addition, caller-saved
    registers don't necessarily need to be saved prior to the
    Millicode call.

2 - Classic 3000 emulation involves dedicating a large majority of
    the HP-PA (now PA-RISC) general registers to an emulator, which
    is basically a big CASE/SWITCH statement in a LOOP. Each emulated
    instruction has its own CASE entry that implements it. This code
    is highly optimized and achieves impressive performance. A
    so-called old-timer once told me that the emulated instruction
    set is more complete than any hardware/microcode implementation
    (as well as better documented).

3 - Classic 3000 translation is a further optimization step where
    the loop overhead of the emulator is removed (by unfolding the
    loop).

Here is a trivial diagram to explain the difference:

Suppose the emulated machine has two instructions, E1 and E2.
Suppose native instructions are of the form N1, N2, ...

Emulator:

    Load instruction
    CASE/SWITCH on instruction
        E1: N1
            N2
        E2: N3
            N4
    Go get next instruction

Suppose the classic 3000 program is:

    E1
    E2
    E2

The translator will convert the classic 3000 program into:

    N1    ; Code for E1
    N2
    N3    ; Code for first E2
    N4
    N3    ; Code for second E2
    N4

The loop overhead (the instruction load and the CASE/SWITCH dispatch)
is what is eliminated with the translated program.

I hope this helps...

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Edward McClanahan
Hewlett Packard Company     -or-     edwardm@cup.hp.com
Mail Stop 42UN
11000 Wolfe Road            Phone: (480)447-5651
Cupertino, CA 95014         Fax:   (408)447-5039
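
The emulator/translator difference in points 2 and 3 can also be
sketched as a small runnable toy. This is only an illustration under
loud assumptions: the opcodes E1/E2 and native steps N1..N4 come from
the diagram, but NATIVE, emulate, translate and run_translated are
hypothetical names, and "executing" a native step just means appending
its name to a list. It says nothing about how the real MPE XL emulator
or translator is built.

```python
# Toy model of the diagram: E1 expands to N1, N2 and E2 expands to N3, N4.
# NATIVE, emulate, translate and run_translated are hypothetical names.
NATIVE = {
    "E1": ["N1", "N2"],
    "E2": ["N3", "N4"],
}


def emulate(program):
    """Interpret the program: load, dispatch, execute, repeat."""
    trace = []        # native steps "executed", in order
    dispatches = 0    # how many times the loop paid for load + dispatch
    pc = 0
    while pc < len(program):
        opcode = program[pc]            # load instruction
        dispatches += 1                 # CASE/SWITCH on instruction
        trace.extend(NATIVE[opcode])    # body of the selected CASE entry
        pc += 1                         # go get next instruction
    return trace, dispatches


def translate(program):
    """Unfold the loop once, ahead of time, into straight-line native code."""
    translated = []
    for opcode in program:
        translated.extend(NATIVE[opcode])
    return translated


def run_translated(translated):
    """Run translated code: no load or dispatch, only the native steps."""
    return list(translated)


if __name__ == "__main__":
    program = ["E1", "E2", "E2"]
    trace, dispatches = emulate(program)
    native = translate(program)
    print(native)
    print(run_translated(native) == trace, dispatches)
```

Both paths execute the same native steps in the same order. The
emulator pays the load and dispatch cost every time an instruction
runs, while the translated program pays it once, at translation time.
The toy handles only straight-line programs; it ignores branches,
which any real translator has to deal with.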