Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sdd.hp.com!hplabs!hpcc05!hpcuhb!hpcuhe!edwardm
From: edwardm@hpcuhe.cup.hp.com (Edward McClanahan)
Newsgroups: comp.arch
Subject: Re: HP-PA and CISC emulation (was Re: Will NeXT survive?)
Message-ID: <32580031@hpcuhe.cup.hp.com>
Date: 13 May 91 22:20:28 GMT
References: <RANG.91May7021441@nexus.cs.wisc.edu>
Organization: Hewlett Packard, Cupertino
Lines: 91

Anton Rang writes:

> In article <8324@uceng.UC.EDU> dmocsny@minerva.che.uc.edu (Daniel Mocsny) writes:
> >(Does HP-PA do this right now? If so, I am very impressed. I would be
> >much more impressed if it could also run the large existing libraries
> >of CISC binaries at full speed, but that would be asking quite a bit
> >:-)

> I seem to recall that the high-end HP-PA machines run HP/3000
> binaries (under MPE) faster than the HP/3000 series itself ever did.
> But I could be wrong.  I don't know if this is done with a full
> software emulator, or with a binary->binary translator, etc.

Mike Santangelo replies:

> The HP-PA based HP3000 systems use a very sophisticated emulation
> system which makes use of something HP calls "millicode".

Actually, classic-3000 emulation and "millicode" are two completely
different concepts in MPE XL.  Mike has a lot of interesting information
in his posting, but let me clarify three points:

1 - Millicode is really just a faster calling sequence for short
    assembly routines.  The best examples are routines which move
    a block of memory (e.g. copying structures) and string functions.
    The compilers are told which registers (and which globals) are
    modified during execution.  For normal procedure calls, the
    optimizer must flush variables/fields held in registers and
    start over after the call.  Millicode calls do not have such
    a detrimental effect on optimization.  In addition, caller-saved
    registers don't necessarily need to be saved prior to the Millicode
    call.

2 - Classic 3000 emulation involves dedicating a large majority of
    the HP-PA (now PA-RISC) general registers to an emulation routine
    which is basically a big CASE/SWITCH statement in a LOOP.  Each
    instruction has its own CASE entry to implement the instruction.
    This code is highly optimized and achieves impressive performance.
    A so-called old-timer told me once that the emulated instruction
    set is more complete than any hardware/microcode implementation
    (as well as better documented).

3 - Classic 3000 translation is a further optimization step where the
    loop overhead of the emulator is removed (by unfolding the loop).
    Here is a trivial diagram to explain the difference:

    Suppose the emulated machine has two instructions, E1 and E2.
    Suppose the native instructions are of the form N1, N2, ...

              Emulator:

                 Load instruction

                 CASE/SWITCH on instruction

                    E1:  N1
                         N2                          
                         <break>

                    E2:  N3
                         N4
                         <break>

                 Go get next instruction

    Suppose classic 3000 program is:

              E1
              E2
              E2

    The translator will convert the classic 3000 program into:

              N1       ; Code for E1
              N2
              N3       ; Code for first E2
              N4
              N3       ; Code for second E2
              N4

    The loop overhead is what is eliminated with the translated program.
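
To make the diagram concrete, here is a minimal Python sketch of the same
two-instruction machine.  It is only an illustration, not the actual MPE XL
emulator or translator: the table NATIVE_CODE and the functions emulate(),
translate() and run_translated() are names I made up, and E1/E2 and N1..N4
are just the placeholders from the diagram.  The sketch counts how many
times the fetch-and-dispatch step is paid in each case.

```python
# Hypothetical two-instruction machine.  E1/E2 and N1..N4 are the
# placeholder names from the diagram, not real opcodes.
NATIVE_CODE = {
    "E1": ["N1", "N2"],
    "E2": ["N3", "N4"],
}


def emulate(program):
    """Interpret the program: fetch, dispatch, execute, repeat."""
    executed = []    # native instructions that do the real work
    dispatches = 0   # times the fetch + CASE/SWITCH cost is paid
    pc = 0
    while pc < len(program):
        instruction = program[pc]                  # load instruction
        dispatches += 1                            # CASE/SWITCH on it
        executed.extend(NATIVE_CODE[instruction])  # body of the CASE
        pc += 1                                    # go get the next one
    return executed, dispatches


def translate(program):
    """Translate once, ahead of time, into straight-line native code."""
    translated = []
    for instruction in program:
        translated.extend(NATIVE_CODE[instruction])
    return translated


def run_translated(translated):
    """Running translated code needs no per-instruction dispatch."""
    return list(translated), 0


program = ["E1", "E2", "E2"]
emulated, emulator_dispatches = emulate(program)
native, native_dispatches = run_translated(translate(program))
```

Both paths execute the same six native instructions; the emulator pays
three dispatches along the way and the translated program pays none.  The
sketch only handles straight-line code; branches in the emulated program
are deliberately left out.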

I hope this helps...

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

  Edward McClanahan
  Hewlett Packard Company     -or-     edwardm@cup.hp.com
  Mail Stop 42UN
  11000 Wolfe Road                     Phone: (408)447-5651
  Cupertino, CA  95014                 Fax:   (408)447-5039