Path: utzoo!attcan!uunet!zaphod.mps.ohio-state.edu!think.com!paperboy!meissner
From: meissner@osf.org (Michael Meissner)
Newsgroups: comp.arch
Subject: Re: RISCizing a CISC processor
Message-ID: <MEISSNER.90Dec11123441@curley.osf.org>
Date: 11 Dec 90 17:34:40 GMT
References: <9012070105.AA02416@hcrlgw.crl.hitachi.co.jp> <1200@dg.dg.com>
	<1311@inews.intel.com>
Sender: news@OSF.ORG
Organization: Open Software Foundation
Lines: 89
In-reply-to: dlau@mipos2.intel.com's message of 10 Dec 90 20:22:12 GMT

In article <1311@inews.intel.com> dlau@mipos2.intel.com (Dan Lau)
writes:

| In article <1200@dg.dg.com> uunet!dg!lewine writes:
| >In article <9012070105.AA02416@hcrlgw.crl.hitachi.co.jp>, joe@hcrlgw.crl.hitachi.co.JP (Dwight Joe) writes:
| >	***HOWEVER***, the advantage of RISC is moving work from 
| >    runtime to compile time.  The big speedup comes from compiler
| >    work not hardware. At Data General we have modified some of
| >    the compilers for our CISC MV-series to compile simple code
| >    instead of using instructions like WEDIT.  This has produced
| >    major performance enhancements because a compiler can generate
| >    special case code. 
| 
| I don't understand the comment above about the MV-series compilers.
| Are you saying that after DG changed the MV-series compilers to generate
| simple code, there was a major performance improvement (over the complex
| code)?  Or are you saying that "because a compiler can generate special
| case code" (i.e., very complex instructions like WEDIT), there was a
| major performance enhancement over the simple code?
| 
| I am confused, can you please clarify the above.  Thanks.
| 	Dan Lau

Let me try to clarify some things.  Only certain compilers actually
generated WEDIT (notably Cobol and PL/1, possibly Basic).  The
{,W}EDIT instruction was actually a secondary instruction set that
read a bytestream to figure out how to convert a number to a stream of
bytes (I'm slightly fuzzy here, because in my ten years at Data
General, I never once used a WEDIT instruction).  Most programs do not
need the complex interpretation, since the format is known at compile
time.  On these programs, the code generator would issue multiple
simple instructions instead of WEDIT.  I believe for some machines at
least, WEDIT was removed, and the kernel would then simulate it if a
WEDIT was actually used (old program, etc.).
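
To make the trade-off concrete, here is a rough C sketch of the two
approaches.  The opcodes (ED_DIGIT, ED_LITERAL, ED_END) and both
function names are invented for illustration and are NOT the real
{,W}EDIT encoding; the point is only that a format known at compile
time can become straight-line code with no decode loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical edit opcodes; NOT the real WEDIT sub-instruction set. */
enum { ED_END = 0, ED_DIGIT = 1, ED_LITERAL = 2 };

/* General path: interpret an edit program at run time, in the spirit
   of WEDIT.  Digits are consumed from `digits` left to right. */
size_t edit_interpreted(const unsigned char *prog, const char *digits,
                        char *out)
{
    size_t n = 0;

    assert(prog && digits && out);
    for (;;) {
        switch (*prog++) {
        case ED_DIGIT:                  /* copy the next source digit */
            out[n++] = *digits++;
            break;
        case ED_LITERAL:                /* copy the byte that follows */
            out[n++] = (char)*prog++;
            break;
        default:                        /* ED_END or unknown: stop */
            out[n] = '\0';
            return n;
        }
    }
}

/* Specialized path: the picture "dd.dd" is known at compile time, so
   the code generator can emit simple stores instead of a decode loop. */
size_t edit_dd_dot_dd(const char *digits, char *out)
{
    assert(digits && out);
    out[0] = digits[0];
    out[1] = digits[1];
    out[2] = '.';
    out[3] = digits[2];
    out[4] = digits[3];
    out[5] = '\0';
    return 5;
}
```

Both routines produce the same bytes for the same picture; the second
simply has nothing left to decode at run time.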

While I'm talking about the MV, let me expound on a successful way the
MV was extended, and an unsuccessful way.

For those of you who have never looked at the DG Nova/Eclipse/MV
instruction set, there are 4 integer registers (on all versions), and
4 floating point registers (on the Eclipse and MV/Eclipse).  Only two
of the integer registers can be used as index registers.  On the
MV/Eclipse, the 4 stack values (stack pointer, frame pointer, stack
base, and stack limit) are also held in registers, but there is no
direct addressing mode to use these registers.  The standard save
instruction puts the frame pointer in one of the index registers.
Needless to say, this put a crimp in code generation, particularly in
doing things like:

	p1->field1 = p2->field1;
	p1->field2 = auto_var;
	p1->field3 = p2->field3;

So we in Languages requested an addition to the instruction set that
would give frame pointer relative addressing (and possibly stack
pointer relative as well).  For existing machines in the field, there
was a slight penalty to the upgrade, but one of the machines (the
MV/7800, if I remember correctly) that was under development but not
yet shipped could only do this instruction in 27 clocks (i.e., it would
be faster on that machine to do a push, load register, whatever, pop).
So, this feature had to be scrapped, because the hardware people
didn't/couldn't respin the silicon.  Sigh....

The more successful upgrade was how the sine, cosine, etc.
instructions were added.  For the high end machines (MV/10000 with
FPU, MV/20000, and presumably MV/40000), the machine would have a
hardware accelerator which would do the operation, but it was
important to have the same binaries run on the low end machines as
well, with as little slowdown as possible relative to the old method of
calling library functions.  The architect noticed that the standard long call
instruction had a left over bit that was easy for the microcode to
access, so the new instructions had the format:

	<16 bit opcode>
	<32 bit address of emulator>
	<16 bit subopcode>

(on the long call instruction, the <16 bit subopcode> field was the
argument count that was pushed on top of the stack, so the return
instruction could know how many words to pop off).  This way, you did
not have to trap to the kernel to implement the instructions, which
can be much too slow, but instead just called the emulator directly.
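
The dispatch idea can be modelled in C roughly as follows.  Everything
here is an assumption made for illustration: the struct layout, the
names (ext_insn, execute, soft_emulator), the subopcode values, and
the use of a function pointer to stand in for both the 32-bit emulator
address and the optional hardware accelerator.  The emulator body is a
short Taylor series, not whatever the real MV library did.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for code reachable through the 32-bit emulator address. */
typedef double (*emulator_fn)(uint16_t subop, double x);

/* Hypothetical subopcode values. */
enum { SUB_SIN = 0, SUB_COS = 1 };

/* Model of the three-part instruction format. */
struct ext_insn {
    uint16_t    opcode;    /* long-call opcode with the spare bit set */
    emulator_fn emulator;  /* models <32 bit address of emulator>     */
    uint16_t    subop;     /* models <16 bit subopcode>               */
};

/* Software emulator: short Taylor series, adequate only for small |x|. */
double soft_emulator(uint16_t subop, double x)
{
    double x2 = x * x;

    if (subop == SUB_SIN)
        return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0));
    return 1.0 - x2 / 2.0 * (1.0 - x2 / 12.0);
}

/* Execute one extended instruction.  `accelerator` is NULL on a low
   end machine; in that case we call the emulator directly, as an
   ordinary long call would, rather than trapping to the kernel. */
double execute(const struct ext_insn *in, emulator_fn accelerator,
               double x, int *used_emulator)
{
    assert(in && in->emulator && used_emulator);
    if (accelerator != NULL) {
        *used_emulator = 0;
        return accelerator(in->subop, x);
    }
    *used_emulator = 1;
    return in->emulator(in->subop, x);
}
```

The same "binary" (the ext_insn value) runs on both kinds of machine;
only the path taken inside execute() differs.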


--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?