Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!elroy.jpl.nasa.gov!decwrl!fernwood!portal!cup.portal.com!Radagast
From: Radagast@cup.portal.com (sullivan - segall)
Newsgroups: comp.sys.amiga.emulations
Subject: Re: Emulator Mechanics (sorry long post)
Message-ID: <40036@cup.portal.com>
Date: 12 Mar 91 02:53:43 GMT
References: <4992@mindlink.UUCP> 
  <1991Mar6.010141.5905@mintaka.lcs.mit.edu> <1303@macuni.mqcc.mq.oz>
  <39935@cup.portal.com> <10775@dog.ee.lbl.gov>
Distribution: na
Organization: The Portal System (TM)
Lines: 111

Subject: mail failed, returning to sender
Reference: <m0jFzQc-0000p8C@nova.unix.portal.com>

|------------------------- Message log follows: -------------------------|

|------------------------- Failed addresses follow: ---------------------|
 <amish.s.dave@midway.uchicago.edu> ... transport smtp: 550
 <amish.s.dave@midway.uchicago.edu>... User unknown

|------------------------- Message text follows: ------------------------|

>Organization: University of Chicago
>Cc:
>
>I have been reading this thread with a lot of interest - but have a few
>questions regarding the separation of code and data.
>
>I don't quite understand why this is _absolutely_necessary. I thought that
>this discussion started with the idea of simplifying things by using lots of
>memory. Thus, my conception of the compiler/emulator goes something like:
>
>You keep the original .exe file, and maybe even an entire 640K memory map
>in memory. The compiler generates 680x0 equivalent code for every instruction,
>with every memory read or write referring to the contents of the ORIGINAL
>map. If we agree not to worry about self-modifying code, the 680x0 code never
>really has to be modified. It will contain a considerable amount of junk
>corresponding to data, but then this stuff would never get executed in the
>original, and so won't get executed here either.

The problem here is that the compiler can easily get out of synch with
the program.  Suppose my source looks like:

        Call Print
        db "Hello World",LF,0
        Call CheckErr

Now if the object code is treated as all executable, there is no guarantee
that the instruction pointer of the cross compiler will ever land on the
first byte of the second "Call" instruction.  The string bytes get decoded
as opcodes, and if the address bytes for CheckErr happen to look like some
real opcode too, the IP could stay out of synch for quite some time.  If
the next statement had been a RET instruction, you might miss it and run
on into completely irrelevant code.  Of course when you execute the code
and find that the return address isn't among any of the symbolic addresses
available, you might realize that something has gone awry, but by then it
is too late to fix it.
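
To make the out-of-synch problem concrete, here is a rough Python sketch
of a decoder that blindly walks code and data alike.  The length table is
deliberately tiny (my assumption, not a real 8086 decoder), and the string
is shortened to "OK",0 so the effect shows up in a few bytes.  The names
insn_length, linear_sweep and BLOB are made up for illustration.

```python
def insn_length(opcode):
    """Length in bytes of the instruction starting with this opcode.

    Only a handful of 8086 opcodes are modelled (assumption):
      0xE8 = CALL rel16                 -> 3 bytes
      0x00 = ADD r/m8, r8 with a
             register-form ModRM byte   -> 2 bytes
    Everything else is treated as a 1-byte instruction.
    """
    if opcode == 0xE8:
        return 3
    if opcode == 0x00:
        return 2
    return 1


def linear_sweep(blob, start=0):
    """Decode straight through the blob, treating every byte as code."""
    starts = []
    pos = start
    while pos < len(blob):
        starts.append(pos)
        pos += insn_length(blob[pos])
    return starts


# Offsets: 0 Call Print | 3 db "OK",0 | 6 Call CheckErr | 9 RET
BLOB = bytes([0xE8, 0x05, 0x00,   # Call Print
              0x4F, 0x4B, 0x00,   # db "OK",0  (data, not code)
              0xE8, 0x90, 0x00,   # Call CheckErr
              0xC3])              # RET
TRUE_STARTS = [0, 6, 9]

found = linear_sweep(BLOB)
missed = [s for s in TRUE_STARTS if s not in found]
print(found)    # [0, 3, 4, 5, 7, 8]
print(missed)   # [6, 9]
```

The string's terminating zero swallows the E8 of the second Call, and the
last address byte then swallows the RET, so the sweep never sees either
real instruction.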

More importantly, though, what you gain in speed by compiling the code is
completely lost in searching for address translations in the executable.
Unless you are willing to spend 4 megs of memory just to hold a translated
address for each byte in the source code, every return address popped from
the stack, every indexed jump, and every vectored jump will have to be
looked up and translated into the equivalent location in 68000 code.
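
For what it's worth, the 4 meg figure works out if you assume one 32-bit
68000 address per byte of a 1 MB 8086 address space (that breakdown is my
assumption).  The sketch below does the arithmetic and shows the
alternative, a sorted table searched at run time, which is exactly the
lookup cost being complained about.  SparseMap and its methods are
hypothetical names.

```python
import bisect

ADDRESS_SPACE = 1 << 20      # assumed: 1 MB of 8086 address space
ENTRY_SIZE = 4               # assumed: one 32-bit 68000 address per entry
dense_table_bytes = ADDRESS_SPACE * ENTRY_SIZE
print(dense_table_bytes)     # 4194304, i.e. 4 megs


class SparseMap:
    """Source address -> translated address, kept only for known
    instruction starts and found by binary search."""

    def __init__(self, pairs):
        pairs = sorted(pairs)
        self.src = [s for s, _ in pairs]
        self.dst = [d for _, d in pairs]

    def translate(self, src_addr):
        i = bisect.bisect_left(self.src, src_addr)
        if i < len(self.src) and self.src[i] == src_addr:
            return self.dst[i]
        return None          # not a known instruction start


# Made-up example addresses.
table = SparseMap([(0x0100, 0x2000), (0x0103, 0x2006)])
print(table.translate(0x0103))   # 8198 (0x2006)
print(table.translate(0x0101))   # None
```

The dense table makes every lookup one indexed load; the sparse one saves
the memory but pays a search on every return, indexed jump and vectored
jump.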
>
>Of course, there would still need to be code to handle whether, say, video
>ram was written to, or whatnot. I guess you could just add this code to
>the code that replaces any instructions that write to memory.
>
>I'd appreciate being further enlightened on the mechanics of emulators...
>
>Amish (asd2@ellis.uchicago.edu)

Video writes are relatively easy.  At least in that case you know that
the address contains data, and not executable code (same goes for reads).
Other problems also arise.  Suppose I want to move a segment register
to a normal register.  In 68k code, the move first requires that the
address register be shifted right four bits.  (An Intel segment register
holds the address without its lower four bits, so if you keep it as a
full address in a 68k address register, the value has to be shifted
whenever it is moved from a segment register to a data register.)  The
question then becomes: what do you do if you aren't sure whether the
source data is from a segment register or a data register?  Suppose you
push a segment register on the stack, and pop a data register.  Obviously
segment registers have to be shifted whenever they are moved to anything
(other than another segment register).

Unfortunately in the Intel instruction set it is impossible to move data
directly between segment registers, so moves through other registers or
the stack are performed constantly.  But if you instead keep the segment
register right shifted (that is, in its native Intel form), you then have
to left shift every time the register is dereferenced.  Which is worse?
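
The two choices can be put side by side in a small Python sketch.  It
relies only on the 8086 rule that a physical address is segment * 16 +
offset, wrapped to 20 bits; the function and class names are hypothetical.

```python
ADDR_MASK = 0xFFFFF   # the 8086 wraps addresses at 1 MB


def phys_native(seg, off):
    """Segment kept in its native 16-bit form: left shift on every
    dereference."""
    return ((seg << 4) + off) & ADDR_MASK


class PreShiftedSeg:
    """Segment kept as a full base address (as it might sit in a 68k
    address register): dereferencing is a plain add, but reading the
    register as a value needs a right shift."""

    def __init__(self, seg):
        self.base = seg << 4

    def phys(self, off):
        return (self.base + off) & ADDR_MASK

    def as_register_value(self):
        return self.base >> 4


es = PreShiftedSeg(0xB800)
print(hex(phys_native(0xB800, 0x10)))   # 0xb8010
print(hex(es.phys(0x10)))               # 0xb8010
print(hex(es.as_register_value()))      # 0xb800
```

Either way one of the two operations pays for a shift; the only question
is whether loads and stores or register moves are more common in the
program being translated.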

Okay, so you've solved all of these problems...  Well, there is at least
one more reason to separate code from data (and this is the one most
people will refer to).  You've translated the code because Intel code
won't run on a Motorola chip.  Unfortunately the same is true of Intel data.
Intel always stores the least significant byte of any operand first.
Motorola always stores the most significant byte of any operand first.
So if you really want to translate a program, the data should be translated
as well.  Next comes the question: how do you really know the size of
any data?  If a data location is referred to as a word, then the contents
of the first and second bytes should be swapped.  If it is referred to
as both word and byte, the data should be swapped and the byte reference
should be changed to the other byte.  But now how do you handle indexed
or calculated references?  You may not be able to find any references to
the data point, and so be unable to determine its contents.
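
Here is a small Python sketch of the swap and the redirected byte
reference, assuming you already know that a given pair of bytes is a
word (which is the hard part).  The helper names are made up.

```python
def swap_word_in_place(mem, base):
    """Turn an Intel-order word at mem[base:base+2] into Motorola order."""
    mem[base], mem[base + 1] = mem[base + 1], mem[base]


def redirect_byte_ref(addr, word_base):
    """After the swap, a byte reference into the word must point at the
    other byte of the pair."""
    return word_base + (1 - (addr - word_base))


mem = bytearray([0x34, 0x12])           # 0x1234 as Intel stores it
swap_word_in_place(mem, 0)
print(hex(int.from_bytes(mem, "big")))  # 0x1234, now a native 68k word

# Code that used to read the low byte at offset 0 must now read offset 1.
new_addr = redirect_byte_ref(0, 0)
print(new_addr, hex(mem[new_addr]))     # 1 0x34
```

This only works when every reference to the location can be found and
patched, which indexed or calculated addresses do not allow.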

But if you convert all of the data references, you are right back where
you started, not executing 68k code.  Instead each reference is interpreted
and loaded or written a single byte at a time.
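
The fallback looks roughly like this in Python: leave the data in Intel
order and assemble every word from single bytes.  On a real 68000 this
would be a subroutine or inline sequence rather than one move; the
function names are hypothetical.

```python
def read_word_le(mem, addr):
    """Read a 16-bit Intel-order word one byte at a time."""
    return mem[addr] | (mem[addr + 1] << 8)


def write_word_le(mem, addr, value):
    """Write a 16-bit word in Intel order one byte at a time."""
    mem[addr] = value & 0xFF
    mem[addr + 1] = (value >> 8) & 0xFF


mem = bytearray(4)
write_word_le(mem, 1, 0xB800)           # odd addresses are fine here
print(mem.hex())                        # 0000b800
print(hex(read_word_le(mem, 1)))        # 0xb800
```

Nothing has to be known about data sizes in advance, but each word access
now costs several operations instead of one.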


                           -Sullivan_-_Segall (a.k.a. Radagast)
_______________________________________________________________

/V\  E-Credibility:  (n -- ME) The unguaranteed likelihood that
 '   the electronic mail you are reading is genuine rather than
someone's made up crap.
_______________________________________________________________

Mail to: ...sun!portal!cup.portal.com!radagast or
         radagast@cup.portal.com