Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!spool.mu.edu!munnari.oz.au!metro!macuni!sunb!ifarqhar
From: ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
Newsgroups: comp.sys.amiga.emulations
Subject: Re: Emulator Mechanics (sorry long post)
Message-ID: <1312@macuni.mqcc.mq.oz>
Date: 8 Mar 91 03:51:21 GMT
References: <4992@mindlink.UUCP> <1991Mar6.212548.9641@mentorg.com>
Sender: news@macuni.mqcc.mq.oz
Organization: Macquarie University, Sydney, Australia.
Lines: 61

In article <1991Mar6.212548.9641@mentorg.com> dclemans@mentorg.com (Dave Clemans @ APD x1292) writes:
>To use it, you basically had to develop enough information to
>get a clean disassembly of the Intel code; i.e., so that you "knew"
>where all the code and data was in the source object file.
>That then was used to drive the tool that produced the "compiled"
>68K file.  After that was done you had to go over the output
>for correctness, system dependencies, etc.; it was not intended 
>as a turn key system.

Well, on a 6502, it is practically impossible to truly decide what is
code and what is data.  Imagine that you are writing a 6502
disassembler (as I have done).  For every byte you store a status that
is one of DATA, OPCODE or ARGUMENT, plus some non-exclusive flags (e.g.
BRANCHENTERSHERE, BRANCHISNOTHERE).  Initially, every byte is set to
DATA and BRANCHISNOTHERE.
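A minimal Python sketch of that per-byte bookkeeping follows.  The
names Kind, Mark and initial_tags are made up for illustration.  Treating
BRANCHISNOTHERE as simply the cleared state of BRANCHENTERSHERE is an
assumption, not necessarily how the original disassembler stored it.

```python
from enum import Enum, Flag, auto

class Kind(Enum):
    """Exclusive status: each byte is exactly one of these."""
    DATA = auto()
    OPCODE = auto()
    ARGUMENT = auto()

class Mark(Flag):
    """Non-exclusive flags; any combination may be set on one byte.
    Assumption: BRANCHISNOTHERE is modelled as BRANCH_ENTERS_HERE being clear."""
    NONE = 0
    BRANCH_ENTERS_HERE = auto()

def initial_tags(size):
    """Every byte starts as DATA, with no branch entering it."""
    return [Kind.DATA] * size, [Mark.NONE] * size
```

A caller would mark a branch target with something like
`marks[i] |= Mark.BRANCH_ENTERS_HERE`.  Keeping the exclusive status and
the combinable flags in separate types stops a byte from being both
OPCODE and DATA by accident.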

Write a recursive procedure that starts at the program entry point and
walks through the code.  Tag every byte read as an opcode as OPCODE, and
the bytes following it as ARGUMENT.  When you reach a branch
instruction, recursively process the branch target, following it until
you hit a byte that has already been processed (i.e. is not tagged
DATA), then resume after the branch.  Keep going until the whole
procedure exits, then run the same thing on the RESET, IRQ and NMI
vectors.  Calls are treated the same way as branches, except that an
invocation returns to the higher-level one when it hits an RTS or RTI.
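The recursive walk might be sketched like this in Python.  The opcode
table is a hypothetical, deliberately tiny subset (a real 6502 table
covers every documented opcode and addressing mode).  Giving up on an
unknown opcode is also an assumption of this sketch.

```python
from enum import Enum

class Kind(Enum):
    DATA = "DATA"
    OPCODE = "OPCODE"
    ARGUMENT = "ARGUMENT"

# Illustrative subset of documented 6502 opcodes: opcode -> (length, role).
# A real disassembler needs the full table; this one is an assumption for the demo.
OPS = {
    0xA9: (2, "plain"),   # LDA #imm
    0x8D: (3, "plain"),   # STA abs
    0x18: (1, "plain"),   # CLC
    0xEA: (1, "plain"),   # NOP
    0x90: (2, "branch"),  # BCC rel
    0xD0: (2, "branch"),  # BNE rel
    0xF0: (2, "branch"),  # BEQ rel
    0x4C: (3, "jump"),    # JMP abs
    0x20: (3, "call"),    # JSR abs
    0x60: (1, "return"),  # RTS
    0x40: (1, "return"),  # RTI
}

def trace(mem, origin, addr, kinds):
    """Recursive-descent trace from addr; mem[0] sits at address origin."""
    while True:
        i = addr - origin
        if not (0 <= i < len(mem)) or kinds[i] is not Kind.DATA:
            return                      # outside the image, or already processed
        if mem[i] not in OPS:
            return                      # unknown opcode: abandon this path
        length, role = OPS[mem[i]]
        if i + length > len(mem):
            return                      # instruction runs off the end of the image
        kinds[i] = Kind.OPCODE
        for j in range(i + 1, i + length):
            kinds[j] = Kind.ARGUMENT
        nxt = addr + length
        if role == "branch":
            rel = mem[i + 1] - 0x100 if mem[i + 1] >= 0x80 else mem[i + 1]
            trace(mem, origin, nxt + rel, kinds)     # taken path, then fall through
        elif role in ("jump", "call"):
            target = mem[i + 1] | (mem[i + 2] << 8)  # little-endian address
            if role == "jump":
                addr = target                        # JMP never falls through
                continue
            trace(mem, origin, target, kinds)        # subroutine body, then return here
        elif role == "return":
            return                                   # RTS/RTI ends this invocation
        addr = nxt
```

After tracing from the entry point, the same function would be called
again with the addresses read from the RESET, IRQ and NMI vectors.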

Now you should have every byte tagged as either program (OPCODE and
ARGUMENT) or DATA.  Right?  Wrong.  Why?  Because the 6502 has no
branch-always instruction, so programmers often use a conditional branch
whose outcome is known in advance (e.g. CLC followed by BCC) as a short
jump.  The tracer cannot tell, so it carries on past what appears to be
a conditional branch, into data, and screws everything up completely.
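Here is a small self-contained demonstration of the failure.  It uses a
stripped-down tracer of the same kind with four opcodes only, and it
takes addresses as offsets into the image; both are simplifications for
the example.  The code does CLC, then BCC over two bytes of data, which
a programmer would read as an unconditional jump:

```python
# Just enough of the 6502 for this demo (assumed subset): opcode -> length.
LENGTHS = {0x18: 1, 0x90: 2, 0x4C: 3, 0x60: 1}  # CLC, BCC rel, JMP abs, RTS

def trace(mem, pc, tags):
    """Naive recursive trace; addresses are offsets into mem for simplicity."""
    while 0 <= pc < len(mem) and tags[pc] == "DATA" and mem[pc] in LENGTHS:
        op, n = mem[pc], LENGTHS[mem[pc]]
        if pc + n > len(mem):
            return
        tags[pc] = "OPCODE"
        for j in range(pc + 1, pc + n):
            tags[j] = "ARGUMENT"           # may overwrite bytes tagged by another path
        if op == 0x90:                     # BCC: follow the target, then fall through
            rel = mem[pc + 1] - 0x100 if mem[pc + 1] >= 0x80 else mem[pc + 1]
            trace(mem, pc + 2 + rel, tags)
        elif op == 0x4C:                   # JMP abs
            pc = mem[pc + 1] | (mem[pc + 2] << 8)
            continue
        elif op == 0x60:                   # RTS
            return
        pc += n

# CLC / BCC over two bytes of data / RTS.  The branch is always taken,
# but the tracer cannot know that the carry is clear.
program = bytes([0x18, 0x90, 0x02, 0x4C, 0x00, 0x60])
tags = ["DATA"] * len(program)
trace(program, 0, tags)
print(tags)  # ['OPCODE', 'OPCODE', 'ARGUMENT', 'OPCODE', 'ARGUMENT', 'ARGUMENT']
```

The two data bytes ($4C $00) are decoded as a bogus JMP.  Its operand
then swallows the real RTS at offset 5, which ends up tagged ARGUMENT.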

I experimented with a two-pass approach to this problem.  First, the
program was scanned sequentially, treating every byte as an opcode and
tagging every point referenced by some branch, call, jump or vector.
Then, when a branch was found during the second (recursive) pass, the
program would backtrack through the preceding opcodes until it reached
a branch-in point (a location tagged in the first pass, after which no
assumptions could be made), to see whether the flags had been left in a
deterministic state.  At this point I lost interest in the whole idea.
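The backward check from the second pass could look roughly like the
sketch below, which rests on several assumptions.  The first pass has
already produced the set of branch-in addresses.  The straight-line
instructions leading up to the branch are available as (address,
mnemonic, immediate) tuples.  Only a handful of instructions have their
flag effects modelled, and anything else is treated as leaving the flags
unknown.  The function and table names are hypothetical.

```python
ALL_FLAGS = ("N", "V", "Z", "C")

# Conditional branch -> (flag tested, flag value that makes the branch taken).
BRANCH_TEST = {
    "BCC": ("C", 0), "BCS": ("C", 1), "BNE": ("Z", 0), "BEQ": ("Z", 1),
    "BPL": ("N", 0), "BMI": ("N", 1), "BVC": ("V", 0), "BVS": ("V", 1),
}

def flag_effects(mnem, imm):
    """Flags an instruction writes: {flag: 0/1 if known, None if unknown}.
    Flags missing from the dict are left unchanged.  Only a few instructions
    are modelled; anything else is assumed to clobber every flag."""
    if mnem == "CLC":
        return {"C": 0}
    if mnem == "SEC":
        return {"C": 1}
    if mnem == "CLV":
        return {"V": 0}
    if mnem in ("LDA", "LDX", "LDY"):
        if imm is None:                       # non-immediate: loaded value unknown
            return {"N": None, "Z": None}
        return {"N": imm >> 7, "Z": int(imm == 0)}
    if mnem in ("NOP", "STA", "STX", "STY"):
        return {}                             # these leave the flags alone
    return {f: None for f in ALL_FLAGS}

def branch_always_taken(instrs, branch_in_points):
    """instrs: straight-line (addr, mnemonic, immediate) tuples ending with a
    conditional branch.  True only if the preceding code provably forces it taken."""
    addr, branch, _ = instrs[-1]
    if addr in branch_in_points:
        return False                          # flags depend on how we got here
    flag, taken_when = BRANCH_TEST[branch]
    for addr, mnem, imm in reversed(instrs[:-1]):
        effects = flag_effects(mnem, imm)
        if flag in effects:
            return effects[flag] == taken_when
        if addr in branch_in_points:
            return False                      # no assumptions past a branch-in point
    return False
```

An instruction at a branch-in point still executes on every path, so its
own flag effect is checked before the scan stops there.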

Anyway, on the 6502, or anything else without a BRA or equivalent,
automatically determining what is data and what is code is extremely
difficult.

However, when the target is the 68K, compiling the code this way is
probably quite profitable.  Why?  Because there is enough correspondence
between the 6502 and 68K instruction sets (both having the same
ancestor, the 6800) that the compilation process is reasonably simple.
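As a rough illustration of that correspondence, here is a hypothetical
Python translator that turns a few 6502 instructions into 68K assembly
text.  The register allocation (A, X, Y in D0, D1, D2) and keeping A5
pointed at emulated address $8000 are assumptions made for the sketch.
The match is not exact.  MOVE sets N and Z but also clears V and C,
whereas LDA and TAX leave C and V alone and STA changes no flags.  The
carry after a subtract or compare also has the opposite sense on the two
chips, which is why BCC and BCS are left out here.

```python
REG = {"A": "D0", "X": "D1", "Y": "D2"}     # assumed 6502 -> 68K register mapping
TRANSFERS = {"TAX": ("A", "X"), "TAY": ("A", "Y"), "TXA": ("X", "A"), "TYA": ("Y", "A")}

def disp(addr):
    """Assume A5 points at emulated address $8000, so every 6502 address
    becomes a signed 16-bit displacement (-32768..32767) off A5."""
    return f"{addr - 0x8000}(A5)"

def translate(mnem, mode=None, value=None):
    """Translate one 6502 instruction to 68K assembly text (small subset only).
    Flag side effects are NOT reproduced exactly; see the notes above."""
    if mnem in ("LDA", "LDX", "LDY"):
        src = f"#${value:02X}" if mode == "imm" else disp(value)
        return f"MOVE.B {src},{REG[mnem[2]]}"
    if mnem in ("STA", "STX", "STY"):
        return f"MOVE.B {REG[mnem[2]]},{disp(value)}"
    if mnem in TRANSFERS:
        src, dst = TRANSFERS[mnem]
        return f"MOVE.B {REG[src]},{REG[dst]}"
    if mnem in ("INX", "INY", "DEX", "DEY"):
        op = "ADDQ.B" if mnem.startswith("IN") else "SUBQ.B"
        return f"{op} #1,{REG[mnem[2]]}"
    if mnem in ("BEQ", "BNE", "BMI", "BPL", "JMP", "JSR"):
        return f"{mnem} {value}"              # value is a label name
    if mnem == "RTS":
        return "RTS"
    raise NotImplementedError(mnem)
```

For example, `translate("STA", "abs", 0x0200)` yields
`MOVE.B D0,-32256(A5)`.  Centring A5 in the 64K image is what lets a
single 16-bit displacement reach every 6502 address.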

Simulating the hardware is still a problem, and I'll have to give that
one some thought...  I still tend to favor the idea I presented in a
previous article: carry around a compiled image (of the code only),
plus the uncompiled data, with labels pointing into the compiled code
and handlers for the I/O locations.
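One possible reading of that design, sketched in Python, works as
follows.  A 64K byte array holds the uncompiled data.  Reads and writes
to I/O addresses are routed to handler functions.  A label table maps
6502 addresses to compiled routines, so that pointers stored in data
(such as the RESET vector) can still be followed.  The Machine class,
its method names and any I/O addresses registered with it are all
hypothetical.

```python
from typing import Callable, Dict

class Machine:
    """Hypothetical runtime: compiled routines, raw (uncompiled) data, I/O handlers."""

    def __init__(self, image: bytes, origin: int):
        assert 0 <= origin and origin + len(image) <= 0x10000
        self.mem = bytearray(0x10000)                 # 64K 6502 address space
        self.mem[origin:origin + len(image)] = image  # uncompiled data lives here
        self.read_io: Dict[int, Callable[[], int]] = {}
        self.write_io: Dict[int, Callable[[int], None]] = {}
        # Labels: 6502 address -> compiled routine standing in for the code there.
        self.labels: Dict[int, Callable[["Machine"], None]] = {}

    def read(self, addr: int) -> int:
        handler = self.read_io.get(addr & 0xFFFF)
        return handler() & 0xFF if handler else self.mem[addr & 0xFFFF]

    def write(self, addr: int, value: int) -> None:
        handler = self.write_io.get(addr & 0xFFFF)
        if handler:
            handler(value & 0xFF)                     # I/O location: call the handler
        else:
            self.mem[addr & 0xFFFF] = value & 0xFF    # ordinary data byte

    def jump_indirect(self, vector: int) -> None:
        """Follow a pointer stored in data (e.g. the RESET vector) into compiled code."""
        target = self.read(vector) | (self.read(vector + 1) << 8)
        self.labels[target](self)   # KeyError means the target was never compiled
```

Compiled code would call read and write for every memory access it
cannot prove is plain RAM, so the I/O handlers see exactly the accesses
the original program made.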

--
Ian Farquhar                      Phone : + 61 2 805-9400
Office of Computing Services      Fax   : + 61 2 805-7433
Macquarie University  NSW  2109   Also  : + 61 2 805-7420
Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au