Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!spool.mu.edu!munnari.oz.au!metro!macuni!sunb!ifarqhar
From: ifarqhar@sunb.mqcc.mq.oz.au (Ian Farquhar)
Newsgroups: comp.sys.amiga.emulations
Subject: Re: Emulator Mechanics (sorry long post)
Message-ID: <1312@macuni.mqcc.mq.oz>
Date: 8 Mar 91 03:51:21 GMT
References: <4992@mindlink.UUCP> <1991Mar6.212548.9641@mentorg.com>
Sender: news@macuni.mqcc.mq.oz
Organization: Macquarie University, Sydney, Australia.
Lines: 61

In article <1991Mar6.212548.9641@mentorg.com> dclemans@mentorg.com (Dave Clemans @ APD x1292) writes:
>To use it, you basically had to develop enough information to
>get a clean disassembly of the Intel code; i.e., so that you "knew"
>where all the code and data was in the source object file.
>That then was used to drive the tool that produced the "compiled"
>68K file. After that was done you had to go over the output
>for correctness, system dependencies, etc.; it was not intended
>as a turn key system.

Well, on a 6502, it is practically impossible to truly decide what is
code and what is data.

Let's imagine that you are (as I have done) writing a 6502
disassembler. For every byte you store a status that says DATA, OPCODE
or ARGUMENT, and also non-exclusive flags (e.g. BRANCHENTERSHERE,
BRANCHISNOTHERE). Initially, all bytes are set to DATA and
BRANCHISNOTHERE.

Write a recursive procedure that starts at the program entry point and
goes through the code. For every byte read as an opcode, tag it as an
OPCODE, and the bytes following as ARGUMENT. If you get to a branch
instruction, recursively call the procedure on the branch target, and
continue processing until you hit a byte that has already been
processed (i.e. not tagged DATA). You should continue until the whole
procedure exits, then run the same thing on the RESET, IRQ and NMI
vectors. Calls are treated the same way as branches, except that the
routine exits to a higher level invocation when it hits an RTS or RTI.
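
A minimal Python sketch of the tagging pass described above, under
several assumptions. The opcode table OPS is a deliberately tiny subset
of the 6502 instruction set, the names trace, OPS, PROGRAM and entered
are hypothetical, and the BRANCHENTERSHERE flag is modelled as a set of
target addresses rather than a per-byte flag. The handling of JMP
(follow the target, then stop) and of unknown opcodes (stop tracing) is
also an assumption, since the description does not cover either case.

```python
DATA, OPCODE, ARGUMENT = "DATA", "OPCODE", "ARGUMENT"

# Hypothetical, incomplete opcode table: opcode -> (mnemonic, length, kind).
OPS = {
    0xA9: ("LDA #", 2, "plain"),
    0xA2: ("LDX #", 2, "plain"),
    0x8D: ("STA abs", 3, "plain"),
    0xCA: ("DEX", 1, "plain"),
    0xEA: ("NOP", 1, "plain"),
    0xD0: ("BNE", 2, "branch"),
    0xF0: ("BEQ", 2, "branch"),
    0x90: ("BCC", 2, "branch"),
    0xB0: ("BCS", 2, "branch"),
    0x20: ("JSR", 3, "call"),
    0x4C: ("JMP abs", 3, "jump"),
    0x60: ("RTS", 1, "return"),
    0x40: ("RTI", 1, "return"),
}


def trace(mem, base, entry, status=None, entered=None):
    """Tag the bytes of mem (loaded at address base) reachable from entry."""
    if status is None:
        status = [DATA] * len(mem)      # initially everything is DATA
    if entered is None:
        entered = set()                 # stands in for BRANCHENTERSHERE
    pc = entry
    while base <= pc < base + len(mem):
        i = pc - base
        if status[i] != DATA:           # already processed: stop here
            break
        op = OPS.get(mem[i])
        if op is None:                  # assumption: give up on unknown opcodes
            break
        _, length, kind = op
        if i + length > len(mem):       # instruction runs off the image
            break
        status[i] = OPCODE
        for j in range(i + 1, i + length):
            status[j] = ARGUMENT
        if kind == "branch":
            offset = mem[i + 1]
            if offset >= 0x80:          # relative offsets are signed 8-bit
                offset -= 0x100
            target = pc + 2 + offset
            entered.add(target)
            trace(mem, base, target, status, entered)
        elif kind in ("call", "jump"):
            target = mem[i + 1] | (mem[i + 2] << 8)   # little-endian address
            entered.add(target)
            trace(mem, base, target, status, entered)
            if kind == "jump":          # assumption: nothing falls through a JMP
                break
        elif kind == "return":          # RTS/RTI: back to the caller's level
            break
        pc += length                    # conditional branches fall through
    return status, entered


# 0600 LDX #3 / 0602 DEX / 0603 BNE $0602 / 0605 JSR $060A / 0608 RTS
# 0609 one data byte / 060A LDA #0 / 060C RTS
PROGRAM = bytes([0xA2, 0x03, 0xCA, 0xD0, 0xFD, 0x20, 0x0A, 0x06,
                 0x60, 0x42, 0xA9, 0x00, 0x60])
status, entered = trace(PROGRAM, 0x0600, 0x0600)
```

To cover the vectors as well, the same status and entered objects can
be passed back into trace once per vector address, so that each run
adds to the tags left by the previous one.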

Now, you should have everything tagged as either program (OPCODE and
ARGUMENT) or DATA. Right? Wrong. Why? Because the 6502 has no branch
always instruction, so programmers use a conditional branch whose
condition is known to hold. Your disassembler may then continue past
what appears to be a conditional branch, into data, and screw
everything up completely.

I experimented with using a two pass approach to this problem. First,
the program was scanned sequentially, treating every byte as an opcode,
and tagging every point referenced by some branch, call, jump or
vector. Then, when a branch was found during the second, recursive
pass, the program would backtrack and examine each preceding opcode
until it hit a branch-in point (beyond which no assumptions could be
made), to see whether the flags were left in a deterministic state.

At this point I lost interest in the whole idea.

Anyway, on the 6502 and anything without a BRA or equivalent, the
problem of automatically determining what is data and what is code is
extremely difficult. However, on the 68K, this approach is probably
quite profitable. Why? Because there is enough correspondence between
the 6502 and 68K instruction sets (both having the same ancestor, the
6800) to mean that the compilation process is reasonably simple.

Simulating the hardware is still a problem, and I'll have to give that
one some thought...

I still tend to favor the idea that I presented in a previous article:
carrying around a compiled image (of code only), and uncompiled data
with labels to the compiled code and handlers for the I/O locations.

--
Ian Farquhar                      Phone : + 61 2 805-9400
Office of Computing Services      Fax   : + 61 2 805-7433
Macquarie University  NSW  2109   Also  : + 61 2 805-7420
Australia                         EMail : ifarqhar@suna.mqcc.mq.oz.au