Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!crdgw1!uunet!comp.vuw.ac.nz!windy!srwmpnm
From: srwmpnm@windy.dsir.govt.nz
Newsgroups: comp.sys.amiga.emulations
Subject: Re: CPU-emulators
Message-ID: <18878.2801a4ad@windy.dsir.govt.nz>
Date: 9 Apr 91 11:25:31 GMT
Organization: DSIR, Wellington, New Zealand
Lines: 71

Ilja Heitlager (iheitla@cs.vu.nl) wrote:
>I'm planning to write a 6502 (and maybe when I like it some others) emulator.

Good on you! I've played around with the Z80 emulators for the Amiga by Ulf
Nordquist and Charlie Gibbs, making them faster. I have never touched the
6502, but the same techniques should apply.

>At this moment I think there are two ways of doing it:
>  1- Compare every Opcode and jump to a routine which executes the
>     instruction
>  2- Do it more or less the way the microcode does it.
>     Ok in software you can't do more operations at the same moment.

I found several more fundamentally different ways of doing it, and many
variations on those. So far the fastest practical method seems to be
threaded code. You can avoid decoding an opcode for every 6502 instruction
altogether! The emulation routine for each 6502 opcode ends with:

        move.l  (a3)+,a0
        jmp     (a0)

So each emulation routine jumps directly to the next emulation routine
without any decoding at all. Register a3 acts like a "pseudo pc" into a
256 kbyte table holding one longword for each byte of the 64 kbyte 6502
address space; each longword points to the emulation routine for the opcode
at that address. (Routines for instructions with operand bytes must also
step a3 past those bytes' entries, and jumps and branches must reload a3,
so that a3 always points at the entry for the next instruction.)

Now, every time the 6502 writes to RAM, you need to update an entry in the
256 kbyte table. At first it looks as if you have to decode an instruction
to compute the new table value every time the 6502 writes to RAM. But in
fact that is not necessary either! When the 6502 writes to RAM, you just
write a constant address into the table entry for the byte written. That
constant address points to a special routine called "patch". When patch is
called, you finally get to do an instruction decode.
Patch computes the address of the routine for the current instruction,
stores it in the 256 kbyte table, then jumps to that routine. The next time
this instruction is executed, control bypasses patch and goes directly to
the right routine.

A variation that saves memory but is slightly slower is to use word offsets
in a 128 kbyte table instead of longword addresses in a 256 kbyte table.
Each routine ends with:

        move.w  (a3)+,d0
        jmp     0(a2,d0.w)

where a2 holds the base from which all the routine offsets are computed.
Because d0.w is sign-extended, every routine must lie within 32 kbytes
either side of a2.

The threaded-code approach has further advantages:

1: To handle known ROM entry points, just point the table entry for the
   entry point at an optimised 68000 routine that does what the ROM routine
   does. There is no overhead at all in checking for ROM entry points.

2: To handle multiple-byte opcodes (e.g. the Z80's prefix instructions),
   patch can be made smart enough to point the table entry for the prefix
   byte at the routine for the entire instruction. There is no need to
   decode the bytes after the prefix every time the instruction is
   executed.

3: Patch can be made smart enough to recognise common sequences of 6502
   instructions, and to point the table entry at an optimised 68000 routine
   for the whole sequence.

Note that 2 and 3 above (if implemented) won't correctly emulate certain
types of self-modifying code: a write to a later byte of the instruction or
sequence resets only that byte's table entry, not the entry at its start,
so the optimised routine keeps executing the old code.

There was a good article on "Portable Fast Direct Threaded Code" by Eliot
Miranda in comp.compilers recently. He uses GCC to write "machine
independent" threaded code that is just about as efficient as my
68000-specific code.

Hope this helps.

Regards, Peter McGavin.   (srwmpnm@wnv.dsir.govt.nz)