Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!elroy.jpl.nasa.gov!aero-c!usc!snorkelwacker.mit.edu!mintaka!pogo.ai.mit.edu!rjc
From: rjc@pogo.ai.mit.edu (Ray Cromwell)
Newsgroups: comp.sys.amiga.emulations
Subject: Re: Emulator Mechanics (sorry long post)
Message-ID: <1991Mar6.010141.5905@mintaka.lcs.mit.edu>
Date: 6 Mar 91 01:01:41 GMT
References: <4992@mindlink.UUCP>
Sender: daemon@mintaka.lcs.mit.edu (Lucifer Maleficius)
Organization: None
Lines: 120

Very interesting article. I myself have been tempted several times to write an emulator. Since I programmed 6502 assembly on the C64 for 4 years, and I know 68000 on the Amiga, I was tempted to try to beat the speed of the other emulators. Then I realized the sheer magnitude of the project.

Emulating the instruction set is easy. In fact, I am quite confident I could make a 6502 emulator run faster on the Amiga than on a real C64.

The hard part is the hardware. Most C64 programs discard the OS entirely and bang on the hardware. Furthermore, most of them use polling loops, such as polling the raster-beam register, along with precisely cycle-timed delays. Moreover, the VIC chip has several glitches that programmers exploit to remove the borders, vertically scroll the screen to ANY raster line, horizontally shift the screen, interlace the screen vertically and horizontally, and stretch pixels vertically to double, triple, or quadruple height. These tricks are virtually IMPOSSIBLE to detect unless the emulator is artificially intelligent.

And any program that has a fastloader won't work. Fastloaders usually transfer data over both the serial clock line and the data line, which doubles the bandwidth. Unfortunately, this requires PERFECT timing. It is so perfect, in fact, that a loader timed for NTSC machines won't work on PAL machines, and vice versa.

Sprites are another problem. Amiga sprites are only 16 pixels wide, while C64 sprites are 24 pixels wide, can have their width and height doubled, and rely on a chunky pixel format.
Text is another problem, since the C64 has a built-in text mode and the Amiga displays only bitmaps.

The Mac is the easiest computer to emulate because it's not a computer at all. The Macintosh doesn't exist; it's nothing more than a ROM chip.

A few days ago I was impressed. I downloaded a demo from ab20 called C64Music.zap. It emulates the 6502 at 100% speed (in fact, with perfect timing, because the music plays at exactly the same speed as on a C64). It also emulates the SID chip PERFECTLY, and I mean perfect. These guys should team up with the maker of A64.

I can't speak for other 6502 emulators, but if I wrote one, the fastest method looks like table lookup, with careful optimization to make sure things are longword aligned. For instance, I might do something like:

    pc6502  equr    a5      ; 6502 program counter, as a host pointer
    accum   equr    d5      ; A
    xindex  equr    d6      ; X
    yindex  equr    d7      ; Y
    stack   equr    a4      ; 6502 stack page: base address + $0100
    stat    equr    d4      ; status register (P)

    ; [AllocMem the 6502's address space, load in an executable
    ;  and set pc6502 to its start]

            lea     jumptbl(pc),a2  ; a2 = table of 256 handler addresses
    loop    moveq   #0,d0           ; clear on every pass: the lsl below
                                    ; and the handlers leave upper bits set
            move.b  (pc6502)+,d0    ; fetch opcode
            lsl.w   #2,d0           ; opcode * 4 = longword table offset
            move.l  0(a2,d0.w),a3   ; handler address
            jmp     (a3)            ; every handler ends with "bra loop"

Then every instruction (even the undocumented ones) would be emulated and its handler's address put into jumptbl. The code for 'LDA address' (opcode $AD, absolute addressing) might look like:

    lda     moveq   #0,d0
            moveq   #0,d1
            move.b  (pc6502)+,d0    ; low byte of the operand (the 6502
            move.b  (pc6502)+,d1    ; is little-endian), then high byte
            lsl.w   #8,d1
            or.w    d1,d0           ; d0 = 16-bit 6502 address
            move.l  addresspace,a3  ; base of the memory AllocMem'd for it
            move.b  0(a3,d0.l),accum ; MOVE sets the 68000's N and Z
            ; Rather than calling GetCC() (or using MOVE from SR on a
            ; 68000, MOVE from CCR on a 68010 or later), branch on the
            ; flags that MOVE just set.
            bne.s   .nz
            bset    #1,stat         ; 6502 Z flag is bit 1
            bra.s   .n
    .nz     bclr    #1,stat
    .n      tst.b   accum           ; BSET/BCLR changed Z, so retest for N
            bpl.s   .pos
            bset    #7,stat         ; 6502 N flag is bit 7
            bra     loop
    .pos    bclr    #7,stat
            bra     loop

(Note: this code can be optimized. It's off the top of my head, and probably buggy, since I haven't coded in asm in a while.)

From my quick calculations, the jump-table dispatcher adds a delay of about 3-4 microseconds to the fetch of each instruction.
That 3-4 microsecond overhead is equivalent to about 4 cycles on a 6502 at 1.02 MHz. If you had unlimited RAM, the object-code loader could 'inline' the code for each instruction and get rid of this delay. I believe this is probably how the C64Music demo does it, since music players on the C64 were only about 1K of code. The LDA routine itself looks to be about 2.2 times slower than a real 6502, where LDA absolute takes 4 cycles. However, a 25 MHz 68030 would run it more than twice as fast.

Theoretically speaking, an IBM emulator running on an Amiga 3000 should run at least as fast as a 5 MHz 80286. Consider SoftPC on the NeXT, which runs at 12 MHz speed UNDER UNIX. 68040s are about 2-3 times faster than 68030s, so SoftPC on the Amy should run at about 5 MHz.

Maybe we should all try something like 'NetIBM'. What I mean is that, like NetHack, we should all participate in coding an IBM emulator. Each person might post a small code segment (in assembler), and the rest of us can compete to optimize it. I remember having a contest with some people to optimize the size of a binary-to-decimal print routine. By the end, the code had shrunk to about a third of its original size. (We kept passing the optimized source back and forth, each of us shedding a few bytes.)

Regardless of what happens, let's keep the discussion going; it's interesting and educational.
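For reference, here is the rough arithmetic behind the timing estimates in this post. It takes the 1.02 MHz 6502 clock, the 3-4 microsecond dispatch overhead, and the 2-3x 68040-versus-68030 ratio as given; these are the post's own figures, not new measurements.

$$
t_{\text{cycle}} = \frac{1}{1.02\ \text{MHz}} \approx 0.98\ \mu\text{s},
\qquad
\frac{3\ \mu\text{s}}{0.98\ \mu\text{s}} \approx 3.1,
\qquad
\frac{4\ \mu\text{s}}{0.98\ \mu\text{s}} \approx 4.1\ \text{cycles}
$$

$$
\frac{12\ \text{MHz}}{3} = 4\ \text{MHz} \;\le\; f_{\text{SoftPC on A3000}} \;\le\; \frac{12\ \text{MHz}}{2} = 6\ \text{MHz}
$$

So the dispatch overhead alone costs about as much as one 4-cycle 6502 instruction, and the SoftPC range brackets the 5 MHz figure.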