Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!elroy.jpl.nasa.gov!decwrl!fernwood!portal!cup.portal.com!Radagast
From: Radagast@cup.portal.com (sullivan - segall)
Newsgroups: comp.sys.amiga.emulations
Subject: Re: Emulator Mechanics (sorry long post)
Message-ID: <40036@cup.portal.com>
Date: 12 Mar 91 02:53:43 GMT
References: <4992@mindlink.UUCP> <1991Mar6.010141.5905@mintaka.lcs.mit.edu> <1303@macuni.mqcc.mq.oz> <39935@cup.portal.com> <10775@dog.ee.lbl.gov>
Distribution: na
Organization: The Portal System (TM)
Lines: 111

Subject: mail failed, returning to sender
Reference:

|------------------------- Message log follows: -------------------------|
|------------------------- Failed addresses follow: ---------------------|
 ... transport smtp: 550 ... User unknown
|------------------------- Message text follows: ------------------------|
>Organization: University of Chicago
>Cc:
>
>I have been reading this thread with a lot of interest - but have a few
>questions regarding the separation of code and data.
>
>I don't quite understand why this is _absolutely_necessary. I thought that
>this discussion started with the idea of simplifying things by using lots of
>memory. Thus, my conception of the compiler/emulator goes something like:
>
>You keep the original .exe file, and maybe even an entire 640K memory map
>in memory. The compiler generates 680x0 equivalent code for every instruction,
>with every memory read or write referring to the contents of the ORIGINAL
>map. If we agree not to worry about self-modifying code, the 680x0 code never
>really has to be modified. It will contain a considerable amount of junk
>corresponding to data, but then this stuff would never get executed in the
>original, and so won't get executed here either.

The problem here is that the compiler can easily get out of synch with the
program.
Suppose my source looks like:

        Call    Print
        db      "Hello World",LF,0
        Call    CheckErr

Now if the object code is treated as all executable, there is no guarantee
that the instruction pointer of the cross compiler will ever land on the
first byte of the second "Call" instruction. If the address for CheckErr
happens to correspond to some real opcode, the IP could stay out of synch
for quite some time. If the next statement had been an RET instruction,
you might miss it and start executing completely irrelevant code. Of course
when you execute the code, and find that the return address isn't among any
of the symbolic addresses available, you might realize that something has
gone awry, but by then it is too late to fix it.

More importantly though, what you gain in speed by compiling the code is
completely lost in searching for address translations in the executable.
Every return address popped from the stack, every indexed jump, every
vectored jump has to be translated into the equivalent location in the
68000 code. Unless you are willing to spend 4 megs of memory on a table
that translates the address of each byte in the source code, every one of
those translations is a search.

>
>Of course, there would still need to be code to handle whether, say, video
>ram was written to, or whatnot. I guess you could just add this code to
>the code that replaces any instructions that write to memory.
>
>I'd appreciate being further enlightened on the mechanics of emulators...
>
>Amish (asd2@ellis.uchicago.edu)

Video writes are relatively easy. At least in that case you know that
the address contains data, and not executable code (same goes for reads).

Other problems also arise. Suppose I want to move a segment register to
a normal register. In 68k code, the move first requires that the address
register holding the segment be shifted right four bits. (Intel segment
registers lack the lower four bits of the address, so a segment kept as a
full address has to be shifted whenever it is moved from a segment
register to a data register.)
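
To make the shifting concrete, here is a minimal sketch in Python rather
than 68k code. It assumes the usual 8086 real-mode rule (physical address
= segment * 16 + offset, wrapped to 20 bits). The class names PreShifted
and Raw are made up for illustration; they only model the two places the
shift can be paid for, not any actual emulator.

```python
SEG_SHIFT = 4          # 8086 real mode: physical = (segment << 4) + offset
ADDR_MASK = 0xFFFFF    # 20-bit address space (1 meg)


def physical(segment, offset):
    """Reference 8086 real-mode address calculation."""
    return ((segment << SEG_SHIFT) + offset) & ADDR_MASK


class PreShifted:
    """Hypothetical choice A: keep the segment as a ready-made base address.

    Dereferencing is a plain add, but every move of the segment value
    into an ordinary register costs a right shift.
    """

    def __init__(self, segment):
        self.base = segment << SEG_SHIFT   # stored already shifted left

    def read_segment(self):
        # e.g. MOV AX,DS or PUSH DS: recover the 16-bit segment value
        return self.base >> SEG_SHIFT

    def deref(self, offset):
        return (self.base + offset) & ADDR_MASK


class Raw:
    """Hypothetical choice B: keep the 16-bit segment value unshifted.

    Moves are free, but every memory reference costs a left shift.
    """

    def __init__(self, segment):
        self.value = segment

    def read_segment(self):
        return self.value

    def deref(self, offset):
        return ((self.value << SEG_SHIFT) + offset) & ADDR_MASK


if __name__ == "__main__":
    for reg in (PreShifted(0xB800), Raw(0xB800)):
        print(type(reg).__name__,
              hex(reg.read_segment()), hex(reg.deref(0x0010)))
```

Both classes give the same answers; they differ only in which operation
carries the shift, and that is the cost being weighed here.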
The question then comes, what do you do if you aren't sure whether the
source data is from a segment register or a data register? Suppose you
push a segment register on the stack, and pop a data register. Obviously
segment registers have to be shifted whenever they are moved to anything
(other than another segment register). Unfortunately in the Intel
instruction set it is impossible to move data directly between segment
registers, so moves through other registers or the stack are performed
constantly. But if you keep the segment register right shifted normally,
you then have to left shift every time the register is dereferenced.
Which is worse?

Okay, so you've solved all of these problems... Well there is at least
one more reason to separate code from data (and this is what most people
will refer to). You've translated the code because Intel code won't run
on a Motorola chip. Unfortunately the same is true of Intel data.
Intel always stores the least significant byte of any operand first.
Motorola always stores the most significant byte of any operand first.
So if you really want to translate a program, the data should be translated
as well.

Next the question comes, how do you really know the size of any data?
If a data location is referred to as a word then the contents of the first
and second bytes should be swapped. If it is referred to as both word and
byte, the data should be swapped and the byte reference should be changed
to the other byte. But now how do you handle indexed or calculated
references? You may not be able to find any references to the data point,
and so be unable to determine its contents. But if you convert all of the
data references, you are right back where you started, not executing 68k
code. Instead each reference is interpreted and loaded or written a single
byte at a time.

                           -Sullivan_-_Segall (a.k.a.
Radagast)
_______________________________________________________________

/V\  E-Credibility:  (n -- ME) The unguaranteed likelihood that
 '   the electronic mail you are reading is genuine rather than
     someone's made up crap.
_______________________________________________________________

Mail to: ...sun!portal!cup.portal.com!radagast or
         radagast@cup.portal.com