Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!shadooby!ginosko!uunet!lotus!esegue!johnl
From: johnl@esegue.segue.boston.ma.us (John R. Levine)
Newsgroups: comp.arch
Subject: Re: Self-modifying code
Message-ID: <1989Oct11.013553.3893@esegue.segue.boston.ma.us>
Date: 11 Oct 89 01:35:53 GMT
References: <1080@mipos3.intel.com>
Reply-To: johnl@esegue.segue.boston.ma.us (John R. Levine)
Organization: Segue Software, Cambridge MA
Lines: 66

In article <1080@mipos3.intel.com> jpoon@mipos2.intel.com (Jack Poon~) writes:
>Could any experts out there educate me WHY and HOW does self-modifying code
>use? What the advantage of using self-modifying code that non-self-modifying
>code cannot achieve?

Depending on your point of view, either every modern computer uses self-modifying code, or almost nobody uses it any more.

One of the great conceptual breakthroughs that made the modern computer possible was to store instructions and data in the same memory, so that instructions could create and modify other instructions. (I believe it was due to Von Neumann, but Babbage may have preceded him.)

In ancient days, before about 1954, there were no index registers or even indirect addressing, so the only way you got a computer program to do anything interesting like add up all of the elements in an array was to have the program patch itself. For example, if your array started at location 100, the first instruction in the loop would be something like "ADD 100" which added the contents of location 100. Next you wanted to add location 101, so you'd do that by adding 1 to that ADD instruction, changing it to "ADD 101", and so forth. There were, as you might expect, a whole slew of clever tricks involving diddling instructions.

Since the advent of index registers, rather than modifying the add instruction in memory, one uses an index register or indirect address word which modifies the effect of the instruction without having to modify the instruction in memory.

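To make the "ADD 100" trick concrete, here is a rough sketch of it on a toy accumulator machine simulated in Python. Everything about the machine is my own invention for illustration: the six opcodes, the encoding of an instruction as opcode * 1000 + address, and the memory layout (TOTAL, ONE, COUNT, ARRAY) do not correspond to any real computer. The point is only that the instruction at location 1 is an ordinary number in memory, so the program can load it, add 1 to it, and store it back.

```python
# Toy accumulator machine with no index registers (hypothetical encoding).
# Every word is an int; an instruction word is opcode * 1000 + address.
HALT, LOAD, ADD, SUB, STORE, JNZ = range(6)

TOTAL, ONE, COUNT, ARRAY = 50, 51, 52, 100   # data locations (assumed layout)


def word(op, addr=0):
    """Encode an instruction as a plain number that can sit in memory."""
    return op * 1000 + addr


def sum_array(data):
    """Sum `data` by patching the ADD instruction at location 1 on each pass.

    Returns (total, final contents of location 1).
    """
    if not 0 < len(data) <= 800:
        raise ValueError("need 1..800 elements to fit the toy address field")
    mem = [0] * (ARRAY + len(data))
    mem[0:11] = [
        word(LOAD, TOTAL),    # 0: acc = running total
        word(ADD, ARRAY),     # 1: acc += array element   <-- gets patched
        word(STORE, TOTAL),   # 2: total = acc
        word(LOAD, 1),        # 3: acc = the ADD instruction itself
        word(ADD, ONE),       # 4: bump its address field by one
        word(STORE, 1),       # 5: write the patched instruction back
        word(LOAD, COUNT),    # 6: acc = elements remaining
        word(SUB, ONE),       # 7: acc -= 1
        word(STORE, COUNT),   # 8: remaining = acc
        word(JNZ, 0),         # 9: loop while elements remain
        word(HALT),           # 10: stop
    ]
    mem[ONE], mem[COUNT] = 1, len(data)
    mem[ARRAY:ARRAY + len(data)] = data

    acc = pc = 0
    while True:
        op, addr = divmod(mem[pc], 1000)   # decode the word at pc
        pc += 1
        if op == HALT:
            return mem[TOTAL], mem[1]
        if op == LOAD:
            acc = mem[addr]
        elif op == ADD:
            acc += mem[addr]
        elif op == SUB:
            acc -= mem[addr]
        elif op == STORE:
            mem[addr] = acc
        elif op == JNZ and acc != 0:
            pc = addr
```

Calling sum_array([3, 1, 4, 1, 5]) returns the total along with the final contents of location 1, which by then has been rewritten from "ADD 100" to "ADD 105". On a machine with an index register, steps 3 through 5 would bump the register instead and location 1 would never change.
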
The bad side of modifiable code is that a buggy program that is trying to change data will change code instead, which can lead to some truly spectacular wipeouts. (Sometimes they completely clear memory, so there is no trace of the program at all!)

There are performance problems associated with self-modifying code as well. Most computers fetch instructions considerably ahead of where they are executing; even the lowly 8086 can fetch up to 6 bytes ahead of where it is executing. This means that if you store into the next instruction, the CPU might or might not already have fetched that instruction, so it might execute the old instruction or the new one, depending on such things as interrupts, DMA, and even register contents. Needless to say, this kind of bug is very hard to find.

Experience suggests that in most cases, the possible damage from bugs smashing code outweighs the possible gain from allowing modifications, particularly since indexing and indirect addressing solve the problems most often addressed by code modification. Most computers now make the code read-only, so while a particular program is running it is forcibly prevented from modifying itself.

There are a few places where instruction modification is still used. In some cases where programs dynamically link to libraries, the "call" instructions that refer to library routines are modified by the dynamic linker to point to wherever the routine happens to have ended up. High-performance sort programs that run on mainframes invariably write the inner comparison loop for the sort at the time the sort starts, so they don't have to reinterpret the sort specifications over and over. Incremental compilers such as are commonly found in Lisp systems compile a function to code at the time it is first called, install that code into the running process, and sometimes even patch call instructions to point to it (like the dynamic linker.)

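The sort-program trick can be imitated in a high-level language, which may make the idea easier to see. The sketch below is only an analogy: it generates Python source rather than machine code, and the sort specification format, a list of (field, descending) pairs, along with the names interpreted_compare and generated_compare, are made up for the example. The first function walks the specification on every comparison; the second writes a comparison routine specialized to the specification once, when the sort starts, and compiles it with exec.

```python
from functools import cmp_to_key


def interpreted_compare(spec):
    """Baseline: re-reads the sort specification on every comparison."""
    def compare(a, b):
        for field, descending in spec:
            if a[field] != b[field]:
                result = -1 if a[field] < b[field] else 1
                return -result if descending else result
        return 0
    return compare


def generated_compare(spec):
    """Emit source specialized to `spec`, then compile it once up front."""
    lines = ["def compare(a, b):"]
    for field, descending in spec:
        # The direction is decided here, at generation time, not per compare.
        less, greater = (1, -1) if descending else (-1, 1)
        lines.append(f"    if a[{field!r}] != b[{field!r}]:")
        lines.append(
            f"        return {less} if a[{field!r}] < b[{field!r}] else {greater}"
        )
    lines.append("    return 0")
    namespace = {}
    exec("\n".join(lines), namespace)   # build the routine at sort start
    return namespace["compare"]


def sort_records(records, spec, make_compare=generated_compare):
    """Sort `records` (tuples or dicts) according to `spec`."""
    return sorted(records, key=cmp_to_key(make_compare(spec)))
```

For a specification such as [(1, True), (0, False)], the generated routine contains no loop over the specification and no test of the descending flag; both have been folded into straight-line code, which is the saving the mainframe sort programs were after.
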
Some people might claim that compiling a program to object code and then loading and executing it counts as instruction modification. After all, the compiled code wasn't there when the computer started up, somebody must have modified what was there, so we all modify code all the time. I find this definition a little precious, and prefer to consider code to be self-modifying only in the case where the modifier and modifiee are part of the same program. I leave the definition of "same program" up to you.
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
Massachusetts has over 100,000 unlicensed drivers. -The Globe