Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!shadooby!ginosko!uunet!lotus!esegue!johnl
From: johnl@esegue.segue.boston.ma.us (John R. Levine)
Newsgroups: comp.arch
Subject: Re: Self-modifying code
Message-ID: <1989Oct11.013553.3893@esegue.segue.boston.ma.us>
Date: 11 Oct 89 01:35:53 GMT
References: <1080@mipos3.intel.com>
Reply-To: johnl@esegue.segue.boston.ma.us (John R. Levine)
Organization: Segue Software, Cambridge MA
Lines: 66

In article <1080@mipos3.intel.com> jpoon@mipos2.intel.com (Jack Poon~) writes:
>Could any experts out there educate me WHY and HOW does self-modifying code
>use? What the advantage of using self-modifying code that non-self-modifying
>code cannot achieve?

Depending on your point of view, either every modern computer uses
self-modifying code, or almost nobody uses it any more.  One of the great
conceptual breakthroughs that made the modern computer possible was to store
instructions and data in the same memory, so that instructions could create
and modify other instructions.  (I believe it was due to von Neumann, but
Babbage may have preceded him.)

In ancient days, before about 1954, there were no index registers or even
indirect addressing, so the only way you got a computer program to do
anything interesting like add up all of the elements in an array was to have
the program patch itself.  For example, if your array started at location
100, the first instruction in the loop would be something like "ADD 100"
which added the contents of location 100.  Next you wanted to add location
101, so you'd do that by adding 1 to that ADD instruction, changing it to
"ADD 101", and so forth.  There were, as you might expect, a whole slew of
clever tricks involving diddling instructions.
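To make the "ADD 100" trick concrete, here is a minimal sketch in Python of a
made-up one-accumulator machine with no index registers.  The opcodes, the
opcode*1000+address word format, and the memory layout (loop at 0, scratch
words at 50-52, array at 100) are all assumptions invented for illustration,
not any real machine's.

```python
# Toy accumulator machine: instructions and data share one memory.
# Word format (an assumption for this sketch): opcode * 1000 + address.
HALT, LOAD, ADD, STORE, SUB, JNZ = range(6)


def word(op, addr=0):
    """Encode an instruction as a plain integer, so it can be treated as data."""
    return op * 1000 + addr


def run(mem):
    """Execute from location 0 until HALT, mutating mem in place."""
    acc, pc = 0, 0
    while True:
        op, addr = divmod(mem[pc], 1000)
        pc += 1
        if op == HALT:
            return
        elif op == LOAD:
            acc = mem[addr]
        elif op == ADD:
            acc += mem[addr]
        elif op == STORE:
            mem[addr] = acc
        elif op == SUB:
            acc -= mem[addr]
        elif op == JNZ:
            if acc != 0:
                pc = addr
        else:
            raise ValueError(f"bad opcode {op}")


def sum_array(values):
    """Sum values by patching the ADD instruction at location 1 each pass."""
    assert 1 <= len(values) <= 100  # the loop body always runs at least once
    SUM, COUNT, ONE, ARRAY = 50, 51, 52, 100
    mem = [0] * 200
    mem[ARRAY:ARRAY + len(values)] = values
    mem[COUNT] = len(values)
    mem[ONE] = 1
    program = [
        word(LOAD, SUM),     # 0: acc = running sum
        word(ADD, ARRAY),    # 1: "ADD 100", the instruction that gets patched
        word(STORE, SUM),    # 2: save the running sum
        word(LOAD, 1),       # 3: load the ADD instruction itself as data
        word(ADD, ONE),      # 4: add 1, turning "ADD 100" into "ADD 101"
        word(STORE, 1),      # 5: write the patched instruction back
        word(LOAD, COUNT),   # 6: acc = elements remaining
        word(SUB, ONE),      # 7: one fewer
        word(STORE, COUNT),  # 8: save the count
        word(JNZ, 0),        # 9: loop while elements remain
        word(HALT),          # 10: done
    ]
    mem[:len(program)] = program
    run(mem)
    return mem[SUM], mem[1]  # the total, and the final patched ADD word
```

Calling sum_array([3, 1, 4, 1, 5]) returns (14, 2105): the total, plus the
instruction at location 1, which started as "ADD 100" and now reads "ADD 105".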

Since the advent of index registers, one no longer has to modify the add
instruction in memory.  An index register or indirect address word changes
the effective address of the instruction while the instruction itself stays
as it was.

The bad side of modifiable code is that a buggy program that is trying to
change data can change code instead, which can lead to some truly spectacular
wipeouts.  (Sometimes they completely clear memory, so there is no trace of
the program at all!)

Self-modifying code also interacts badly with instruction prefetch.  Most
computers fetch instructions considerably ahead of where they are executing;
even the lowly 8086 can fetch up to 6 bytes ahead of where it is executing.
This means that if you store into the next instruction, the CPU might or
might not already have fetched that instruction, so it might execute the old
instruction or the new one, depending on such things as interrupts, DMA, and
even register contents.  Needless to say, this kind of bug is very hard to
find.
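The stale-instruction problem can be sketched with a toy model in Python.
Everything here is hypothetical: the three opcodes, the queue that counts
whole instructions rather than bytes, and the "interrupt" that simply flushes
the queue are simplifications for illustration, not a model of the 8086 or
any other real CPU.

```python
from collections import deque


def run(program, prefetch_depth, interrupt_after=None):
    """Run a straight-line program that must end in HALT; return the accumulator.

    prefetch_depth  -- how many instructions the CPU may hold beyond the
                       one it is about to execute (0 means no prefetch)
    interrupt_after -- if set, flush the queue after that many instructions
                       have executed, as a control transfer would
    """
    mem = list(program)
    queue = deque()      # copies of instructions already fetched
    pc = 0               # address of the next instruction to execute
    next_fetch = 0       # address of the next instruction to prefetch
    acc = 0
    executed = 0
    while True:
        # Fill the queue from memory before executing anything.
        while len(queue) <= prefetch_depth and next_fetch < len(mem):
            queue.append(mem[next_fetch])
            next_fetch += 1
        op, arg = queue.popleft()
        pc += 1
        executed += 1
        if op == "HALT":
            return acc
        if op == "SET":
            acc = arg
        elif op == "PATCH":
            addr, new_instruction = arg
            mem[addr] = new_instruction  # memory changes; the queue does not
        if executed == interrupt_after:
            queue.clear()                # discard prefetched copies
            next_fetch = pc              # and refetch from memory


PROGRAM = [
    ("PATCH", (1, ("SET", 2))),  # 0: overwrite the very next instruction
    ("SET", 1),                  # 1: the old instruction
    ("HALT", None),              # 2
]
```

With no prefetch, run(PROGRAM, 0) returns 2, because the new instruction is
fetched after the store.  With a deep queue, run(PROGRAM, 6) returns 1,
because the old instruction was already fetched.  Add an interrupt right
after the store, run(PROGRAM, 6, interrupt_after=1), and the answer is 2
again, which is why the behavior of such code can vary from run to run.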

Experience suggests that in most cases, the possible damage from bugs
smashing code outweighs the possible gain from allowing modifications,
particularly since indexing and indirect addressing solve the problems most
often addressed by code modification.  Most computers now make the code
read-only, so while a particular program is running it is forcibly prevented
from modifying itself.

There are a few places where instruction modification is still used.  In some
cases where programs dynamically link to libraries, the "call" instructions
that refer to library routines are modified by the dynamic linker to point to
wherever the routine happens to have ended up.  High-performance sort
programs that run on mainframes invariably write the inner comparison loop
for the sort at the time the sort starts, so they don't have to reinterpret
the sort specifications over and over.  Incremental compilers such as are
commonly found in Lisp systems compile a function to code at the time it is
first called, install that code into the running process, and sometimes even
patch call instructions to point to it (like the dynamic linker).
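The sort-program case is the easiest of these to sketch in a high-level
language.  The Python below is only an analogy, under stated assumptions: the
sort specification is taken to be a list of (field index, descending) pairs,
and make_compare and sort_records are hypothetical names.  It generates
source text for a comparison routine once, when the sort starts, rather than
reinterpreting the specification for every pair of records.

```python
from functools import cmp_to_key


def make_compare(spec):
    """Generate a comparison function specialized for one sort specification.

    spec is a list of (field_index, descending) pairs, most significant
    first.  This format is an assumption made for the sketch.
    """
    lines = ["def compare(a, b):"]
    for index, descending in spec:
        i = int(index)                   # only plain integers reach the source
        op = ">" if descending else "<"  # which ordering sorts a before b
        lines.append(f"    if a[{i}] != b[{i}]:")
        lines.append(f"        return -1 if a[{i}] {op} b[{i}] else 1")
    lines.append("    return 0")
    source = "\n".join(lines) + "\n"
    namespace = {}
    exec(source, namespace)              # build the routine once, up front
    return namespace["compare"]


def sort_records(records, spec):
    """Sort records using a comparison routine generated for this spec."""
    return sorted(records, key=cmp_to_key(make_compare(spec)))
```

For example, sort_records([("bob", 30), ("alice", 30), ("carol", 25)],
[(1, True), (0, False)]) orders the records by the second field descending
and then the first field ascending, giving alice, bob, carol.  The generated
routine contains no loop over the specification at all, which is the point.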

Some people might claim that compiling a program to object code and then
loading and executing it counts as instruction modification.  After all, the
compiled code wasn't there when the computer started up; somebody must have
modified what was there, so we all modify code all the time.  I find this
definition a little precious, and prefer to consider code to be
self-modifying only in the case where the modifier and modifiee are part of
the same program.  I leave the definition of "same program" up to you.
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
Massachusetts has over 100,000 unlicensed drivers.  -The Globe