Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!mcsun!hp4nl!sci.kun.nl!atcmpe!jc
From: jc@atcmp.nl (Jan Christiaan van Winkel)
Newsgroups: comp.lang.c
Subject: Re: Self-modifying code
Message-ID: <670@atcmpe.atcmp.nl>
Date: 12 Oct 90 11:33:59 GMT
References: <829@neccan.oz>
Organization: AT Computing, Nijmegen, The Netherlands
Lines: 49

From article <829@neccan.oz>, by peter@neccan.oz (Peter Miller):
> 1. On a Z80 I wrote some code which used a NMI (non-maskable interrupt).

This reminds me of the code used in the 8080 BASIC interpreter by
Microsoft. They had several entries into the error routine. The error
routine expected an error number in register b. Now what they had done
was:

        ld hl,...       ; register pair hl gets the value
        ld hl,...
        ld hl,...
        and so on.

The 16-bit numbers themselves were actually instructions:

        ld b,errorcode

By jumping into the middle of one of the ld hl,... instructions, they
would load the error code in b, and then execute some dummy ld hl,...
instructions that would not clobber the value in b, even though the
ld b,xxx instructions were just a byte away.

Although this is not self-modifying code, it is 'shifting the bits a bit
and interpreting the result'. Very clever.

> 4. At some point, I realized that using a compiler is rather like
> self-modifying code. The compiler, itself a binary data file, chews on a
> text file and makes a binary data file. When we run the program we just
> compiled, we are asking the OS to load a binary data file and leap into it.

Hmmm. I think you should read Ken Thompson's Turing Award lecture. He
discussed the possibility of getting code into a C compiler without
having it in the source. The trick is illustrated with the addition of a
new escaped character like \n. In the lexical analyzer there is some
sort of code like this:

        case '\\':
            switch (getnewchar()) {
            case 'n': return '\n';
            case 'a': return '\007';  /* the newly added character */
                                      /* my name's Bond, James Bond :-) */
            .
            .

Now compile the compiler, and you'll have a new compiler that recognizes
'\a'. Now edit the source code to look like this:

            case 'a': return '\a';

This is possible because the compiler will be compiled with the compiler
that knows about '\a'. The result is a C compiler that knows that '\a'
is in reality '\007', but that knowledge is stored nowhere in the source
of the C compiler. It is inherited from the previous generation of the
C compiler.

JC
-- 
 ___  __  ____________________________________________________________________
 |/  \   Jan Christiaan van Winkel      Tel: +31 80 566880      jc@atcmp.nl
 |       AT Computing   P.O. Box 1428   6501 BK Nijmegen     The Netherlands
 __/ \__/ ____________________________________________________________________
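
A footnote on the ld hl / ld b trick described above, since it is easier
to see with the bytes written out. What follows is only a rough sketch in
C, not the actual bytes of the Microsoft interpreter: the table layout,
the error numbers 1 to 3 and the name run_from() are assumptions made up
for illustration. The two opcode values are the Z80 encodings (0x21 for
ld hl,nn and 0x06 for ld b,n). The little decoder walks the same bytes
from different entry offsets and reports what ends up in b.

```c
#include <stddef.h>
#include <stdint.h>

/* Z80 encodings: LD HL,nn is 0x21 plus two operand bytes,
   LD B,n is 0x06 plus one operand byte. */
enum { OP_LD_HL = 0x21, OP_LD_B = 0x06 };

/* Hypothetical table (an assumption, not the real interpreter's bytes):
   the two operand bytes of every "ld hl,nn" are themselves a
   "ld b,errorcode" instruction. */
static const uint8_t entries[] = {
    OP_LD_HL, OP_LD_B, 1,   /* offset 0: ld hl,0x0106   offset 1: ld b,1 */
    OP_LD_HL, OP_LD_B, 2,   /* offset 3: ld hl,0x0206   offset 4: ld b,2 */
    OP_LD_HL, OP_LD_B, 3    /* offset 6: ld hl,0x0306   offset 7: ld b,3 */
};

/* Decode from byte offset pc until the end of the table, which stands in
   for falling through into the error routine. Returns the value left in
   register b, or -1 if b was never loaded or an unknown opcode was met. */
int run_from(size_t pc)
{
    const size_t n = sizeof entries;
    unsigned hl = 0;   /* scratch register pair, overwritten by the dummies */
    int b = -1;

    while (pc < n) {
        uint8_t op = entries[pc];
        if (op == OP_LD_B && pc + 1 < n) {
            b = entries[pc + 1];                 /* ld b,n */
            pc += 2;
        } else if (op == OP_LD_HL && pc + 2 < n) {
            /* ld hl,nn: little-endian operand; b is left alone */
            hl = entries[pc + 1] | ((unsigned)entries[pc + 2] << 8);
            pc += 3;
        } else {
            return -1;
        }
    }
    (void)hl;
    return b;
}
```

Entering at offsets 1, 4 and 7 leaves 1, 2 and 3 in b, because every
later ld hl,nn swallows the ld b hidden in its operand. Entering at
offset 0 executes only the ld hl instructions, so b is never loaded and
run_from(0) returns -1.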