Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!wuarchive!cs.utexas.edu!tut.cis.ohio-state.edu!snorkelwacker!mintaka!spdcc!esegue!compilers-sender
From: meissner@osf.org
Newsgroups: comp.compilers
Subject: Disassembly
Keywords: assembler, debug
Message-ID: <9009121606.AA26236@curley.osf.org>
Date: 12 Sep 90 16:06:53 GMT
Sender: compilers-sender@esegue.segue.boston.ma.us
Reply-To: meissner@osf.org
Organization: Compilers Central
Lines: 49
Approved: compilers@esegue.segue.boston.ma.us
In-Reply-To: phorgan@cup.portal.com's message of 9 Sep 90 17:32:55 GMT

| The problem with disassembling arbitrary object code is that data bears a
| disturbing resemblance to code at times :) Even when running through code
| disassembling starting at known code, it's not always possible to
| determine when code stops and data begins. Then it's not possible to tell
| when object code starts up again.

I discovered that the MIPS assembler has a 'solution' to this problem.
It doesn't prohibit you from putting constants in the text section, but
if you do, the line numbers for debugging are messed up.  I found this
when I had GCC putting the switch label array in .text.  The MIPS people
I talked to about this said it was a feature, and not a bug....  Thus on
a MIPS system, you don't have to worry about data being in the text
section....  (And of course each instruction is exactly 32 bits, so you
don't have to worry about starting in the middle of an instruction, like
you do on CISC machines.)

| This is easy to see using most
| disassemblers; when you hit the data, the unknown op-code indicator
| appears (typically ???), then random sequences of ??? and op-codes, then
| when the code starts, often the disassembler has just guessed wrong and
| includes the first byte or two of the 'real' op-code in a previous 'false'
| one. It might take a while to 're-synchronize' and start showing 'real'
| op-codes.
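To make the resynchronization problem concrete, here is a minimal Python sketch of a linear-sweep disassembler (one that decodes instructions back to back) for a made-up variable-length instruction set. The opcode table `TOY_ISA`, the function `linear_sweep`, and the byte values are all hypothetical, chosen only to show the effect; no real architecture is modeled.

```python
from typing import List, Tuple

# Hypothetical toy ISA (not a real architecture): opcode -> (mnemonic, total length).
TOY_ISA = {
    0x01: ("nop", 1),
    0x10: ("push", 2),  # opcode + 1 operand byte
    0x20: ("jmp", 3),   # opcode + 2-byte target
    0x30: ("mov", 4),   # opcode + 3 operand bytes
}


def linear_sweep(code: bytes, start: int = 0) -> List[Tuple[int, str]]:
    """Decode instructions back to back from `start`, returning (offset, text).

    An unknown opcode (or a truncated instruction) is listed as '???' and the
    sweep advances by one byte, as many disassemblers do.
    """
    listing: List[Tuple[int, str]] = []
    pc = start
    while pc < len(code):
        entry = TOY_ISA.get(code[pc])
        if entry is None or pc + entry[1] > len(code):
            listing.append((pc, "???"))
            pc += 1
            continue
        name, length = entry
        operands = code[pc + 1 : pc + length].hex()
        listing.append((pc, f"{name} {operands}".rstrip()))
        pc += length
    return listing


# Real code, then two bytes of data in the text section, then more real code.
TEXT = bytes([0x30, 0xAA, 0xBB, 0xCC,   # 0: mov aabbcc
              0xFF, 0x10,               # 4: data that merely looks like code
              0x20, 0x00, 0x04,         # 6: jmp 0004
              0x01, 0x01])              # 9: nop, 10: nop

for offset, text in linear_sweep(TEXT):
    print(f"{offset:2d}: {text}")
```

Sweeping from offset 0 gives `mov aabbcc` at 0, `???` at 4, then a false `push 20` at 5 that swallows the first byte of the real `jmp` at 6; after two more `???` lines the sweep happens to land on the `nop` at 9 and is back in sync. With fixed 4-byte instructions aligned on 4-byte boundaries, as on MIPS, a sweep restarted at any multiple of 4 cannot begin mid-instruction, although it still cannot tell a data word from an instruction.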
| The only time this isn't a problem would be with fixed, single-length
| op-codes with an alignment requirement. It is possible to reduce
| the problem with an algorithm that looks ahead, starting byte by byte, and
| sees which starting byte generates the most successful string of
| instructions. From a 'good starting byte', you could disassemble in
| reverse to find a previous starting location.

You could always do a complete scan of the text, using a bitmap or some
such to identify every place that has an instruction.  It would have to
be a backtracking scan, so that you can mark both the fall-through path
and the conditional branch target.  IMHO, though, it would be too slow
and consume too much memory to be useful.

| Even this fails in many cases of self-modifying code
| or in cases where strange things are done, like overlapped code. ...

Fortunately for this case, self-modifying code seems to be mostly on the
decline, and only used where needed (or because you have a macho hacker
type who thinks self-modifying code is neat).
--
Michael Meissner    email: meissner@osf.org    phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.
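The complete backtracking scan described above is essentially what is often called recursive-traversal disassembly. Below is a minimal Python sketch of it, assuming a made-up toy instruction set (`TOY_ISA`) whose branches carry a 1-byte absolute target; the names `find_code`, `is_set`, `set_bit`, and `instruction_starts` are hypothetical. An explicit worklist does the backtracking: at a conditional branch the target is queued while the fall-through path is followed, and a bitmap records every byte where an instruction starts.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy ISA (not a real architecture): opcode -> (mnemonic, length, kind).
# kind "seq" falls through, "jmp" always jumps, "bcc" may jump OR fall through,
# "ret" ends the path. Branch targets are a 1-byte absolute address after the opcode.
TOY_ISA: Dict[int, Tuple[str, int, str]] = {
    0x01: ("nop", 1, "seq"),
    0x30: ("mov", 4, "seq"),
    0x20: ("jmp", 2, "jmp"),
    0x21: ("beq", 2, "bcc"),
    0x40: ("ret", 1, "ret"),
}


def is_set(bitmap: bytearray, i: int) -> bool:
    return ((bitmap[i >> 3] >> (i & 7)) & 1) == 1


def set_bit(bitmap: bytearray, i: int) -> None:
    bitmap[i >> 3] |= 1 << (i & 7)


def find_code(text: bytes, entry_points: Iterable[int]) -> bytearray:
    """Return a bitmap with one bit per byte, set where a reachable instruction starts."""
    bitmap = bytearray((len(text) + 7) // 8)
    worklist: List[int] = list(entry_points)  # pending paths: the "backtracking"
    while worklist:
        pc = worklist.pop()
        # Follow one straight-line path until it ends or reaches code already marked.
        while 0 <= pc < len(text) and not is_set(bitmap, pc):
            entry = TOY_ISA.get(text[pc])
            if entry is None or pc + entry[1] > len(text):
                break  # undecodable: give up on this path
            _name, length, kind = entry
            set_bit(bitmap, pc)
            if kind in ("jmp", "bcc"):
                worklist.append(text[pc + 1])  # queue the branch target for later
            if kind in ("jmp", "ret"):
                break  # no fall-through
            pc += length
    return bitmap


def instruction_starts(bitmap: bytearray, size: int) -> List[int]:
    return [i for i in range(size) if is_set(bitmap, i)]


# 0: beq 7 | 2: jmp 9 | 4-6: data | 7: nop | 8: ret | 9: mov 010203 | 13: ret
TEXT = bytes([0x21, 0x07, 0x20, 0x09, 0xFF, 0x10, 0x30,
              0x01, 0x40, 0x30, 0x01, 0x02, 0x03, 0x40])

print(instruction_starts(find_code(TEXT, [0]), len(TEXT)))
```

Starting from entry point 0, the scan marks offsets 0, 2, 7, 8, 9 and 13, and never touches the data at offsets 4 to 6, so those bytes would not be shown as `???` at all. The memory cost is one bit per byte of text. The sketch cannot follow computed jumps (for example through a switch jump table) and ignores self-modifying code, so a real tool would need more than this.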