Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!world!esegue!compilers-sender
From: raulmill@usc.edu
Newsgroups: comp.compilers
Subject: Re: Help on disassembler/decompilers
Keywords: disassemble
Message-ID: <9009170028.AA24570@girtab.usc.edu>
Date: 17 Sep 90 00:28:25 GMT
References: <12976@june.cs.washington.edu> <_5A%GS%@rpi.edu>
Sender: compilers-sender@esegue.segue.boston.ma.us
Reply-To: raulmill@usc.edu
Organization: Compilers Central
Lines: 56
Approved: compilers@esegue.segue.boston.ma.us
In-Reply-To: adamsf@turing.cs.rpi.edu's message of 10 Sep 90 22:20:33 GMT

In article <_5A%GS%@rpi.edu> adamsf@turing.cs.rpi.edu (Frank Adams) writes:

   In article <12976@june.cs.washington.edu> pardo@cs.washington.edu
   (David Keppel) writes:
   >My guess is that decompiling in to a language that is e.g.,
   >saccharine-sweetened assembler (C) is `easy', while decompiling e.g.,
   >in to APL is hard.

   If we assume that the program is to be decompiled into the language
   in which it was written, it is in general easier to decompile the
   less the compiler optimizes the generated code.

   A second problem is type inference.  APL, with a fixed set of data
   types, is easier in this respect than C.  For example, when the code
   loads a pointer into a register and indexes off of it, what kind of
   struct is the pointer pointing to?

[Frank then goes on to state his opinion that C is pretty good for
exact transliteration of machine language.]

If I may point out...

[1] The first commercial use of APL was to describe the IBM 360
architecture.  APL has the ability to concisely describe just about
any machine architecture.

[2] As far as I know, the language analysis/verification tools
available for APL are pretty good [some would say better than those
available for any other language, but without first-hand knowledge I'm
not so sure.  I do know that 7 or 8 years ago 3 bugs were found in
that 360 description by one of these verifiers.]
If you want an exact HLL transliteration of raw machine code, or a
translation into an assembler-like language, there is no reason why
APL should be harder than any other language (though I'd recommend
using J instead, because there is an odd sort of problem getting APL
to talk in ASCII, and J is better IMHO :)

To turn back to the original poster's question, the best disassemblers
I have seen often do a lot of interpolation based on system calls
whose arguments are known, various compiler conventions and, if you
are lucky enough, linking/debugging information left in the code by
the developers.

As far as I've seen, the worst problem in converting from machine
language to other representations is figuring out what to call a
specific piece of memory (code? text? struct? etc.).  A lot of this
information can be interpolated by logic on the order of 'well, if
this instruction is illegal, we know everything back to the last
branch isn't instructions.'

[It seems to me that [1] is a red herring; the IBM POO describes the
370 in English, but disassembling into English is difficult.  On the
other hand, decompiling into scalar APL expressions shouldn't be hard.
-John]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.
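
As a rough illustration of the illegal-instruction heuristic mentioned
above, here is a minimal Python sketch.  It is only a sketch under
stated assumptions: the opcode table is a made-up toy instruction set,
and the names OPCODES, classify and IMAGE are hypothetical.  A real
disassembler would also follow branch targets and use whatever symbol
or debugging information is available.

```python
# Toy instruction set, invented for this example only:
# opcode byte -> (mnemonic, length in bytes, is_branch)
OPCODES = {
    0x01: ("LOAD", 2, False),
    0x02: ("ADD", 2, False),
    0x03: ("STORE", 2, False),
    0x10: ("JMP", 2, True),
    0x11: ("RET", 1, True),
}


def classify(image):
    """Label each byte of `image` as 'code' or 'data' with a linear sweep.

    Heuristic: when an undecodable instruction is met, every byte since
    the most recent branch is assumed not to be instructions.
    """
    labels = ["code"] * len(image)
    run_start = 0  # first byte after the most recent branch
    pc = 0
    while pc < len(image):
        entry = OPCODES.get(image[pc])
        # Unknown opcode, or an instruction that runs off the end of the image.
        if entry is None or pc + entry[1] > len(image):
            for i in range(run_start, pc + 1):
                labels[i] = "data"
            pc += 1
            run_start = pc  # resume the sweep just past the bad byte
            continue
        _, length, is_branch = entry
        pc += length
        if is_branch:
            run_start = pc  # a new straight-line run begins after a branch
    return labels


# LOAD, JMP, then three bytes of data (the first two happen to decode
# as an ADD), then LOAD and RET.
IMAGE = bytes([0x01, 0x05, 0x10, 0x07, 0x02, 0x41, 0xFF, 0x01, 0x09, 0x11])
LABELS = classify(IMAGE)
```

Note that bytes 4 and 5 decode as a legal ADD on the first pass; they
are only reclassified as data once the illegal byte at offset 6 is
reached, which is the "everything back to the last branch" step.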