Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!samsung!think.com!mintaka!spdcc!esegue!compilers-sender
From: Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips)
Newsgroups: comp.compilers
Subject: Re: Help on disassembler/decompilers
Keywords: assembler, debug
Message-ID:
Date: 10 Sep 90 13:04:01 GMT
References: <6839.26ea3b0e@vax1.tcd.ie>
Sender: compilers-sender@esegue.segue.boston.ma.us
Reply-To: Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips)
Organization: NCR Microelectronics, Ft. Collins, CO
Lines: 41
Approved: compilers@esegue.segue.boston.ma.us

>>>>> On 9 Sep 90 12:52:29 GMT, rwallace@vax1.tcd.ie said:

> There's no unique mapping from machine code to HLL and hence
> (unlike machine code to assembler) no simple algorithm (your
> algorithm might recognize something it thinks is a loop but is it
> a for loop, a while loop or just something hacked together with
> gotos?

Very true, but it doesn't matter whether it uses "for" or "while" loops
or a combination based on heuristics.  (Turning a mess of "goto"s into
"for" or "while" loops sounds particularly attractive!)  Recreating the
original code _exactly_ is overkill; it's the _algorithms_ one generally
wants to see.

The more the decompiler is able to abstract "goto"s into "while"s,
"for"s, "do...while"s, etc. the better, even if it doesn't match the
original.  Further heuristics could be used for combining conditionals
to avoid 3 "if"s when a single one would do, finding candidates for
"switch" statements, etc.

Best of all: Combined with an optimizing compiler, you get a kluge
source code optimizer, and at least open the possibility of porting
optimizations to a platform with a less effective optimizer.  In some
cases, it may even make obvious new optimizations of the underlying
algorithm.  (Just the tool to have around when optimizing highly
time-critical subroutines. ;-)

Structure offsets and stack variable names do present a particular
problem, which may be _partially_ overcome by reading the symbol table
(if it exists) and annotating accordingly.  Patching your compiler to
include additional information in the object file or a side file can
further the possible abstraction.

In the very worst case, you get to read cryptic C instead of cryptic
assembler.

	Cheers,
--
Chuck Phillips  MS440
NCR Microelectronics 			Chuck.Phillips%FtCollins.NCR.com
2001 Danfield Ct.
Ft. Collins, CO.  80525   		uunet!ncrlnk!ncr-mpd!bach!chuckp
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.
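
To illustrate the two heuristics mentioned in the article (abstracting
"goto"s into loops, and combining several "if"s into one), here is a
minimal sketch in C.  It is not the output of any real decompiler; the
function names (sum_goto, sum_while, in_range_nested, in_range_merged)
are hypothetical and chosen only for this example.  Each pair computes
the same result, which is the point: the structured form need not match
the original source to show the algorithm.

```c
#include <stddef.h>

/* Hypothetical "raw" decompiler output: control flow mirrors the
   compare-and-branch instructions of the machine code. */
int sum_goto(const int *values, size_t count)
{
    int total = 0;
    size_t i = 0;
top:
    if (i >= count) goto done;
    total += values[i];
    i++;
    goto top;
done:
    return total;
}

/* The same control flow after the backward branch to "top" has been
   recognized as a loop with its exit test at the head. */
int sum_while(const int *values, size_t count)
{
    int total = 0;
    size_t i = 0;
    while (i < count) {
        total += values[i];
        i++;
    }
    return total;
}

/* Three nested tests, as a naive translation might emit them. */
int in_range_nested(const int *p, int lo, int hi)
{
    if (p != NULL) {
        if (*p >= lo) {
            if (*p <= hi) {
                return 1;
            }
        }
    }
    return 0;
}

/* The same tests combined into one condition.  Short-circuit "&&"
   keeps the evaluation order, so *p is never read when p is NULL. */
int in_range_merged(const int *p, int lo, int hi)
{
    return p != NULL && *p >= lo && *p <= hi;
}
```

Whether the loop is printed as "while" or "for" is a presentation
choice; either form is easier to read than the labels and "goto"s.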