Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!swrinde!mips!samsung!olivea!decwrl!world!iecc!compilers-sender
From: dwex@mtgzfs3.att.com (David E Wexelblat)
Newsgroups: comp.compilers
Subject: How can a disassembler tell code from data?
Keywords: disassemble, design, question
Message-ID: <91-05-072@iecc.cambridge.ma.us>
Date: 9 May 91 16:46:10 GMT
Sender: compilers-sender@iecc.cambridge.ma.us
Reply-To: dwex@mtgzfs3.att.com (David E Wexelblat)
Organization: AT&T Bell Laboratories
Lines: 36
Approved: compilers@iecc.cambridge.ma.us

I am working on fixing a rather broken disassembler for the 680x0 series
(the processor is irrelevant to my general problem, but may help in finding
a specific answer). My problem is disassembling code compiled with GCC,
which puts constant character strings into the text segment. The program
correctly figures out that this stuff is not executable code by tracing all
of the paths through the code, but it cannot tell the difference between
word data and byte data. I think this is a general problem with
disassembling any non-split-I/D program.

I was wondering if there are any techniques for determining that a given
piece of data should be interpreted as a character string as opposed to
word data. I would like a general-case answer, but the following
constraints can be applied, if necessary:

	1) 680x0 processor
	2) C compiler
		- AT&T UNIX-PC v3.51 (which doesn't generally do this)
		- gcc
	3) COFF format object files
		- stripped
		- with symbols
		- with relocation
		- with debugging

I had thought about using a 'strings'-type algorithm, but this is prone to
generating garbage, so I'm looking for something better.
--
David Wexelblat           | dwex@mtgzz.att.com
AT&T Bell Laboratories    | ...!att!mtgzz!dwex
200 Laurel Ave - 4B-421   |
Middletown, NJ 07748      | (201) 957-5871

[In the absence of extensive symbol table info, this sounds like a tough
problem. -John]
--
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
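
For reference, the 'strings'-type algorithm mentioned in the question can be sketched roughly as follows. This is only an illustration in Python, not the poster's program: the function name find_c_strings, the min_len threshold, and the set of bytes treated as printable are all assumptions. It scans a region already known not to be code and reports runs of printable ASCII that end in a NUL byte.

```python
import string

# Bytes treated as text: printable ASCII minus vertical tab and form feed.
# This choice is an assumption; a real tool might tune it per compiler.
PRINTABLE = set(string.printable.encode("ascii")) - {0x0B, 0x0C}


def find_c_strings(data, min_len=4):
    """Return (offset, text) for each printable run of at least
    min_len bytes that is terminated by a NUL byte."""
    found = []
    start = None  # offset where the current printable run began, if any
    for i, b in enumerate(data):
        if b in PRINTABLE:
            if start is None:
                start = i
        else:
            # Accept the run only if it is long enough and NUL-terminated,
            # as a C string literal would be.
            if start is not None and b == 0 and i - start >= min_len:
                found.append((start, data[start:i].decode("ascii")))
            start = None
    return found


if __name__ == "__main__":
    # Hypothetical data region: two short non-string bytes, a real string,
    # a too-short string, and trailing bytes with no terminator.
    region = b"\x4e\x71\x00\x02hello, world\n\x00abc\x00\x12\x34"
    for offset, text in find_c_strings(region):
        print(offset, repr(text))
```

The weakness described in the question shows up immediately: the two words 0x4142 and 0x4344 followed by a zero word are reported as the string "ABCD", because nothing in the bytes themselves distinguishes word data from character data.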