Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!world!esegue!compilers-sender
From: tmsoft!mason@uunet.UU.NET (Dave Mason)
Newsgroups: comp.compilers
Subject: Disassembly
Keywords: disassemble
Message-ID: <jf4s8i6qf@tmsoft.UUCP>
Date: 15 Sep 90 18:21:03 GMT
References: <9009091032.1.139@cup.portal.com>
Sender: compilers-sender@esegue.segue.boston.ma.us
Reply-To: tmsoft!mason@uunet.UU.NET (Dave Mason)
Followup-To: comp.compilers
Organization: TM Software Associates, Toronto
Lines: 27
Approved: compilers@esegue.segue.boston.ma.us

In article <9009091032.1.139@cup.portal.com> phorgan@cup.portal.com writes:
>The problem with disassembling arbitrary object code is that data bears a
>disturbing resemblance to code at times:) Even when running through code
>disassembling starting at known code, it's not always possible to
>determine when code stops and data begins.

As others have said, you must walk the code graph, following every jump,
branch, and call from the known entry points, to determine all possible
code sections.
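
A minimal sketch of such a walk, in Python.  The five-opcode instruction
set is made up purely for illustration (it is not any real CPU), and the
names OPCODES and walk_code are hypothetical.  The idea is only that every
byte reached by following control flow from a known entry point is code,
and anything never reached is presumed to be data.

```python
# Toy instruction set, invented for this example only:
#   opcode -> (mnemonic, length in bytes, kind of control flow)
OPCODES = {
    0x00: ("NOP", 1, "fall"),     # continues at the next instruction
    0x01: ("JMP", 3, "jump"),     # unconditional, 16-bit big-endian target
    0x02: ("BRZ", 3, "branch"),   # conditional: target and fall-through
    0x03: ("CALL", 3, "call"),    # target, and later the fall-through
    0x04: ("RET", 1, "stop"),     # ends this path
}

def walk_code(mem, base, entry_points):
    """Return the set of addresses covered by reachable instructions.

    mem is a bytes object loaded at address base; entry_points is an
    iterable of addresses known to be code.
    """
    code = set()                  # every byte belonging to an instruction
    seen = set()                  # instruction start addresses already visited
    work = list(entry_points)     # addresses still to be examined
    while work:
        pc = work.pop()
        off = pc - base
        if pc in seen or not (0 <= off < len(mem)):
            continue
        seen.add(pc)
        op = OPCODES.get(mem[off])
        if op is None:            # unknown opcode: give up on this path
            continue
        _name, length, kind = op
        if off + length > len(mem):
            continue              # instruction would run off the image
        code.update(range(pc, pc + length))
        if kind in ("jump", "branch", "call"):
            work.append((mem[off + 1] << 8) | mem[off + 2])
        if kind in ("fall", "branch", "call"):
            work.append(pc + length)
    return code

# d000: JMP d005 / d003: two data bytes / d005: CALL d00a / d008: RET
# d009: one data byte / d00a: NOP / d00b: RET
image = bytes([0x01, 0xD0, 0x05, 0x48, 0x49, 0x03, 0xD0, 0x0A,
               0x04, 0xFF, 0x00, 0x04])
code = walk_code(image, 0xD000, [0xD000])
data = set(range(0xD000, 0xD000 + len(image))) - code
```

A walk like this cannot follow indirect jumps or jump tables by itself,
since their targets are not encoded in the instruction.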

What I've done several times is to have an auxiliary file that contains hints
to the disassembler.  This contains known addresses, useful names for them,
and the class of object, e.g.:
	ffd0	CONS_RCV	word
	ffd2	CONS_STS	byte
	d000	START		code
	d123	DISPTCH		jump-table 16
	d456	GETCHAR		code
	d678	HELP_MSG	ascii
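
For illustration, a file in this layout could be read with something like
the Python sketch below.  The names Hint and parse_hints are hypothetical,
and it assumes whitespace-separated fields with a hexadecimal address
first.  The optional fourth field (the "16" on the jump-table line) is
kept as an uninterpreted string, since its exact meaning is up to the
disassembler.

```python
from typing import Dict, NamedTuple, Optional

class Hint(NamedTuple):
    address: int            # parsed from the hexadecimal first field
    name: str               # label to use in the listing
    kind: str               # class of object: word, byte, code, ...
    arg: Optional[str]      # optional extra field, left uninterpreted

def parse_hints(text: str) -> Dict[int, Hint]:
    """Parse hint lines into a dict keyed by address."""
    hints = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        fields = line.split()
        if not fields:      # skip blank lines
            continue
        if len(fields) < 3:
            raise ValueError(f"line {lineno}: expected address, name, class")
        address = int(fields[0], 16)
        arg = fields[3] if len(fields) > 3 else None
        hints[address] = Hint(address, fields[1], fields[2], arg)
    return hints

SAMPLE = """\
ffd0  CONS_RCV  word
ffd2  CONS_STS  byte
d000  START     code
d123  DISPTCH   jump-table 16
d456  GETCHAR   code
d678  HELP_MSG  ascii
"""

hints = parse_hints(SAMPLE)
# Addresses marked "code" can serve as starting points for disassembly.
entry_points = sorted(a for a, h in hints.items() if h.kind == "code")
```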

So you run the disassembler against the input and this file, then examine the
output, discover more about the program, fill in more labels, and iterate.
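
One way to speed up each round, sketched in Python under the assumption
that the disassembler can report the addresses it saw used as jump or call
targets: list the targets that have no name yet, in the same layout as the
hints file, so they can be pasted in and renamed by hand.  The function
suggest_hints and the L_xxxx placeholder names are hypothetical.

```python
def suggest_hints(targets, known):
    """Return hint-file lines for target addresses not yet named.

    targets is an iterable of addresses seen as jump/call destinations;
    known maps already-labelled addresses to their names.
    """
    lines = []
    for addr in sorted(set(targets) - set(known)):
        # Placeholder name; meant to be replaced once the routine is understood.
        lines.append(f"{addr:04x}\tL_{addr:04x}\tcode")
    return lines

known = {0xD000: "START", 0xD456: "GETCHAR"}
new_lines = suggest_hints([0xD456, 0xD000, 0xD789], known)
```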

This can produce a pretty good source listing of a program.  It is
particularly useful for ROMs.
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.