Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!snorkelwacker!ai-lab!rice-chex!bson
From: bson@rice-chex.ai.mit.edu (Jan Brittenson)
Newsgroups: comp.sys.handhelds
Subject: (long) Re: SAD - Saturn Disassembler Beta 1.01
Message-ID: <10753@life.ai.mit.edu>
Date: 17 Sep 90 02:35:38 GMT
References: <10712@life.ai.mit.edu> <57448@microsoft.UUCP>
Sender: news@ai.mit.edu
Organization: nil
Lines: 195

In article <57448@microsoft.UUCP> 
   alonzo@microsoft.UUCP (Alonzo GARIEPY) writes:

 >1.  All hex numbers should be preceded by the # symbol, including the
 >    machine code and addresses in listing files.

   Why? They are not part of any assembler syntax and can only be in
hex anyway. Also, there is something to be said for the output looking
much like any other disassembly. Almost all my assembler experience is
from Macro-11 (DEC pdp-11), Z80-8080-68xx, as(1) for various
processors, as well as earlier versions of Macro-32 (aka VAX-11
Macro). # strikes the `immediate mode' chord in me.

   To me a good assembler has DEC Macro-11 syntax, DEC Macro-10
literal capabilities, and certain features of Assembler XF (S/370) -
most notably macro addresses.

   I have used the pdp-11 MADAS disassembler a great deal. I once also
used a very good disassembler for cp/m-80 called `ZSOURCE' or some
such, published in Dr Dobb's if memory serves me.

   Take note that SAD 1.01 only formats values greater than or equal
to #Ah as #xxxx: 0-9 is formatted as plain 0-9. I don't think the
following would look particularly good or readable:

#05b79   stralloc:
#05b79 4      sethex
#05b7b #xxx   add    c,c

etc.

The addition was made while writing this message.
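
   The formatting rule described above (plain 0-9, #xxxx from #Ah
upward) can be sketched in a few lines of Python. This is only an
illustration, not SAD's actual source; the function name is made up,
and lowercase hex digits are assumed from the sample listing.

```python
def format_value(value: int) -> str:
    """Format an operand: 0-9 stay plain, anything larger gets a # prefix.

    Hypothetical helper; lowercase hex is an assumption.
    """
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    if value < 0xA:
        return str(value)       # 0-9 read the same in decimal and hex
    return f"#{value:x}"        # e.g. 0x5b79 -> "#5b79"
```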


 >3.  Nonstandard symbols are enclosed in quotes and can contain any
 >    characters.  Example:  "Garbage Collector!"  When used as labels,
 >    the colon goes outside the quotes.

   Hmm... I agree, although I'd prefer to see the quotes as part of
the symbol name itself. The result is the same, of course. I think
enclosure within | and | looks better, and I'm used to it from Common
Lisp. It's not a big issue for me, though.

Ex.
|allocate-string|:
	call.3	|c=free_mem|


>4.  All symbols used as arguments should be defined in the disassembly
>    (controllable by command line option). The simplest way to define
>    an address symbol is    
>    	#xxxxx: symbol: 
>    Andreas also supports assignment for address and nonaddress symbols
>  	symbol=#xxxxx  

   This is a very good idea; it will immediately be added. The
undefined-but-known symbol listing will be at the end, to avoid having
to rewrite the disassembler from a simple 1-pass to a 2-pass one.
Perhaps it's inevitable, but I'll wait with it, and in case I do
rewrite it to full 2-pass, I'll move the symbol definitions up front.

   Implementationally, it will be accomplished by accumulating cross
referencing info, which is a good thing anyway.
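
   A rough Python sketch of the one-pass idea: note each known symbol
as it is used, then define the ones that fall outside the disassembled
area at the end using the symbol=#xxxxx form. The function name, the
tuple layout and the column widths are all assumptions made for the
example, not how SAD is actually written.

```python
from collections import defaultdict

def emit_listing(instructions, symbols):
    """instructions: list of (address, mnemonic, target-or-None).
    symbols: dict mapping address -> name (the Symbols table).
    Returns listing lines, with trailing symbol definitions."""
    xref = defaultdict(list)    # target address -> referencing addresses
    lines = []
    for addr, mnemonic, target in instructions:
        if target is None:
            lines.append(f"{addr:05x}  {mnemonic}")
            continue
        name = symbols.get(target)
        if name is None:
            operand = f"#{target:05x}"      # unknown: leave as hex
        else:
            xref[target].append(addr)       # accumulate cross-reference
            operand = name
        lines.append(f"{addr:05x}  {mnemonic}\t{operand}")
    # Single pass is over: define referenced symbols that were not
    # part of the disassembled area.
    inside = {addr for addr, _, _ in instructions}
    for target in sorted(xref):
        if target not in inside:
            lines.append(f"{symbols[target]}=#{target:05x}")
    return lines
```

   The xref table collected along the way is the same information a
cross-reference listing would need.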


>6.  Whether arguments are expressed in hex and commented with symbol names,
>    or expressed symbolically and commented with hex equivalents, should be
>    controllable by a command line switch.

   Another good idea, not a very important one, but one which will
certainly get added.


 >7.  Comments should not be used for machine readable information.  Thus the
 >    syntax [<#xxxxx>] should be replaced with the simpler #xxxxx.

   I can agree to this. However, there are further considerations. To
be added is a tool to extract comments from disassemblies and merge
them with, or supersede, the contents of the comment database.
Therefore there must be a syntactically explicit way to distinguish
user-supplied comments from disassembler-generated ones. As it stands now,
comments to be ignored are all characters enclosed within [...] and
found at the tail end of a comment. The disassembler already adheres
to this, as you've noticed.
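
   For illustration, a comment-extraction tool could apply the [...]
rule roughly like this. It is a Python sketch with a made-up function
name; it assumes the input is the comment text alone and that the
bracketed part does not nest.

```python
import re

# A bracketed group at the very end of the comment, plus the
# whitespace around it. Nested brackets are not handled.
_GENERATED_TAIL = re.compile(r"\s*\[[^\[\]]*\]\s*$")

def user_comment(comment: str) -> str:
    """Return the user-supplied part of a comment, dropping a
    trailing [...] group if there is one."""
    return _GENERATED_TAIL.sub("", comment)
```

   Brackets in the middle of a comment are left alone; only a group
at the tail end is treated as generated.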

   The disassembler-generated comments aren't for machine readability;
rather they're there for user readability, although they need a
specific syntax. Perhaps a command line option is the best solution.

   I have removed the <#xxxx> format while writing this message. It's
always #xxxx now. I don't know exactly how this came to be.

 >8.  The data pseudo op is used to put non instructions into the code.
 >    Your disassembler should use the data op for anything that is not
 >    a valid instruction.

   I have pondered a slightly different approach to this.
Instructions will get disassembled as far as possible, any
unidentifiable parts replaced with "*" or "***" or such, moved into
the comment field (within [...]) and replaced with a proper pseudo op.
The nontrivial part is cluttering the code (which I'm really
ambivalent towards) with large amounts of tests. Perhaps an easier
approach is simply to do a post-disassembly check to see whether the
instruction contains any *'s and in that case take suitable action
(forcing into another formatting mode than Code).
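
   The post-disassembly check could look something like the Python
sketch below. The function name and the exact operand layout of the
data pseudo op are assumptions; only the idea (look for *'s, then move
the partial text into a [...] comment) comes from the paragraph above.

```python
def finalize(text: str, nibbles: str) -> str:
    """text: the disassembled instruction, possibly containing '*'.
    nibbles: the raw hex digits the instruction was decoded from.
    Returns the line to print in the listing."""
    if "*" not in text:
        return text                 # fully decoded: keep Code format
    # Partly unidentifiable: emit the raw digits as data and keep
    # the partial disassembly in the ignorable comment tail.
    return f"data\t#{nibbles}\t; [{text}]"
```

   This keeps the decoder itself free of extra tests, at the cost of
one string scan per instruction.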

   A third database - Formats - consisting of format directives, will
be added and maintained much like Symbols and Comments. But as things
stand right now, I'm merely interested in verifying the operation
through extensive testing (I discovered several bugs when grinding
through the 0000-4000 range, where numerous constants are stored and
the I/O page is mapped).
				* * *

How about:

	<nnnn mm oo>		Subexpression (substring), can contain
				any characters including new lines.
	x^expr			Hex
	d^expr			Decimal
	o^expr			Octal
	i^expr			Code
	l^expr			Literal

where ^ can be freely substituted with ' (i.e. x'expr etc).
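
   Recognizing the prefixes is simple enough. A Python sketch, with a
hypothetical function name, assuming lowercase prefix letters only and
leaving the expression after the prefix unparsed:

```python
# Prefix letter -> format name, as listed in the table above.
FORMATS = {"x": "hex", "d": "decimal", "o": "octal",
           "i": "code", "l": "literal"}

def split_format(token: str):
    """Split 'x^expr' or "x'expr" into (format name, expr).
    Tokens without a recognized prefix come back as (None, token)."""
    if len(token) >= 2 and token[0] in FORMATS and token[1] in ("^", "'"):
        return FORMATS[token[0]], token[2:]
    return None, token
```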


Examples.

	move.p5	  i^<push.a c>, c	; C = 0x07000

	move.p5	  l^<add c,c\n ret>, c
	push.a	  c
	ret

   l^ and i^ significantly aid writing complex macros like structured
loops (i.e. WHILE, FOR...).

Example (Macro-11 style):

	.macro	while	test, body
	call	l^test
	if_failed_branch_to_L17
	call	l^body
L17:	.endm	;; while

More clever schemes can be made up.


   A full syntax specification of data formats is necessary to
implement a Formats database. The Formats database would also take
care of synchronization by explicitly inserting "code" tags, although
the need shouldn't arise, really. Also, symbols are typed as either
"code" or "data," although the typing is ignored and unused.

   Internally, SAD keeps two tables: one for symbols and one for
comments (comments are analogously typed). Adding a third for formats
is fairly trivial - the nontrivial part is extracting formatting info
from a listing.

				* * *

   If anyone feels like adding RPL support, please feel free to do so.
I have absolutely no info on how composite objects (programs,
algebraics, lists, etc) are stored, and it seems no one really knows
(except HP), either. Wouldn't life be easier were HP simply to release
the appropriate documentation?

				* * *

The work cycle as I visualize it:

	a.   Disassemble area
	b.   Edit listing
	c.   Extract comment info and supersede/merge with Comments
	d.   Extract symbol info and supersede/merge with Symbols
	e.   Repeat steps a-d until happy with the results
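
   The two extract steps both end in a supersede/merge against a
table. A Python sketch of one possible meaning, assuming each table
maps an address to a text; the function name is made up, and reading
"supersede" as "the listing wins" and "merge" as "existing entries
win" is a guess at the intended semantics.

```python
def merge_table(database: dict, extracted: dict,
                supersede: bool = True) -> dict:
    """Combine entries extracted from an edited listing with an
    existing Symbols or Comments table (address -> text).

    supersede=True:  extracted entries replace existing ones.
    supersede=False: existing entries are kept, only new ones added.
    """
    merged = dict(database)             # leave the original untouched
    for addr, text in extracted.items():
        if supersede or addr not in merged:
            merged[addr] = text
    return merged
```

   The same routine would serve when merging somebody else's
disassembly with your own, with supersede chosen according to whose
entries should win.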
	
   A Gnu Emacs mode is much desired to automate the cycle above, with
C-c commands to operate on both buffer and region levels.

				* * *

   Note: it took me two night hacks to write the disassembler and get
it to produce a fairly error-free output. That time does not even
begin to compare with hand disassembly, and basically, for every hour
devoted to improving the disassembler, many many hours of boring
manual work are saved.

   Part of the supersede/merge idea is that people should be able to
e-mail each other disassemblies, and with a minimum of pain merge
others' with their own.

   I would appreciate hearing how you who read this react to the
ideas expressed. Post or mail.