Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!microsoft!alonzo
From: alonzo@microsoft.UUCP (Alonzo GARIEPY)
Newsgroups: comp.sys.handhelds
Subject: Re: (long) Re: SAD - Saturn Disassembler Beta 1.01
Message-ID: <57461@microsoft.UUCP>
Date: 17 Sep 90 10:12:19 GMT
References: <10712@life.ai.mit.edu> <57448@microsoft.UUCP> <10753@life.ai.mit.edu>
Reply-To: alonzo@microsoft.UUCP (Alonzo GARIEPY)
Organization: Microsoft Corp., Redmond WA
Lines: 152

In article <10753@life.ai.mit.edu> bson@rice-chex.ai.mit.edu (Jan Brittenson) writes:
>  >1.  All hex numbers should be preceded by the # symbol, including the
>  >    machine code and addresses in listing files.
>    Why? They are not part of any assembler syntax and can only be in
> hex anyway. 

That isn't exactly true.  You forget that I wrote the first assembler and
disassembler using these mnemonics :-).  The idea was that ALL numbers
in hex have a # in front.  That makes life much easier for the lexical
front end to understand what it is getting.  Since my program is reversible,
the listing files are completely compatible with input files (code for
assembly, hex for disassembly).
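
To show why an unconditional # makes the lexer's job easy, here is a minimal Python sketch. The token classes and the regular expression are assumptions for illustration, not the syntax of any existing assembler. A number that starts with # is always read as hex and bare digits are always decimal, so the lexer never has to guess, and the same routine can read both source lines and listing lines.

```python
import re

# Hypothetical token classes: '#' + hex digits is a hex number, bare digits
# are decimal, names may contain '.' (as in move.p5), anything else is punctuation.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<hex>#[0-9A-Fa-f]+)|(?P<dec>[0-9]+)"
    r"|(?P<ident>[A-Za-z_.][\w.]*)|(?P<punct>\S))"
)

def tokenize(line):
    """Split one source or listing line into (kind, value) tuples."""
    tokens = []
    line = line.rstrip()  # no trailing blanks, so every match consumes a token
    pos = 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        kind = m.lastgroup  # exactly one named group matches
        text = m.group(kind)
        if kind == "hex":
            tokens.append(("num", int(text[1:], 16)))
        elif kind == "dec":
            tokens.append(("num", int(text, 10)))
        else:
            tokens.append((kind, text))
        pos = m.end()
    return tokens
```

For example, `tokenize("#10 10")` yields the two different values 16 and 10 with no context needed.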

> Almost all my assembler experience is
> from Macro-11 (DEC pdp-11), Z80-8080-68xx, as(1) for various
> processors, as well as earlier versions of Macro-32 (aka VAX-11
> Macro). # strikes the `immediate mode' chord in me.

Our experience is the same.  Nice to meet another DEC mini guy. (Help!
I'm trapped in a segmented architecture!)  Since the Saturn doesn't
have anything you can really call addressing modes, my mnemonics
use the . extension on instructions for both length and addressing 
information and I have used the # character for radix.  Certainly,
the # is optional for numbers less than 10, but you wouldn't leave
it off of something you *know* is hex like machine code.
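
A small sketch of the two conventions just described, under stated assumptions: the helper names are hypothetical, and the code only splits off the dot extension without interpreting it. It also shows why the # can be dropped for small numbers: a digit string that reads the same in hex and decimal is unaffected.

```python
def split_mnemonic(word):
    """Split an opcode such as 'move.p5' into ('move', 'p5').
    The extension carries length/addressing information; an opcode
    without one (e.g. 'ret') gets an empty extension."""
    base, _, ext = word.partition(".")
    return base.lower(), ext.lower()


def format_hex(value, width=0):
    """Render a value the way a listing would: always '#'-prefixed,
    upper-case hex, zero-padded to `width` digits (5 for a 20-bit address)."""
    return "#" + format(value, "X").zfill(width)


def hash_is_optional(digits):
    """True when a digit string means the same value in hex and decimal,
    so leaving off the '#' cannot change what it denotes."""
    return digits.isascii() and digits.isdigit() and int(digits, 16) == int(digits, 10)
```

Machine code always goes through `format_hex`, so the listing never relies on the "less than 10" exception.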

>  >3.  Nonstandard symbols are enclosed in quotes and can contain any
>  >    characters.  Example:  "Garbage Collector!"  When used as labels,
>  >    the colon goes outside the quotes.
> 
>    Hmm... I agree, although I'd prefer to see the quotes as part of
> the symbol name itself. The result is the same, of course. I think
> enclosure within | and | looks better, and I'm used to it from Common
> Lisp. It's not a big issue for me, though.

The vertical bar should be reserved for C-like bitwise operations.
And I realized that I would prefer to use quotes for string literals.
How about enclosing nonstandard symbols in < and >, and you can think
of them as part of the symbol.  I would prefer that such symbols were
alphabetized by their second character, however :-).
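
A sketch of how a label reader might accept both delimiters under discussion, assuming a hypothetical rule that a standard symbol is a letter or underscore followed by letters, digits, or underscores. The colon stays outside the delimiters, and the delimiters are kept as part of the name; the sort key files delimited symbols by their second character.

```python
import re

STANDARD = r"[A-Za-z_][A-Za-z0-9_]*"   # assumed rule for ordinary symbols
QUOTED = r'"[^"]*"'                     # "Garbage Collector!"
ANGLED = r"<[^>]*>"                     # <Garbage Collector!>
LABEL_RE = re.compile(rf"^\s*(?P<sym>{STANDARD}|{QUOTED}|{ANGLED}):")

def parse_label(line):
    """Return the label defined at the start of a line, or None."""
    m = LABEL_RE.match(line)
    return m.group("sym") if m else None

def sort_key(sym):
    """Case-insensitive key that ignores the delimiters of nonstandard symbols."""
    if sym[0] in '"<':
        sym = sym[1:-1]
    return sym.lower()
```

Because nonstandard symbols may contain any character except the closing delimiter, the choice between quotes and angle brackets mostly decides which character is left over for other uses such as string literals.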

> >    	#xxxxx: symbol: 
> >  	symbol=#xxxxx  
>    This is a very good idea; it will immediately be added. The
> undefined-but-known symbol listing will be in the end, to avoid having
> to rewrite the disassembler from a simple 1-pass to a 2-pass one.

There are major wins to having a two pass disassembler, but synchronization
errors are inevitable.  Notice the # and : in the above syntax.  These
are exactly how my listing files look and the format expected by the
disassembler and assembler.  Perhaps a luxury now...
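
A sketch of reading both symbol forms back into a single table, which is what lets the assembler and disassembler share the same listing format. Treating `symbol=#xxxxx` as an equate for a symbol that is known but not defined inside the disassembled range is an assumption drawn from the quoted reply; the function name is hypothetical.

```python
import re

# "#xxxxx: symbol:"  a symbol defined at an address in the listing
DEF_RE = re.compile(r"^\s*#(?P<addr>[0-9A-Fa-f]+):\s*(?P<sym>\S+):\s*$")
# "symbol=#xxxxx"    an equate, e.g. a known symbol outside the listed range
EQU_RE = re.compile(r"^\s*(?P<sym>[^\s=]+)\s*=\s*#(?P<addr>[0-9A-Fa-f]+)\s*$")

def read_symbols(lines):
    """Collect both kinds of symbol lines into one {name: address} table;
    instruction and comment lines are skipped."""
    table = {}
    for line in lines:
        m = DEF_RE.match(line) or EQU_RE.match(line)
        if m:
            table[m.group("sym")] = int(m.group("addr"), 16)
    return table
```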

> 
>  >7.  Comments should not be used for machine readable information.  Thus the
>  >    syntax [<#xxxxx>] should be replaced with the simpler #xxxxx.
> 
>    I can agree to this. However, there are further considerations. To
> be added is a tool to extract comments from disassemblies and merge
> them with, or supersede, the contents of the comment database.
> Therefore there must be a syntactically explicit way to distinguish
> user-supplied comments from disassembler generated. As it stands now,
> comments to be ignored are all characters enclosed within [...] and
> found at the tail end of a comment. The disassembler already adheres
> to this, as you've noticed.

How about a second semicolon?  That restricts the contents of comments
a little, but so does use of square brackets.  I have never been big on
the use of comments for machine readable info, especially since we are
working on this from the beginning and don't need to hack.  Let's think
on this one some more.  It doesn't matter in the meantime.
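
Both conventions are easy for a comment-merging tool to separate, as this sketch shows. It assumes the bracket rule exactly as quoted (a trailing `[...]` group is machine generated) and reads "a second semicolon" as "; user text ; generated text". That reading is a guess, and so is the function name. Note that neither rule copes with a `;` inside a string literal in the code field.

```python
import re

# A trailing [...] group at the very end of the comment is treated as generated.
TAIL_RE = re.compile(r"^(?P<user>.*?)\s*(?P<gen>\[[^\[\]]*\])\s*$")

def split_comment(line, style="brackets"):
    """Return (code, user_comment, generated_comment) for one listing line."""
    code, sep, comment = line.partition(";")
    code = code.rstrip()
    if not sep:
        return code, "", ""
    if style == "semicolon":
        # Assumed alternative: a second ';' starts the generated part,
        # so user comments can no longer contain ';'.
        user, _, gen = comment.partition(";")
        return code, user.strip(), gen.strip()
    m = TAIL_RE.match(comment)
    if m:
        return code, m.group("user").strip(), m.group("gen")
    return code, comment.strip(), ""
```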

>  >8.  The data pseudo op is used to put non instructions into the code.
>  >    Your disassembler should use the data op for anything that is not
>  >    a valid instruction.
>    I have pondered a slightly different approach to this.
> Instructions will get disassembled as far as possible, any
> unidentifiable parts replaced with "*" or "***" or such, moved into
> the comment field (within [...]) and replaced with a proper pseudo op.

I was assuming a multi-pass disassembler.  Generally, you can figure
out whether the location after an unconditional jump is code by what
refers to it before or after that point.  Certainly, if you are dis-
assembling an instruction stream containing no jumps and you find
an illegal instruction, the whole stream is invalid.  But the Saturn 
instruction set covers enough ground that illegal instructions aren't 
that common.
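
The straight-line rule can be sketched as follows. The decoder interface is hypothetical: it returns an instruction's length in nibbles and one of four kinds. A run that reaches an illegal instruction with no jump or branch before it cannot be code and would be emitted with the data pseudo op. If a conditional branch comes first, the rule no longer settles the question.

```python
from typing import Callable, Tuple

# Hypothetical decoder: address -> (length_in_nibbles, kind), where kind is
# "op", "branch" (conditional), "jump" (unconditional) or "illegal".
# Lengths are assumed to be at least 1.
Decoder = Callable[[int], Tuple[int, str]]

def classify_run(decode: Decoder, start: int, end: int) -> str:
    """Classify the run of nibbles [start, end) as 'code', 'data' or 'unknown'."""
    pos = start
    saw_branch = False
    while pos < end:
        length, kind = decode(pos)
        if kind == "illegal":
            # With no jump before it, execution from `start` must reach
            # this instruction, so the whole run is invalid as code.
            return "unknown" if saw_branch else "data"
        if kind == "jump":
            return "code"  # what follows needs a reference to be called code
        if kind == "branch":
            saw_branch = True
        pos += length
    return "code"
```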

> How about:
> 	<nnnn mm oo>		Subexpression (substring), can contain
> 				any characters including new lines.

What are nnnn, mm, and oo? 

> 	x^expr			Hex
> 	d^expr			Decimal
> 	o^expr			Octal
> 	i^expr			Code
> 	l^expr			Literal
> where ^ can be freely substituted with ' (i.e. x'expr etc).
> 	move.p5	  i^<push.a c>, c	; C = 0x07000
> 	move.p5	  l^<add c,c\n ret>, c
> 	push.a	  c
> 	ret
> l^ and i^ significantly aid writing complex macros like structured
> loops (i.e. WHILE, FOR...).
> 
> Example (Macro-11 style):
> 
> 	.macro	while	test, body
> 	call	l^test
> 	if_failed_branch_to_L17
> 	call	l^body
> L17:	.endm	;; while

I like this stuff, but I think the syntax could be improved.  I would
prefer that string literals go in quotes.  Special lexical functions
can be used for substrings, code, etc.  These are used infrequently
enough (in rare macros, like the above), that clarity is more important
than brevity.   Why don't we give it a think...  I still lean toward
some existing portable macro processor.  For radix I would prefer to
stick with #, with #o for octal.  Binary?  Hmm...
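
While the notation is being weighed, a parser can accept both candidates side by side. This sketch assumes # means hex, #o means octal, and bare digits mean decimal, as preferred above, alongside the quoted x^, d^, o^ forms with ' as an alternative to ^. Because `o` is not a hex digit, `#o17` can never be misread as a hex constant. Binary is left out, as it is still undecided.

```python
import re

HASH_RE = re.compile(r"#(?P<oct>o)?(?P<digits>[0-9A-Fa-f]+)")
CARET_RE = re.compile(r"(?P<radix>[xdo])[\^'](?P<digits>[0-9A-Fa-f]+)")
RADIX = {"x": 16, "d": 10, "o": 8}

def parse_number(text):
    """Accept '#1F', '#o17', 'x^1F', "d'10", 'o^17' or plain '42'.
    int() raises ValueError for digits invalid in the chosen radix."""
    m = HASH_RE.fullmatch(text)
    if m:
        return int(m.group("digits"), 8 if m.group("oct") else 16)
    m = CARET_RE.fullmatch(text)
    if m:
        return int(m.group("digits"), RADIX[m.group("radix")])
    if re.fullmatch(r"[0-9]+", text):
        return int(text)
    raise ValueError(f"not a number: {text!r}")
```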

> 
>    If anyone feels like adding RPL support, please feel free to do so;
> I have absolutely no info on how composite objects (programs,
> algebraics, lists, etc) are stored, and it seems no one really knows
> (except HP), either. 

Actually, this is well understood.  I have added the basics to my
disassembler.  Since it is written in Prolog, it has run up against
the limits of the language and the compiler (Turbo).  I have been
meaning to write an RPL/Saturn disassembler in C, but I haven't had
the time and the honour is now yours.  The RPL part is trivial and
indispensable to understanding the workings of the ROM.  The toughest part
of these things is coming up with the symbol/comments database, but
much of Eric Toonen's HP 28 ROM map can be adapted for the 48.

I can give you the basics over the phone (I haven't had time to write
it up), and there are some other RPL adventurers who should be able to
help.  Part of the reason that not much has been written about RPL is
that the format is extremely simple but documenting all the entry points
is laborious.  The Saturn instruction set is more complex than RPL, but there
are fewer instruction types than ROM entry points.  All you need is the
structure of RPL objects and all the difficult stuff goes into a database.
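
To illustrate how little structure the walker itself needs, here is a sketch. It assumes that each object begins with a 5-nibble address stored least-significant nibble first, and that a composite (program, list, ...) is a sequence of embedded objects or pointers closed by an end marker. The actual prolog and end-marker addresses are placeholders to be filled in from a ROM map; they are not given here. Variable-length objects are not handled.

```python
from dataclasses import dataclass

def read_addr(nibbles, pos):
    """Read a 5-nibble address, least-significant nibble first (assumed order)."""
    value = 0
    for i in range(5):
        value |= nibbles[pos + i] << (4 * i)
    return value

@dataclass
class RplLayout:
    composite_prologs: set  # placeholder prolog addresses of composites
    fixed_size: dict        # placeholder: prolog -> body length in nibbles
    end_marker: int         # placeholder address that closes a composite

def walk(nibbles, pos, layout, out, depth=0):
    """Walk one object at `pos`, append (depth, kind, address) entries
    to `out`, and return the position just past the object."""
    addr = read_addr(nibbles, pos)
    pos += 5
    if addr in layout.composite_prologs:
        out.append((depth, "composite", addr))
        while read_addr(nibbles, pos) != layout.end_marker:
            pos = walk(nibbles, pos, layout, out, depth + 1)
        out.append((depth, "end", layout.end_marker))
        return pos + 5
    if addr in layout.fixed_size:
        out.append((depth, "object", addr))
        return pos + layout.fixed_size[addr]
    # Anything else is taken as a pointer to an object stored elsewhere
    # (typically a ROM entry point); naming it is the database's job.
    out.append((depth, "pointer", addr))
    return pos
```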

One of my ideas was to start with a simple database of RPL addresses
and, using huge bitmaps to represent all the nibbles in the machine,
gradually mark the locations of all known assembler and RPL.  This
program wouldn't necessarily disassemble in order, but would always
go for a known location, eventually disassembling the entire ROM.
Now that you have written the disassembly engine in C, the rest won't
be that much work.
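
The bitmap idea might look like this sketch, with one bit per nibble and a worklist seeded with known entry points. The `decode` callback is hypothetical: it returns an instruction's (or an RPL object's) length and the addresses control can reach next. Locations are processed in whatever order they are discovered, not in address order. Anything left unmarked at the end is a candidate for the data pseudo op.

```python
from collections import deque

def mark_all(size, entry_points, decode):
    """Mark every nibble reachable from the entry points.
    `decode(addr)` -> (length_in_nibbles, successor_addresses) is hypothetical.
    Returns a bytearray bitmap with one bit per nibble (1 = known code)."""
    bitmap = bytearray((size + 7) // 8)

    def is_marked(a):
        return bitmap[a >> 3] & (1 << (a & 7))

    def mark(a):
        bitmap[a >> 3] |= 1 << (a & 7)

    work = deque(entry_points)
    while work:
        addr = work.popleft()
        # Skip out-of-range targets and nibbles already covered.
        if not 0 <= addr < size or is_marked(addr):
            continue
        length, successors = decode(addr)
        for a in range(addr, min(addr + length, size)):
            mark(a)
        work.extend(successors)
    return bitmap
```

A target that lands inside an already-marked instruction is silently skipped here; a real tool would want to report it, since that is exactly the synchronization error a multi-pass disassembler is meant to catch.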