Path: utzoo!attcan!uunet!snorkelwacker!ai-lab!rice-chex!bson
From: bson@rice-chex.ai.mit.edu (Jan Brittenson)
Newsgroups: comp.sys.handhelds
Subject: Re: (long) Re: SAD - Saturn Disassembler Beta 1.01
Message-ID: <10828@life.ai.mit.edu>
Date: 19 Sep 90 05:21:05 GMT
References: <57461@microsoft.UUCP> <10772@life.ai.mit.edu> <57546@microsoft.UUCP>
Sender: news@ai.mit.edu
Organization: nil
Lines: 129

In article <57546@microsoft.UUCP>
   alonzo@microsoft.UUCP (Alonzo GARIEPY) writes:

 >> ; [type_String:	#5bc7]
 >> ; ["c=free_mem":	#5b8a]
 >> ; [stralloc:	#5b9d	#5ba9]

   Oops! I forgot to mention, this is the result of "full cross
referencing" enabled. (I added it while I was adding the
auto-definitions since it's almost the same thing.) The addresses are
the addresses where the symbols were referenced.
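
   As a rough illustration of how such cross-reference comments could
be read back by a tool, here is a minimal Python sketch. The line shape
is assumed from the three quoted lines above and nothing more; the
names XREF_LINE and parse_xref are made up for this example and are
not part of SAD or xsym.

```python
import re

# Assumed line shape, taken from the quoted listing:
#   ; [name:<whitespace>#addr #addr ...]
XREF_LINE = re.compile(r'^\s*;\s*\[(.+?):\s*((?:#[0-9a-fA-F]+\s*)+)\]\s*$')

def parse_xref(line):
    """Return (symbol, [addresses]) for a cross-reference comment, else None."""
    m = XREF_LINE.match(line)
    if m is None:
        return None
    # Symbols containing odd characters appear quoted in the listing.
    name = m.group(1).strip().strip('"')
    # Each token is '#' followed by hex digits.
    addrs = [int(tok[1:], 16) for tok in m.group(2).split()]
    return name, addrs
```

Each address in the returned list is a place where the symbol was
referenced, as described above.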

 >So the above symbol definitions can be done as:
 >
 >	#5bc7: type_String:
 >	#5b8a: <c=free_mem>:
 >
 >This is how my assembler/disassembler works and is convenient.
 >With this setup, you can run the output of the disassembler through
 >the assembler and vice versa.

   OK. There is a slight glitch here. `sad -z` will make things look
the way you want them to (I'll just stick in a colon; the rest is
there). However, for xsym to extract symbol info from edited listings
you will have to use the `normal' sad format, edit, run xsym, and then
redo it with -z when you want yours.

   I will also change the comment format to use the additional
semicolon, as you proposed.

Xsym exists as of yesterday, and works well.

   I'm working on xcom - the comment extraction tool - right now; it's
much simpler than xsym, and should work soon. (Xsym, after all, was
only a one-night hack to write, with liberal cloning/mutation of
existing SAD code.)

 >> Unless "what" is data... Things like jump tables really don't fit
 >> into this either. As I see it, there is really no way to (easily and
 >> quickly) tell, except for the person doing the disassembly, using the
 >> disassembler as a tool.

 >I don't really follow.  Seems like there are heuristics that get all
 >but pathological cases.

   I looked a little at the lower part of the ROM, where interrupts,
power-on, and such are handled. There is code that jumps about and
sets P to various values. Then it picks up a nibble from various
places, sticks it in nibble P of C, adds some constant, and puts 7
or F in the 4th nibble (as per location 11f) depending on the current
user RAM mapping. Then it jumps indirect via C.

I would love to see the program to keep track of what's going on...

			* * *

Now on to assembler design...

>> Clarification:
>> 	<any <sequence> of\tcharacters>

>Still not quite clear to this person...

Further clarification:

	symbol=<caf>
	move.p4	#<symbol>, c	; c = #caf

   So, the operator # would return its argument interpreted according
to its first character. There could be two instances of the # operator,
one FX and one XFY version.

   2#110001 would be the XFY operator # called with two arguments: <2>
and <110001>, returning <49>. All arithmetic in strings. The digits 0-9
would be FX operators. I.e. <49> would call the FX operator 4 with the
argument <9>. Notice that <4<9>> would call the FX operator 4 with the
argument <<9>>, which 4 could choose to recognize or not. In the assembler
input

	move.p5 i^<add c,c>, c

first the FX operator i^ would be called with <<add c,c>>. It would
recursively call the assembler to assemble <add c,c>, and return the
resulting code as <198> which is the decimal equivalent of #c6. So 
after macro and expression expansion, the assembler will be left to
assemble

	move.p5	198, c
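
   The two operators just described can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
function names (radix, assemble, instr_value) are invented here, and
the opcode table holds nothing but the one encoding mentioned above
(add c,c gives #c6). Values are passed around as strings, in keeping
with "all arithmetic in strings".

```python
# Toy opcode table: the only entry is the one given in the text
# (add c,c assembles to #c6). A real table would be far larger.
OPCODES = {"add c,c": "c6"}

def radix(base, digits):
    """XFY '#': interpret digits in the given base, return a decimal string."""
    return str(int(digits, int(base)))

def assemble(text):
    """Hypothetical stand-in for the recursive assembler call; returns hex digits."""
    return OPCODES[text.strip()]

def instr_value(arg):
    """FX 'i^': assemble the argument and return its code as a decimal string."""
    return str(int(assemble(arg), 16))
```

Here radix("2", "110001") plays the part of 2#110001, and
instr_value("add c,c") plays the part of i^<add c,c>.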

Operators could be added or overlaid by using a special macro form.

   l^ could be an FX operator to recursively assemble the argument and
stick it into a literal area, returning the address. 

   If we assume that the pseudo-instruction "ascii" inserts its
argument as byte-length characters, then,

	ascii	<add c,c>	; would result in "add c,c", whereas
	ascii	i^<add c,c>	; would result in "198", and
	ascii	#i^<add c,c>	; would result in "c6".
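
   The three cases can be mimicked with a small Python sketch. It
assumes that the FX form of # turns a decimal string into hex digits,
which is inferred from the third example only; all function names are
hypothetical, and instr_value knows just the one instruction used
above.

```python
def instr_value(arg):
    # Stand-in for i^: only knows the one instruction from the text.
    return {"add c,c": "198"}[arg]

def to_hex(arg):
    # Assumed behaviour of FX '#', inferred from the third example only:
    # decimal string in, lowercase hex digits out.
    return format(int(arg), "x")

def ascii_directive(text):
    # Emit the already-expanded argument as one byte per character.
    return text.encode("ascii")
```

The point is that "ascii" never sees the operators; it only gets
whatever string is left after expansion.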

   The notation "foo" could be interpreted as <foo>, by brutally
absorbing all characters up to the next quote. In fact " could be
implemented as an FX operator which would brutally collect characters
until it encounters its twin.
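
   A reader for the two string notations might look like the Python
sketch below. It is a guess at one workable approach, not a
description of any existing assembler: read_argument is a made-up
name, angle brackets are assumed to nest, and a quote is assumed to
have no escape mechanism.

```python
def read_argument(src, pos=0):
    """Read one <...> or "..." argument starting at src[pos].

    Returns (text, next_pos). Angle brackets nest; a quote absorbs
    everything up to its twin.
    """
    opener = src[pos]
    if opener == '"':
        end = src.index('"', pos + 1)   # raises ValueError if unterminated
        return src[pos + 1:end], end + 1
    if opener != '<':
        raise ValueError("expected '<' or '\"'")
    depth = 0
    for i in range(pos, len(src)):
        if src[i] == '<':
            depth += 1
        elif src[i] == '>':
            depth -= 1
            if depth == 0:
                # Strip only the outermost pair; inner brackets stay.
                return src[pos + 1:i], i + 1
    raise ValueError("unbalanced '<'")
```

With this, <4<9>> yields the text 4<9>, so the operator 4 would be
handed <9> with its brackets intact and could choose to recognize it
or not.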

   Oh, BTW, 4-5 years ago I wrote an ns32k assembler that worked like
this, in DECUS C. (Later ported to System V cc.) All the operator and
expression stuff can be reused, together with a stripped-down syntax
check. (No modes, like you said. :-)) It has the curious ability of
implementing modes that do not exist, like autoincrement by adding
"inc" instructions as appropriate. These are hard-wired, but new ones
can easily be added. I believe I have it on a tape here... will check
into the possibility of rehacking it into a Saturn assembler instead.
A later project, though. (It doesn't recognize operator arities or
anything fancy, but such quiche is only for Pascal programmers anyway,
don't you agree? ;-))

   Another idea is to make a gas implementation for the Saturn. This
would probably be bad news for DOS users, since it would require an
ld(1).

				* * *

 > This is great work you are doing!  You might consider making it shareware.

Thanks! What's shareware?

						-- Jan Brittenson
						   bson@ai.mit.edu