Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!clyde.concordia.ca!uunet!microsoft!alonzo
From: alonzo@microsoft.UUCP (Alonzo GARIEPY)
Newsgroups: comp.sys.handhelds
Subject: Re: (long) Re: SAD - Saturn Disassembler Beta 1.01
Message-ID: <57461@microsoft.UUCP>
Date: 17 Sep 90 10:12:19 GMT
References: <10712@life.ai.mit.edu> <57448@microsoft.UUCP> <10753@life.ai.mit.edu>
Reply-To: alonzo@microsoft.UUCP (Alonzo GARIEPY)
Organization: Microsoft Corp., Redmond WA
Lines: 152

In article <10753@life.ai.mit.edu> bson@rice-chex.ai.mit.edu (Jan Brittenson) writes:
> >1. All hex numbers should be preceded by the # symbol, including the
> >   machine code and addresses in listing files.
>    Why? They are not part of any assembler syntax and can only be in
> hex anyway.

That isn't exactly true.  You forget that I wrote the first assembler
and disassembler using these mnemonics :-).  The idea was that ALL
numbers in hex have a # in front.  That makes life much easier for the
lexical front end to understand what it is getting.  Since my program
is reversible, the listing files are completely compatible with input
files (code for assembly, hex for disassembly).

>    Almost all my assembler experience is
> from Macro-11 (DEC pdp-11), Z80-8080-68xx, as(1) for various
> processors, as well as earlier versions of Macro-32 (aka VAX-11
> Macro). # strikes the `immediate mode' cord in me.

Our experience is the same.  Nice to meet another DEC mini guy.  (Help!
I'm trapped in a segmented architecture!)  Since the Saturn doesn't have
anything you can really call addressing modes, my mnemonics use the .
extension on instructions for both length and addressing information
and I have used the # character for radix.  Certainly, the # is optional
for numbers less than 10, but you wouldn't leave it off of something
you *know* is hex like machine code.

> >3. Nonstandard symbols are enclosed in quotes and can contain any
> >   characters.  Example: "Garbage Collector!"
> >   When used as labels,
> >   the colon goes outside the quotes.
>
>    Hmm... I agree, although I'd prefer to see the quotes as part of
> the symbol name itself. The result is the same, of course. I think
> enclosure within | and | looks better, and I'm used to it from Common
> Lisp. It's not a big issue for me, though.

The vertical bar should be reserved for C-like bitwise operations.  And
I realized that I would prefer to use quotes for string literals.  How
about enclosing nonstandard symbols in < and >, and you can think of
them as part of the symbol.  I would prefer that such symbols were
alphabetized by their second character, however :-).

> >     #xxxxx: symbol:
> >     symbol=#xxxxx
>    This is a very good idea; it will immediately be added. The
> undefined-but-known symbol listing will be in the end, to avoid having
> to rewrite the disassembler from a simple 1-pass to a 2-pass one.

There are major wins to having a two pass disassembler, but
synchronization errors are inevitable.  Notice the # and : in the above
syntax.  These are exactly how my listing files look and the format
expected by the disassembler and assembler.  Perhaps a luxury now...

>
> >7. Comments should not be used for machine readable information.  Thus the
> >   syntax [<#xxxxx>] should be replaced with the simpler #xxxxx.
>
>    I can agree to this. However, there are further considerations. To
> be added is a tool to extract comments from disassemblies and merge
> them with, or supersede, the contents of the comment database.
> Therefore there must be a syntactically explicit way to distinguish
> user-supplied comments from disassembler generated. As it stands now,
> comments to be ignored are all characters enclosed within [...] and
> found at the tail end of a comment. The disassembler already adheres
> to this, as you've noticed.

How about a second semicolon?  That restricts the contents of comments
a little, but so does use of square brackets.
I have never been big on the use of comments for machine readable info,
especially since we are working on this from the beginning and don't
need to hack.  Let's think on this one some more.  It doesn't matter in
the meantime.

> >8. The data pseudo op is used to put non instructions into the code.
> >   Your disassembler should use the data op for anything that is not
> >   a valid instruction.
>    I have a pondered a slightly different approach to this.
> Instructions will get disassembled as far as far as possible, any
> unidentifiable parts replaced with "*" or "***" or such, moved into
> the comment field (within [...]) and replaced with a proper pseudo op.

I was assuming a multi-pass disassembler.  Generally, you can figure
out whether the location after an unconditional jump is code by what
refers to it before or after that point.  Certainly, if you are
disassembling an instruction stream containing no jumps and you find an
illegal instruction, the whole stream is invalid.  But the Saturn
instruction set covers enough ground that illegal instructions aren't
that common.

>    How about:
>       Subexpression (substring), can contain
>       any characters including new lines.

What are nnnn, mm, and oo?

>       x^expr    Hex
>       d^expr    Decimal
>       o^expr    Octal
>       i^expr    Code
>       l^expr    Literal
> where ^ can be freely substituted with ' (i.e. x'expr etc).
>       move.p5 i^, c   ; C = 0x07000
>       move.p5 l^, c
>       push.a  c
>       ret
> l^ and i^ significantly aid writing complex macros like structured
> loops (i.e. WHILE, FOR...).
>
> Example (Macro-11 style):
>
>       .macro while test, body
>       call l^test
>       if_failed_branch_to_L17
>       call l^body
> L17:  .endm ;; while

I like this stuff, but I think the syntax could be improved.  I would
prefer that string literals go in quotes.  Special lexical functions
can be used for substrings, code, etc.  These are used infrequently
enough (in rare macros, like the above), that clarity is more important
than brevity.  Why don't we give it a think...
I still lean toward some existing portable macro processor.  For radix
I would prefer to stick with #, with #o for octal.  Binary?  Hmm...

>
>    If anyone feels like adding RPL support, please feel free to do so,
> I have absolutely no info on how composite objects (programs,
> algebraics, lists, etc) are stored, and it seems no one really knows
> (except HP), either.

Actually, this is well understood.  I have added the basics to my
disassembler.  Since it is written in Prolog, it has run up against the
limits of the language and the compiler (Turbo).  I have been meaning
to write an RPL/Saturn disassembler in C, but I haven't had the time
and the honour is now yours.

The RPL part is trivial and indispensable to understanding the workings
of the ROM.  The toughest part of these things is coming up with the
symbol/comments database, but much of Eric Toonen's HP 28 ROM map can
be adapted for the 48.  I can give you the basics over the phone (I
haven't had time to write it up), and there are some other RPL
adventurers who should be able to help.

Part of the reason that not much has been written about RPL is that the
format is extremely simple but documenting all the entry points is
laborious.  The instruction set is more complex than RPL, but there are
fewer instruction types than ROM entry points.  All you need is the
structure of RPL objects and all the difficult stuff goes into a
database.

One of my ideas was to start with a simple database of RPL addresses
and, using huge bitmaps to represent all the nibbles in the machine,
gradually mark the locations of all known assembler and RPL.  This
program wouldn't necessarily disassemble in order, but would always go
for a known location, eventually disassembling the entire ROM.  Now
that you have written the disassembly engine in C, the rest won't be
that much work.
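
The bitmap idea above could be sketched roughly like this.  It is only
an illustration, written in Python for brevity, and it is not the
program described in this post.  The three opcodes in decode() are
invented for the example and are NOT Saturn instructions, and the names
decode, mark_code, UNKNOWN and CODE are hypothetical.  The sketch keeps
one mark per nibble, starts from a list of known entry points, and
always works from a known location instead of walking the ROM in order.

```python
from collections import deque

UNKNOWN, CODE = 0, 1  # per-nibble marks; an RPL mark could be added the same way


def decode(rom, addr):
    """Toy decoder (hypothetical opcodes, NOT the Saturn instruction set).

    Returns (length_in_nibbles, successor_addresses), or None if the
    instruction would run past the end of the ROM image.
    """
    op = rom[addr]
    length = 2 if op in (0x1, 0x2) else 1
    if addr + length > len(rom):
        return None
    if op == 0x0:                    # "return": control flow ends here
        return length, []
    if op == 0x1:                    # "jump n": target is the next nibble
        return length, [rom[addr + 1]]
    if op == 0x2:                    # "branch n": target or fall through
        return length, [rom[addr + 1], addr + length]
    return length, [addr + length]   # any other nibble: falls through


def mark_code(rom, entry_points):
    """Mark every nibble reachable from the known entry points as CODE."""
    marks = bytearray(len(rom))      # one mark per nibble, all UNKNOWN
    work = deque(entry_points)       # known locations still to visit
    while work:
        addr = work.popleft()
        # Skip addresses outside the image or already marked.
        if not 0 <= addr < len(rom) or marks[addr] != UNKNOWN:
            continue
        decoded = decode(rom, addr)
        if decoded is None:
            continue
        length, successors = decoded
        for i in range(addr, addr + length):
            marks[i] = CODE
        work.extend(successors)
    return marks


# Jump over two nibbles of data, run one fall-through op, then return.
rom = [0x1, 0x4, 0xF, 0xF, 0x3, 0x0, 0xF]
print(list(mark_code(rom, [0])))
```

Nibbles still marked UNKNOWN when the work list runs dry are the
candidates for the data pseudo op, or for new entry points as the
database grows.  A real version would also have to deal with targets
that land in the middle of an already marked instruction, which this
sketch simply skips.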