Path: utzoo!attcan!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!usc!snorkelwacker!ai-lab!rice-chex!bson
From: bson@rice-chex.ai.mit.edu (Jan Brittenson)
Newsgroups: comp.sys.handhelds
Subject: (long) Re: SAD - Saturn Disassembler Beta 1.01
Message-ID: <10753@life.ai.mit.edu>
Date: 17 Sep 90 02:35:38 GMT
References: <10712@life.ai.mit.edu> <57448@microsoft.UUCP>
Sender: news@ai.mit.edu
Organization: nil
Lines: 195

In article <57448@microsoft.UUCP> alonzo@microsoft.UUCP (Alonzo GARIEPY) writes:

>1. All hex numbers should be preceded by the # symbol, including the
>   machine code and addresses in listing files.

Why? They are not part of any assembler syntax and can only be in hex
anyway. Also, there is something to be said for the output looking much
like any other disassembly. Almost all my assembler experience is from
Macro-11 (DEC pdp-11), Z80-8080-68xx, as(1) for various processors, as
well as earlier versions of Macro-32 (aka VAX-11 Macro). # strikes the
`immediate mode' chord in me. To me a good assembler has DEC Macro-11
syntax, DEC Macro-10 literal capabilities, and certain features of
Assembler XF (S/370) - most notably macro addresses. I have used the
pdp-11 MADAS disassembler a great deal. I once also used a very good
disassembler for cp/m-80 called `ZSOURCE' or some such, published in
Dr Dobb's if memory serves me.

Take note that SAD 1.01 only formats values greater than or equal to
#Ah as #xxxx; 0-9 are formatted as plain 0-9. I don't think the
following would look particularly good or readable:

    #05b79          stralloc:
    #05b79 4            sethex
    #05b7b #xxx         add c,c

etc. The addition was made while writing this message.

>3. Nonstandard symbols are enclosed in quotes and can contain any
>   characters. Example: "Garbage Collector!" When used as labels,
>   the colon goes outside the quotes.

Hmm... I agree, although I'd prefer to see the quotes as part of the
symbol name itself. The result is the same, of course.
I think enclosure within | and | looks better, and I'm used to it from
Common Lisp. It's not a big issue for me, though. Ex.

    |allocate-string|:
            call.3 |c=free_mem|

>4. All symbols used as arguments should be defined in the disassembly
>   (controllable by command line option). The simplest way to define
>   an address symbol is
>       #xxxxx: symbol:
>   Andreas also supports assignment for address and nonaddress symbols
>       symbol=#xxxxx

This is a very good idea; it will immediately be added. The
undefined-but-known symbol listing will go at the end, to avoid having
to rewrite the disassembler from a simple 1-pass to a 2-pass one.
Perhaps a rewrite is inevitable, but I'll hold off on it for now, and
in case I do rewrite it to full 2-pass, I'll move the symbol
definitions up front. Implementationally, it will be accomplished by
accumulating cross referencing info, which is a good thing anyway.

>6. Whether arguments are expressed in hex and commented with symbol names,
>   or expressed symbolically and commented with hex equivalents, should be
>   controllable by a command line switch.

Another good idea, not a very important one, but one which will
certainly get added.

>7. Comments should not be used for machine readable information. Thus the
>   syntax [<#xxxxx>] should be replaced with the simpler #xxxxx.

I can agree to this. However, there are further considerations. To be
added is a tool to extract comments from disassemblies and merge them
with, or supersede, the contents of the comment database. Therefore
there must be a syntactically explicit way to distinguish user-supplied
comments from disassembler-generated ones. As it stands now, comments
to be ignored are all characters enclosed within [...] and found at the
tail end of a comment. The disassembler already adheres to this, as
you've noticed. The disassembler-generated comments aren't for machine
readability; rather they're there for user readability, although they
need a specific syntax. Perhaps a command line option is the best
solution.

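To make the tail-end [...] rule concrete, here is a minimal sketch in
Python. It is not code from SAD; the function name user_comment and the
regular expression are assumptions for illustration, and it assumes a
generated note is a single unnested [...] group at the very end of the
comment.

```python
import re

# Assumption: a disassembler-generated note is one [...] group, without
# nested brackets, sitting at the very end of the comment text.
_TAIL_NOTE = re.compile(r"\s*\[[^\[\]]*\]\s*$")


def user_comment(comment: str) -> str:
    """Return the comment with a trailing [...] note removed.

    Brackets that appear anywhere other than the tail end are treated
    as part of the user's own text and are left alone.
    """
    return _TAIL_NOTE.sub("", comment).rstrip()
```

With this, only the user-supplied part of a comment would be fed to the
extraction tool, while bracketed text in the middle of a comment passes
through untouched.
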
I have removed the <#xxxx> format while writing this message. It's
always #xxxx now. I don't know exactly how this came to be.

>8. The data pseudo op is used to put non instructions into the code.
>   Your disassembler should use the data op for anything that is not
>   a valid instruction.

I have pondered a slightly different approach to this. Instructions
will get disassembled as far as possible, with any unidentifiable parts
replaced with "*" or "***" or such, moved into the comment field
(within [...]) and replaced with a proper pseudo op. The nontrivial
part is cluttering the code (which I'm really ambivalent towards) with
a large number of tests. Perhaps an easier approach is simply to do a
post-disassembly check to see whether the instruction contains any *'s
and in that case take suitable action (forcing a formatting mode other
than Code). A third database - Formats - consisting of format
directives, will be added and maintained much like Symbols and
Comments. But as things stand right now, I'm merely interested in
verifying the operation through extensive testing (I discovered several
bugs when grinding through the 0000-4000 range, where numerous
constants are stored and the I/O page is mapped).

                        * * *

How about:

                    Subexpression (substring), can contain any
                    characters including new lines.
    x^expr          Hex
    d^expr          Decimal
    o^expr          Octal
    i^expr          Code
    l^expr          Literal

where ^ can be freely substituted with ' (i.e. x'expr etc).

Examples.

            move.p5 i^, c           ; C = 0x07000

            move.p5 l^, c
            push.a  c
            ret

l^ and i^ significantly aid writing complex macros like structured
loops (i.e. WHILE, FOR...). Example (Macro-11 style):

            .macro  while test, body
            call    l^test
            if_failed_branch_to_L17
            call    l^body
    L17:
            .endm   ;; while

More clever schemes can be made up.

A full syntax specification of data formats is necessary to implement a
Formats database. The Formats database would also take care of
synchronization by explicitly inserting "code" tags, although the need
shouldn't arise, really.
Also, symbols are typed as either "code" or "data," although the typing
is currently ignored and unused. Internally, SAD keeps two tables: one
for symbols and one for comments (comments are analogously typed).
Adding a third for formats is fairly trivial - the nontrivial part is
extracting formatting info from a listing.

                        * * *

If anyone feels like adding RPL support, please feel free to do so. I
have absolutely no info on how composite objects (programs, algebraics,
lists, etc) are stored, and it seems no one except HP really knows,
either. Wouldn't life be easier were HP simply to release the
appropriate documentation?

                        * * *

The work cycle as I visualize it:

    a. Disassemble area
    b. Edit listing
    c. Extract comment info and supersede/merge with Comments
    d. Extract symbol info and supersede/merge with Symbols
    e. Continue steps a-d until happy with the results

A GNU Emacs mode is much desired to automate the cycle above, with C-c
commands to operate on both buffer and region levels.

                        * * *

Note: it took me two night hacks to write the disassembler and get it
to produce a fairly error-free output. That time does not even begin to
compare with hand disassembly, and basically, for every hour devoted to
improving the disassembler, many many hours of boring manual work are
saved. Part of the supersede/merge idea is that people should be able
to e-mail each other disassemblies, and with a minimum of pain merge
others' with their own.

I would appreciate hearing how you who read this react to the ideas
expressed. Post or mail.
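
For the supersede/merge step of the work cycle, a minimal sketch in
Python follows. It is not taken from SAD. The function name
merge_entries, the dictionary keyed by address, and the exact meaning
given to the two modes are all assumptions: "merge" is read here as
keeping existing database entries and only adding new ones, while
"supersede" lets the extracted entries win.

```python
def merge_entries(database, extracted, supersede=False):
    """Combine extracted entries with an existing database.

    Both arguments map an address to its text (a comment or a symbol
    name). This layout is hypothetical, not SAD's actual table format.
    The inputs are left unmodified; a new dictionary is returned.
    """
    merged = dict(database)
    for address, text in extracted.items():
        # In merge mode an existing entry is kept; in supersede mode
        # the freshly extracted entry replaces it.
        if supersede or address not in merged:
            merged[address] = text
    return merged
```

The same routine would serve for both Comments and Symbols, and for
folding in a disassembly received by e-mail, since each case reduces to
choosing which side wins when both have an entry for an address.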