Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!apple!metaphor!cronos.metaphor.com!tam
From: tam@cronos.metaphor.com
Newsgroups: comp.unix.programmer
Subject: Very useful binary file analyser to share
Message-ID: <1444@metaphor.Metaphor.COM>
Date: 20 Sep 90 00:06:35 GMT
Sender: news@metaphor.Metaphor.COM
Reply-To: tam@cronos.metaphor.com ()
Organization: Metaphor Computer Systems, Mountain View, CA
Lines: 354


I have developed a utility that has been extremely useful in my last few years
as a software developer, and I would like to share it with you now. However, I
don't know a good way to distribute binaries (sorry, I don't want to give away
sources). I have included a user guide here; if any of you are interested,
let me know how to send it to you. I have versions for SunOS, AIX, DOS and
OS/2 (it will run on others that I have access to); please specify which one
you want.


ANA Command summary
Prepared by: Paul C. Tam
For Version 0.15
Printed on 24 July 1990

    I was rushing to finish this document, so some parts may be confusing. I
    would appreciate any comments on or enhancements to this document. This
    version of ANA is free; please feel free to copy it.

0   HIGHLIGHTS

    * Interpret binary data in structures YOU define.
    * Rearrange data bytes before interpretation.
    * Report current machine data types.
    * Dump binary data in very flexible format.
    * Dump multiple files on the same screen.
    * Same user interface across various platforms.
    * Built in calculator/converter.
    * Save output to disk for future use.
    * Search for patterns.
    * Execute Operating System commands without exiting the utility.
    * And more......


1   INTER-OPERABILITY

    Inter-operability seems to be a hot buzzword these days, and this software
    delivers just that. Since the software is extremely portable, there are
    versions running on almost any operating system that has a C compiler.
    They have exactly the same look and feel across all platforms.

2   Introduction

     ANA is a utility program to assist users (especially software developers)
who are interested in ANAlyzing the binary contents of any file. The program
may be easier for users who know C, since the terminology used here is C-like.
     Its primary function is to display the hexadecimal contents of any file
interactively. On top of that, many built-in features make the utility more
flexible and useful. These include dumping the display buffer into a file,
setting the display length and base, packing the display, and searching for a
combination of bytes (search has not yet been built).
     A unique feature of this utility is perhaps its ability to analyze
structures. This feature is especially geared toward software developers.
Data files are sometimes an array of records, where each record contains
fields of different types. For example, the data file may be the control file
of a print queue, holding one record for each file waiting to be printed. Each
record in turn contains different fields, which may indicate the name of the
file to be printed, its priority and so on. The fields may have a data type of
character (1 byte), integer (2 or 4 bytes) or ASCII string.
     Using ANA, the user can create an ASCII file in which the structure is
defined. ANA then maps the data file onto the structure and interprets it as a
series of fields instead of a string of bytes.

3    How to invoke ANA

     ANA can be invoked in any one of the following ways:

     1) ANA
     2) ANA <data_file_name>
     3) ANA <data_file_name> <start_address> <length_of_buffer>

4    Inputs

     Inputs can be hexadecimal, decimal or ASCII. Numerical inputs are
     interpreted according to the default base, but a prefix overrides it:
     any input prefixed with 0x is always hex, no matter what the current
     default is, and any input prefixed with \ is always decimal.
     A single ASCII character must be between single quotes. An ASCII string
     must be between a pair of delimiters, which can be any character.
     E.g. the command s strings is the same as the command s 'tring'; both
     search for "tring".
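
     The input rules above can be sketched in Python as follows. This is an
     illustration only, not ANA's own code: the names parse_number and
     parse_string are hypothetical, and the current default base is assumed
     to be passed in explicitly as 10 or 16.

```python
def parse_number(text, default_base):
    """Parse one numeric input; default_base is 10 or 16 (assumed)."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text[2:], 16)        # 0x prefix: always hex
    if text.startswith("\\"):
        return int(text[1:], 10)        # \ prefix: always decimal
    if len(text) == 3 and text[0] == "'" and text[2] == "'":
        return ord(text[1])             # single ASCII character in quotes
    return int(text, default_base)      # no prefix: use the default base


def parse_string(text):
    """Return the string between the first character and its next occurrence.

    The first character acts as the delimiter, so "strings" and "'tring'"
    both yield "tring".
    """
    delimiter = text[0]
    end = text.find(delimiter, 1)
    if end == -1:
        raise ValueError("missing closing delimiter")
    return text[1:end]
```

     With a hex default base, parse_number("31", 16) gives 49, while
     parse_number("\\31", 16) is forced to decimal and gives 31.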

5    Report

     Unpacked -
 0x00000000: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F |................|
 0x00000010: 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F |................|

     Packed -
 000102030405060708090A0B0C0D0E0F
 101112131415161718191A1B1C1D1E1F

     The report format is fairly flexible. Both the report address and the
report data can be in either hexadecimal or decimal. The format above varies
with a number of parameters, each set by its own command. The following
defaults apply unless overridden by the corresponding command.

     Parameters     Defaults       Commands
     Pack Mode      Unpacked       p (packed)
     Address Base   Hexadecimal    b a (base)
     Data Base      Hexadecimal    b d
     Buffer size    240 bytes      l (length)
     Report width   16 bytes       w (width)
     Start address  0              a (address)
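
     The unpacked and packed report formats shown above can be reproduced
     with a short Python sketch. It is not ANA's implementation: the function
     name dump and its parameters are hypothetical, only the hexadecimal
     bases are handled, and a short final row is not padded.

```python
def dump(data, start=0, width=16, packed=False):
    """Format a buffer as report lines; start is the address of data[0]."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        if packed:
            # Packed: hex digits only, no address and no ASCII column.
            lines.append(row.hex().upper())
        else:
            hex_part = " ".join(f"{byte:02X}" for byte in row)
            # Non-printable bytes are shown as '.' in the ASCII column.
            text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
            lines.append(f"0x{start + offset:08X}: {hex_part} |{text}|")
    return lines


if __name__ == "__main__":
    for line in dump(bytes(range(32))):
        print(line)
```

     Passing width=32 together with packed=True corresponds to widening the
     row with the w command after switching to packed mode.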


6    Command Descriptions

     6.1  ? - Help
          Display a brief description of the available commands. This is
          useful for reviewing commands.

     6.2  ENTER - Display next buffer
          Data is read from file into the data buffer and displayed. Then the
          next starting address is updated so that the next ENTER will display
          the following data.

     6.3  l - Set new buffer length
          Define the size of the data buffer on the next display.

     6.4  a - Set new starting address
          Define a new starting address of the data file rather than the
          continuation of the last display buffer.

     6.5.1  b a - Toggle report address base
          In unpacked mode, the address of the first byte of each report line
          is shown. This address can be in hexadecimal or decimal. This
          command toggles the base.

     6.5.2  b d - Toggle report data base
          Data reported can be in hexadecimal or decimal. This command
          toggles the base.

     6.5.3  b i - Toggle input base
          All numerical inputs are interpreted in the current default base;
          this command toggles that base. However, inputs prefixed with 0x are
          always interpreted as hex and inputs prefixed with \ are always
          interpreted as decimal.

     6.6  c - Calculator functions
          Sometimes it is necessary to do some arithmetic on the data
          displayed. A simple set of arithmetic functions is available in ANA.
          Currently, the calculator can only do integer arithmetic and is
          limited to two operands and one operator (the only exception being
          conversion, which takes a single operand). The syntax of this
          command is the command keyword followed by the operation followed by
          an ENTER. The following are examples and descriptions of all
          available operations. Suppose X and Y are two integers.

               c X * Y   ( X multiply Y )
               c X / Y   ( X divided by Y )
               c X + Y   ( X plus Y )
               c X - Y   ( X minus Y )
               c X % Y   ( remainder of X divided by Y )
               c X & Y   ( X bitwise AND Y )
               c X | Y   ( X bitwise OR Y )
               c X ^ Y   ( X bitwise XOR Y )
               c X > Y   ( X shifted right by Y bits )
               c X < Y   ( X shifted left by Y bits )
               c X       ( X can be hex, decimal or ASCII )
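
          The operator table above can be mimicked with a small Python
          sketch. It is not ANA's code: the name calc is hypothetical,
          operands use Python literal syntax (0x for hex) rather than ANA's
          input rules, and division uses Python floor division, which may
          differ from ANA for negative operands.

```python
import operator

# One entry per operator listed above. Note that > and < are shifts here,
# not comparisons.
OPERATORS = {
    "*": operator.mul,
    "/": operator.floordiv,
    "+": operator.add,
    "-": operator.sub,
    "%": operator.mod,
    "&": operator.and_,
    "|": operator.or_,
    "^": operator.xor,
    ">": operator.rshift,
    "<": operator.lshift,
}


def calc(expression):
    """Evaluate 'X op Y', or convert a lone 'X' to an integer."""
    parts = expression.split()
    if len(parts) == 1:
        return int(parts[0], 0)          # conversion: single operand
    if len(parts) != 3:
        raise ValueError("expected 'X op Y' or 'X'")
    left, symbol, right = parts
    return OPERATORS[symbol](int(left, 0), int(right, 0))
```

          For example, calc("1 < 4") shifts 1 left by 4 bits and returns 16.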


     6.7  d - Download structure description file
          Each structure description file maps only one structure, but
          sometimes it is desirable to map data to a different structure. This
          command loads another description file for the next mapping.

     6.8  D - Continuous dump
          The whole work file, starting at the current location, is displayed
          continuously until the end of the file.

     6.9  i - Information desk
          This command displays useful information: the sizes in bytes of the
          data types on the current machine, the current work file name,
          number and size, the maximum number of work files that may be open,
          the number of work files currently open, and the user input base,
          report data base and report address base. It also shows the mapping
          alignments (see the m command).

     6.10 m - Map data to structure
          Maps the data in the data buffer just displayed onto the structure
          described by the SDF (see section 7). Mapping currently starts at
          the beginning of the data buffer, so the user may have to adjust the
          starting address before mapping.
          Data types are normally aligned within a structure. For example, a
          'short' after a 'char' is placed on an even boundary, and the byte
          after the 'char' is meaningless. This utility allows the user to
          specify the alignment boundary of each type. The arguments are i for
          int, l for long, f for float and d for double. Their default values
          are displayed by the information desk ('i').

     6.11 o - Open another work file
          More than one file can be worked on at a time; this command opens
          another work file.

     6.12 p - Toggle packed display mode
          As discussed above, the report format can be either packed or
          unpacked; this command toggles between the two.

     6.13 q - Quit analyzer
          Terminate and exit program.

     6.14 s - Search for a pattern
          A pattern is searched for starting at the current location. The
          pattern can be a series of hex or decimal numbers, or an ASCII
          string between a pair of delimiters.
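
          A pattern search of this kind can be sketched in Python as below.
          This is only an illustration: the names make_pattern and search
          are hypothetical, and it is an assumption that a match beginning
          exactly at the current location counts.

```python
def make_pattern(numbers):
    """Build a byte pattern from a series of numbers, each in 0..255."""
    return bytes(numbers)


def search(data, pattern, current=0):
    """Return the offset of the first match at or after current, or -1."""
    return data.find(pattern, current)
```

          An ASCII pattern is simply passed as bytes, e.g.
          search(data, b"tring"), and a numeric one is built first, e.g.
          search(data, make_pattern([0x0D, 0x0A])).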

     6.15 t - Transfer data buffer to disk
          The buffer just displayed can be stored in a disk file with this
          command. On the first execution of the command, the user is prompted
          for the disk file name unless it is entered with the command. Any
          subsequent transfer is appended to the named file, and any file name
          entered on the command line is ignored.

     6.16 u - Use a different work file
          If multiple work files are open (see the o command), this command
          switches to a different work file.

     6.17 V - Display current version
          This command displays the current version of the software and
          copyright message.

     6.18 w - Set display row width
          It is usually desirable to display more data on one line, especially
          after changing from unpacked to packed display format. This command
          allows the user to adjust the display width.

     6.19 z - Zap old data with new data
          Be careful when using this command: it replaces the old data at the
          current location with the new data, and there is no recovery from
          it. As with the 's' command, the data can be hex, decimal or ASCII.

     6.20 ! - OS escape
          Run a regular Operating System command.

     6.21 0 - Redisplay buffer
          Sometimes data may be scrolled off the screen; this command
          redisplays the data that was just displayed.

     6.22 + - Report the next display buffer

     6.23 - - Report the previous display buffer


7    Structure mapping

     Structure mapping is one of the unique features of this software. Rather
     than just dumping the data file in bytes, the user can define a structure
     definition file (hereafter called the SDF) from which the data can be
     interpreted in a more flexible way. The SDF is a pure ASCII file in which
     each line represents one data field, and the whole file together defines
     the structure to be mapped.

     To use this feature, follow these steps:
     First, create the SDF with any editor; the file must be named
     "ana.fmt".
     Second, display the beginning of the data structure by changing the start
     address and hitting ENTER.
     Finally, invoke the mapping command to map the data buffer.

     Each line in the SDF represents one data field and has the following
     format:

     keyword user_defined_id [length/byte_arrangement]

     For all types except "string", the optional third field specifies byte
     rearrangement. For the string type, the third field is required and
     gives the number of bytes in the string. The user-defined name helps the
     user identify the field; its content is arbitrary and is limited to 20
     characters. Names longer than 20 characters are truncated.

     The keywords currently supported are:

          int       signed integer
          char      single character (byte)
          string    string of characters
          long      long signed integer
          short     short signed integer
          ulong     unsigned long integer
          ushort    unsigned short integer
          uint      unsigned integer
          float     floating point number
          double    double floating point number

     To allow more flexibility, it is also possible to interactively download
     a new SDF so that more than one structure can be analyzed in a data file.

     EXAMPLE:
     Suppose there is a file of employee records. Each record starts with an
     employee name of 10 characters, then an employee number of type long,
     followed by a salary of type integer. Let's further assume that an
     integer is two bytes and a long integer is four bytes. Instead of just
     dumping the data file in bytes, it is more useful to dump it in a more
     descriptive form, in this case a string, a long integer and an integer.
     The SDF should look like this:

          string employee_name 10
          long employee_no.
          int  employee_salary

     When ANA is executed, it detects the existence of the SDF, builds the
     internal structure and enables the mapping facility. The user can then
     display the portion of data that needs to be analyzed and invoke the
     mapping command. The display may look something like:

          employee_name = (John Doe  )
          employee_no. = 999999 (0xF423F)
          employee_salary = 5000 (0x1388)

     Notice that the string is shown inside a pair of parentheses. The integers
     are shown in decimal, followed by their corresponding hexadecimal values.
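
     The employee example can be reproduced with the following Python sketch.
     It is not ANA's implementation. The names parse_sdf and map_buffer are
     hypothetical; only the string and integer keywords are handled; the
     sizes (int = 2 bytes, long = 4 bytes) come from the example; and it
     assumes most-significant-byte-first storage, no alignment padding, and
     that negative values are shown in hex as their raw bit pattern.

```python
import struct

# struct codes for the integer keywords. Sizes are assumptions taken from
# the employee example (int = 2 bytes, long = 4 bytes).
CODES = {"short": "h", "ushort": "H", "int": "h", "uint": "H",
         "long": "l", "ulong": "L"}


def parse_sdf(text):
    """Turn SDF text into a list of (keyword, name, third_field) tuples."""
    fields = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        third = parts[2] if len(parts) > 2 else None
        fields.append((parts[0], parts[1][:20], third))  # names cut at 20
    return fields


def map_buffer(fields, buffer):
    """Interpret buffer field by field and return the report lines."""
    lines, position = [], 0
    for keyword, name, third in fields:
        if keyword == "string":
            length = int(third)          # third field is the string length
            value = buffer[position:position + length].decode("ascii")
            lines.append(f"{name} = ({value})")
            position += length
        else:
            fmt = ">" + CODES[keyword]   # ">" = big-endian, no padding
            size = struct.calcsize(fmt)
            (value,) = struct.unpack_from(fmt, buffer, position)
            mask = (1 << (8 * size)) - 1
            lines.append(f"{name} = {value} (0x{value & mask:X})")
            position += size
    return lines


SDF = """string employee_name 10
long employee_no.
int  employee_salary
"""
```

     Feeding it the 16 bytes of one record, for example
     map_buffer(parse_sdf(SDF), b"John Doe  \x00\x0f\x42\x3f\x13\x88"),
     returns the three report lines shown in the example above.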
 
     The user should be aware that compilers normally place each data item on
     the boundary required by its type. For example, on a machine that uses a
     two-byte integer, a structure like
          structure {
               character A;
               integer 1;
          }
          may be stored as follows:
               41 XX 00 01
                   or
               41 00 01

     In the first case, the character falls on an even boundary. Since the
     integer must also start on an even boundary, there is a garbage byte in
     between that does not mean anything. In the second case, the character
     falls on an odd boundary, so the integer can be placed right after the
     character.

     For this reason, it is sometimes very confusing to look at the data byte
     by byte. It is better to use the structure mapping.

     Furthermore, different CPUs have different characteristics. Some align
     both integer and long on an even boundary, while others might align
     integer on an even boundary and long on a 4-byte boundary. This software
     allows the user to customize the alignment. The command m [i/l/f/d]
     specifies the alignment of the data types used in structure mapping.
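
     The effect of an alignment boundary on field offsets can be sketched in
     Python as below. This illustrates the general padding rule, not ANA's
     code; the names align and layout are hypothetical, and each field is
     described by an assumed (name, size, boundary) tuple.

```python
def align(offset, boundary):
    """Round offset up to the next multiple of boundary."""
    return offset + (-offset) % boundary


def layout(fields, start=0):
    """Return (name, offset) for each (name, size, boundary) field."""
    offset, placed = start, []
    for name, size, boundary in fields:
        offset = align(offset, boundary)   # skip any garbage bytes
        placed.append((name, offset))
        offset += size
    return placed


# The character followed by a two-byte integer from the example above.
FIELDS = [("character", 1, 1), ("integer", 2, 2)]
```

     With the character on an even address, layout(FIELDS, start=0) places
     the integer at offset 2 and leaves one garbage byte at offset 1. With
     the character on an odd address, layout(FIELDS, start=1) places the
     integer at offset 2 with no gap.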

     To make life harder, some CPUs swap bytes (our beloved 80x86 architecture)
     and some don't. The optional third field in the format file is for just
     that: it specifies the data rearrangement sequence. For example, suppose
     an integer made of 4 bytes is stored in a file as 0x11 0x22 0x33 0x44
     (0x11223344). A line in the format file:
         int sample 
     yields an output of
         sample = 287454020 (0x11223344)
     but if the format file is written as:
         int sample 4321
     the output will be
         sample = 1144201745 (0x44332211)

     This feature is really useful if, for example, someone tries to dump a
     file created on a 68000 machine on an 8086 machine.
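
     The byte rearrangement in the example above can be sketched in Python as
     follows. It is not ANA's code, and the names rearrange and read_int are
     hypothetical. It assumes that each digit of the sequence names the
     1-based source byte for that output position (the 4321 example alone
     cannot distinguish this from the opposite reading), and that the
     rearranged bytes are read most significant byte first.

```python
def rearrange(raw, order):
    """Reorder bytes; each digit in order is a 1-based source position."""
    return bytes(raw[int(digit) - 1] for digit in order)


def read_int(raw, order=None):
    """Read a signed integer, optionally rearranging its bytes first."""
    if order:
        raw = rearrange(raw, order)
    return int.from_bytes(raw, "big", signed=True)


SAMPLE = bytes([0x11, 0x22, 0x33, 0x44])
```

     Here read_int(SAMPLE) gives 287454020 (0x11223344), and
     read_int(SAMPLE, "4321") gives 1144201745 (0x44332211), matching the two
     outputs above.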