Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sun-barr!apple!metaphor!cronos.metaphor.com!tam
From: tam@cronos.metaphor.com
Newsgroups: comp.unix.programmer
Subject: Very useful binary file analyser to share
Message-ID: <1444@metaphor.Metaphor.COM>
Date: 20 Sep 90 00:06:35 GMT
Sender: news@metaphor.Metaphor.COM
Reply-To: tam@cronos.metaphor.com ()
Organization: Metaphor Computer Systems, Mountain View, CA
Lines: 354

I developed a utility that has been extremely useful over my last few years as a software developer, and I would like to share it with you now. However, I don't know a good way to distribute binaries (sorry, I don't want to give away the sources). I have included a user guide here; if any of you are interested, let me know how to send it to you. I have versions for SunOS, AIX, DOS and OS/2 (and it will run on other systems I have access to); please specify which one you want.

ANA Command Summary
Prepared by: Paul C. Tam
For Version 0.15
Printed on 24 July 1990

I was rushing to finish this document, so some parts may be confusing. I would appreciate any comments on or improvements to this document. This version of ANA is free; please feel free to copy it.

0 HIGHLIGHTS

  * Interpret binary data in structures YOU define.
  * Rearrange data bytes before interpretation.
  * Report the data type sizes of the current machine.
  * Dump binary data in a very flexible format.
  * Dump multiple files on the same screen.
  * Same user interface across various platforms.
  * Built-in calculator/converter.
  * Save output to disk for future use.
  * Search for patterns.
  * Execute Operating System commands without exiting the utility.
  * And more...

1 INTER-OPERABILITY

Interoperability seems to be a hot buzzword these days, and this software does just that. Since the software is extremely portable, there are versions running on almost any operating system that has a C compiler, and they have exactly the same look and feel across all platforms.
2 Introduction

ANA is a utility program to assist users (especially software developers) who are interested in ANAlyzing the binary contents of any file. This program may be easier for users who know C, since the terminology used here is C-like. Its primary function is to display the hexadecimal contents of any file interactively. On top of that, a number of built-in features make this utility more flexible and useful. These include the ability to dump the display buffer into a file, set the display length and base, pack the display, and search for combinations of bytes (search has not yet been built).

A unique feature of this utility is perhaps its ability to analyze structures. This feature is especially geared toward software developers. Data files are often an array of records, each containing fields of different types. For example, the data file may be the control file of a print queue, with one record for each file waiting to be printed. Each record in turn contains fields that may indicate the name of the file to be printed, its priority and so on. These fields may have data types such as character (1 byte), integer (2 or 4 bytes) and ASCII string. Using ANA, the user can create an ASCII file in which the structure is defined. ANA then maps the data file onto the structure and interprets it as a series of fields instead of a string of bytes.

3 How to invoke ANA

ANA can be invoked in any one of the following ways:

  1) ANA
  2) ANA
  3) ANA

4 Inputs

Inputs can be hexadecimal, decimal or ASCII. Numerical inputs are interpreted according to the default base, but a prefix overrides it: any input prefixed with 0x is always hex, no matter what the current default is, and any input prefixed with \ is always decimal. A single ASCII character must be enclosed in single quotes; an ASCII string, however, must be enclosed in a pair of delimiters, which can be any character. For example,
the command s strings is the same as the command s 'tring'; both search for "tring".

5 Report

Unpacked -

  0x00000000: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F |................|
  0x00000010: 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F |................|

Packed -

  000102030405060708090A0B0C0D0E0F
  101112131415161718191A1B1C1D1E1F

The report format is fairly flexible: both the report address and the report data can be shown in either hexadecimal or decimal. The format above varies depending on a number of parameters, which can be set by various commands. The following are the default parameters unless they are overridden by their corresponding commands.

  Parameters      Defaults        Commands
  Pack mode       Unpacked        p   (packed)
  Address base    Hexadecimal     b a (base)
  Data base       Hexadecimal     b d
  Buffer size     240 bytes       l   (length)
  Report width    16 bytes        w   (width)
  Start address   0               a   (address)

6 Command Descriptions

6.1 ? - Help

Displays a brief description of the available commands. This is useful for reviewing the commands.

6.2 ENTER - Display next buffer

Data is read from the file into the data buffer and displayed. The next starting address is then updated so that the next ENTER displays the data that follows.

6.3 l - Set new buffer length

Defines the size of the data buffer for the next display.

6.4 a - Set new starting address

Defines a new starting address in the data file, rather than continuing from the end of the last display buffer.

6.5.1 b a - Toggle report address base

In unpacked mode, the address of the first byte of each report line is shown. This address can be shown in hexadecimal or decimal; this command toggles the base.

6.5.2 b d - Toggle report data base

Reported data can be shown in hexadecimal or decimal; this command toggles the base.

6.5.3 b i - Toggle input base

All numerical inputs are interpreted in the current default base; this command toggles that base. However, inputs prefixed with 0x are always interpreted as hex, and inputs prefixed with \ are always decimal.
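The report layout in section 5, together with the base toggles in 6.5.1 and 6.5.2, can be sketched in C as follows. This is only an illustration under stated assumptions, not ANA's code (which is not distributed): the names struct report_opts, format_line, REPORT_MAX_WIDTH and REPORT_LINE_MAX are hypothetical, and since this guide only shows hexadecimal output, the decimal layouts used here (a zero-padded 10-digit address and 3-digit data values) are guesses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limits and names; ANA's internals are not published. */
#define REPORT_MAX_WIDTH 64
/* Address prefix (at most 24 chars) + up to 4 chars per data byte
 * + 1 char per ASCII byte + two '|' + terminating NUL. */
#define REPORT_LINE_MAX (24 + 5 * REPORT_MAX_WIDTH + 3)

struct report_opts {
    int packed;    /* toggled by 'p':   0 = unpacked, 1 = packed      */
    int addr_hex;  /* toggled by 'b a': 1 = hex address, 0 = decimal  */
    int data_hex;  /* toggled by 'b d': 1 = hex data, 0 = decimal     */
    size_t width;  /* set by 'w': bytes per report line (default 16)  */
};

/* Format n bytes starting at file offset addr as one report line (without a
 * trailing newline) into out, which must hold REPORT_LINE_MAX chars.
 * Returns the length of the formatted line. */
static size_t format_line(const struct report_opts *o, unsigned long addr,
                          const unsigned char *buf, size_t n, char *out)
{
    char *p = out;
    size_t i, pad;

    assert(n <= o->width && o->width <= REPORT_MAX_WIDTH);

    if (o->packed) {
        /* Packed: data bytes only, with no address and no ASCII column. */
        for (i = 0; i < n; i++)
            p += sprintf(p, o->data_hex ? "%02X" : "%03u", (unsigned)buf[i]);
        *p = '\0';  /* also covers n == 0, where sprintf is never called */
        return (size_t)(p - out);
    }

    if (o->addr_hex)
        p += sprintf(p, "0x%08lX: ", addr);
    else
        p += sprintf(p, "%010lu: ", addr);

    for (i = 0; i < n; i++)
        p += sprintf(p, o->data_hex ? "%02X " : "%3u ", (unsigned)buf[i]);

    /* Pad a short last line so the ASCII column stays aligned. */
    pad = (o->width - n) * (o->data_hex ? 3 : 4);
    memset(p, ' ', pad);
    p += pad;

    /* ASCII column: printable bytes as-is, everything else as '.'. */
    *p++ = '|';
    for (i = 0; i < n; i++)
        *p++ = (buf[i] >= 0x20 && buf[i] < 0x7F) ? (char)buf[i] : '.';
    *p++ = '|';
    *p = '\0';
    return (size_t)(p - out);
}
```

With the defaults (240-byte buffer, 16-byte width), a display via ENTER would correspond to 15 calls, each passing the running file offset as addr.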
6.6 c - Calculator functions

Sometimes it is necessary to do some arithmetic operations on the data displayed, so a simple set of arithmetic functions is available in ANA. Currently, the calculator can only do integer arithmetic and is limited to two operands and one operator (the one exception is conversion, which takes a single operand). The syntax of this command is the command keyword, followed by the operation, followed by ENTER. The following are examples and descriptions of all available operations, where X and Y are two integers:

  c X * Y    ( X multiplied by Y )
  c X / Y    ( X divided by Y )
  c X + Y    ( X plus Y )
  c X - Y    ( X minus Y )
  c X % Y    ( remainder of X divided by Y )
  c X & Y    ( X bitwise AND Y )
  c X | Y    ( X bitwise OR Y )
  c X ^ Y    ( X bitwise XOR Y )
  c X > Y    ( X shifted right by Y bits )
  c X < Y    ( X shifted left by Y bits )
  c X        ( convert X, which can be hex, decimal or ASCII )

6.7 d - Download structure description file

Each structure description file describes only one structure, but sometimes it is desirable to map data onto a different structure. This command loads another description file for the next mapping.

6.8 D - Continuous dump

The whole work file, starting at the current location, is displayed continuously until the end of the file.

6.9 i - Information desk

This command displays useful information: the size in bytes of each data type on the current machine; the current work file's name, number and size; the maximum number of work files that may be open and the number currently open; the user input base, report data base and report address base; and the mapping alignments (see the m command).

6.10 m - Map data to structure

Maps the data in the data buffer just displayed onto the structure described by the SDF. Mapping currently starts at the beginning of the data buffer, so the user may have to adjust the starting address before mapping. Data types are normally aligned within a structure. For example, a 'short' after a 'char' is placed on an even boundary, and the byte after the 'char' is meaningless.
This utility allows the user to specify the alignment boundary of each type. The arguments are i for int, l for long, f for float and d for double. Their default values are shown by the information desk ('i').

6.11 o - Open another work file

More than one file can be worked on; this command opens another work file.

6.12 p - Toggle packed display mode

As discussed above, the report format can be either packed or unpacked; this command toggles between them.

6.13 q - Quit analyzer

Terminates and exits the program.

6.14 s - Search for a pattern

A pattern is searched for starting at the current location. The pattern can be a series of hex or decimal numbers, or an ASCII string in a pair of delimiters.

6.15 t - Transfer data buffer to disk

This command stores the buffer just displayed in a disk file. The first time it is executed, the user is prompted for the disk file name unless the name is entered with the command. Any subsequent transfer is appended to the named file, and any file name entered on the command line is ignored.

6.16 u - Use a different work file

If multiple work files are open (see the o command), this command switches to a different work file.

6.17 V - Display current version

This command displays the current version of the software and the copyright message.

6.18 w - Set display row width

Especially after switching from unpacked to packed display format, it is usually desirable to display more data on one line. This command lets the user adjust the display width.

6.19 z - Zap old data with new data

Be careful when using this command: it replaces the old data at the current location with the new data, and there is no way to recover from it. As with the 's' command, the data can be hex, decimal or ASCII.

6.20 ! - OS escape

Runs a regular Operating System command.

6.21 0 - Redisplay buffer

Sometimes data may scroll off the screen; this command redisplays the data that was just displayed.
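Sections 4, 6.14 and 6.19 describe the pattern syntax shared by the s and z commands. The C sketch below shows one way to turn such an argument into bytes and search for it. It is a hypothetical illustration, not ANA's implementation, and it assumes that each number stands for one byte (0 to 255), that a whitespace-separated token which parses completely as a number in the applicable base is a number, and that anything else is a string whose first character is its delimiter. Under that last assumption, in hex input mode a string whose delimiter is a hex digit is read as a number instead. The search is a plain byte-by-byte scan, and parse_number, parse_pattern and find_pattern are made-up names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Try to read the token [s, e) as one number. A 0x prefix forces hex and a
 * \ prefix forces decimal; otherwise default_base (10 or 16) applies.
 * Returns 1 and stores *value on success, 0 if the token is not a number. */
static int parse_number(const char *s, const char *e, int default_base,
                        unsigned long *value)
{
    char tmp[32];
    char *end;
    int base = default_base;

    if (e - s > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        base = 16;
        s += 2;
    } else if (e - s > 1 && s[0] == '\\') {
        base = 10;
        s += 1;
    }
    if ((size_t)(e - s) >= sizeof tmp || !isxdigit((unsigned char)s[0]))
        return 0;
    memcpy(tmp, s, (size_t)(e - s));
    tmp[e - s] = '\0';
    *value = strtoul(tmp, &end, base);
    return *end == '\0';  /* the whole token must be digits of this base */
}

/* Convert an 's' or 'z' argument into bytes in out. Numbers become one byte
 * each; any other token starts a string whose first character is its
 * delimiter. Returns the number of bytes, or -1 on a malformed pattern. */
static int parse_pattern(const char *arg, int default_base,
                         unsigned char *out, size_t outsz)
{
    const char *p = arg;
    size_t n = 0;

    assert(default_base == 10 || default_base == 16);
    while (*p != '\0') {
        const char *e = p;
        unsigned long v;

        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }
        while (*e != '\0' && !isspace((unsigned char)*e))
            e++;
        if (parse_number(p, e, default_base, &v)) {
            if (v > 0xFF || n >= outsz)
                return -1;             /* assumption: one byte per number */
            out[n++] = (unsigned char)v;
            p = e;
        } else {
            char delim = *p++;
            const char *close = strchr(p, delim);
            size_t len;

            if (close == NULL)
                return -1;             /* unterminated string */
            len = (size_t)(close - p);
            if (len > outsz - n)
                return -1;
            memcpy(out + n, p, len);   /* string may contain spaces */
            n += len;
            p = close + 1;
        }
    }
    return (int)n;
}

/* Naive forward search from offset start; returns the offset of the first
 * match, or -1 if the pattern does not occur. */
static long find_pattern(const unsigned char *data, size_t len, size_t start,
                         const unsigned char *pat, size_t patlen)
{
    size_t i;

    if (patlen == 0 || patlen > len)
        return -1;
    for (i = start; i <= len - patlen; i++)
        if (memcmp(data + i, pat, patlen) == 0)
            return (long)i;
    return -1;
}
```

Under these assumptions, both "strings" and "'tring'" produce the five bytes of "tring", matching the example in section 4.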
6.22 + - Report the next display buffer

6.23 - - Report the previous display buffer

7 Structure mapping

Mapping structures is one of the unique features of this software. Rather than just dumping the data file in bytes, the user can write a structure definition file (hereafter called the SDF) from which the data can be interpreted in a more flexible way. The SDF is a plain ASCII file in which each line represents one data field, and the whole file together defines the structure to be mapped.

This feature is used in the following steps:

  First, create the SDF with any editor; this file must be named "ana.fmt".
  Second, display the beginning of the data structure by changing the start address and pressing ENTER.
  Finally, issue the mapping command to map the data buffer.

Each line in the SDF represents one data field and has the following format:

  keyword user_defined_id [length/byte_arrangement]

For all types except "string", the optional third field specifies byte rearrangement. For type "string", a length field must be specified to indicate how many bytes are in the string. The user-defined name helps the user identify the field; its content is arbitrary and is limited to 20 characters. Names longer than 20 characters are truncated.

The keywords currently supported are:

  int       signed integer
  char      single character (byte)
  string    string of characters
  long      long signed integer
  short     short signed integer
  ulong     unsigned long integer
  ushort    unsigned short integer
  uint      unsigned integer
  float     floating-point number
  double    double-precision floating-point number

To allow more flexibility, it is also possible to interactively download a new SDF (see the d command), so that more than one structure can be analyzed in a data file.

EXAMPLE: Suppose there is a file of employee records. Each record starts with an employee name of 10 characters, then an employee number of type long, followed by the salary, which is of type integer.
Let's further assume that an integer is two bytes and a long integer is four bytes. Instead of just dumping the data file in bytes, it is more useful to dump it in a more descriptive form, in this case a string, a long integer and an integer. The SDF should look like this:

  string employee_name 10
  long employee_no.
  int employee_salary

When the ANA utility is executed, ANA detects the existence of the SDF, builds the internal structure and enables the mapping facility. The user can display the portion of data to be analyzed and then issue the mapping command; the display may look something like:

  employee_name = (John Doe  )
  employee_no. = 999999 (0xF423F)
  employee_salary = 5000 (0x1388)

Notice that the string is shown inside a pair of parentheses. The integers are shown in decimal, followed by their corresponding hexadecimal values.

The user should be aware that computers generally place data on boundaries that match its type. For example, on a machine that uses a two-byte integer, a structure such as

  struct {
      char c;   /* holds 'A' (0x41) */
      int  i;   /* holds 1 */
  };

may be stored as either

  41 XX 00 01     or     41 00 01

In the first case, the character falls on an even boundary. Since the integer also has to start on an even boundary, there is a padding byte in between that does not mean anything. In the second case, the character falls on an odd boundary, so the integer can be placed right after it. For this reason, it can be very confusing to look at the data byte by byte; it is better to use structure mapping.

Furthermore, different CPUs have different characteristics. Some align integers and longs on even boundaries, while others align integers on even boundaries and longs on 4-byte boundaries. This software lets the user customize the alignment: the command m [i/l/f/d] specifies the alignment of each data type used in structure mapping.

To make life harder, some CPUs swap bytes (our beloved 80x86 architecture) and some don't.
The optional third field in the format file is just for that: it specifies the byte rearrangement sequence. For example, suppose a 4-byte integer is stored in a file as 0x11 0x22 0x33 0x44 (0x11223344). The format file line

  int sample

yields the output

  sample = 287454020 (0x11223344)

but if the line is written as

  int sample 4321

the output will be

  sample = 1144201745 (0x44332211)

This feature is really useful if, for example, someone needs to dump a file created on a 68000 machine on an 8086 machine.
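The byte-rearrangement field can be illustrated with the C sketch below. How ANA interprets the digits is not spelled out beyond the 4321 example, so this assumes that the i-th digit names which source byte (counting from 1 in file order) becomes the i-th byte of the value, and that the bytes are then combined most significant first. That matches the example above, where 0x11 0x22 0x33 0x44 is reported as 0x11223344 without an arrangement; ANA itself presumably uses the host's native byte order in that case. The function name rearrange is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the SDF byte-rearrangement field.
 * Assumption: arrangement[i] ('1'..'8') names the source byte, in file
 * order, that becomes byte i of the value; the rearranged bytes are then
 * combined most significant byte first. NULL or "" keeps file order. */
static unsigned long rearrange(const unsigned char *src, size_t size,
                               const char *arrangement)
{
    unsigned char tmp[8];
    unsigned long value = 0;
    size_t i;

    assert(size <= sizeof tmp && size <= sizeof value);
    if (arrangement == NULL || *arrangement == '\0') {
        memcpy(tmp, src, size);              /* no third field: file order */
    } else {
        assert(strlen(arrangement) == size);
        for (i = 0; i < size; i++) {
            int k = arrangement[i] - '1';    /* '1' names the first byte */
            assert(k >= 0 && (size_t)k < size);
            tmp[i] = src[k];
        }
    }
    for (i = 0; i < size; i++)
        value = (value << 8) | tmp[i];       /* most significant byte first */
    return value;
}
```

Called with the bytes 0x11 0x22 0x33 0x44, this sketch returns 287454020 with no arrangement and 1144201745 with "4321", the two values shown above.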