Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!zaphod.mps.ohio-state.edu!julius.cs.uiuc.edu!apple!metaphor!cronos.metaphor.com!tam
From: tam@cronos.metaphor.com
Newsgroups: comp.unix.programmer
Subject: ANA - a binary analyser (part 01/02)
Message-ID: <1515@metaphor.Metaphor.COM>
Date: 16 Oct 90 17:30:55 GMT
Sender: news@metaphor.Metaphor.COM
Reply-To: tam@cronos.metaphor.com ()
Organization: Metaphor Computer Systems, Mountain View, CA
Lines: 392

To those who don't think I should post PC binaries in this group, please accept my apology. I think ANA is useful enough to some developers that it would be selfish not to share it (of course, I am still being selfish by not sharing the source).

I have received a lot of requests for ANA, including many for platforms that I don't support. ANA is highly portable and should build on any machine that has a C compiler; the only reason I don't support those platforms is that I don't have access to those machines. So far I have not needed a single source change on any platform I have compiled it on.

I have also had many requests for the MSDOS version. Since I believe any site with computers has at least one PC, I am posting the MSDOS binary. This serves both to answer those who asked for this version and as a demo program: what you see in the MSDOS version is exactly what you get on the other platforms. That is why I have kept the user interface so old-fashioned; I could have gone full screen, but then I would have had porting problems.

If you find ANA essential but I don't yet support your platform, maybe we can make some arrangement.

If you asked for ANA and have not received it, either we had mail problems or I have not found a way to email back to you. Please email me again.

Mr. Tommy Wallo of the Swedish Institute of Computer Science has set up an anonymous site for ANA; many thanks to him.
He has all the versions for the Sun platform. The address is 192.16.123.90. You can also ask me to email it to you.

The following is a user guide for ANA.

Current versions available:

    Sun 3 UNIX 4.2 Release 3.5 and SunOS 4.1 (680x0)
    SunOS 4.0.3c
    SunOS 4.1
    HP 9000/300 HP/UX
    IBM RS/6000 AIX 3.1
    IBM PS2 AIX 1.2
    IBM RT AIX 2.?
    MSDOS
    OS/2 1.2

    ANA Command Summary
    Prepared by: Paul C. Tam
    For Version 0.15
    Printed on 24 July 1990

I was rushing to finish this document, so some parts may be confusing. I would appreciate any comments on or improvements to it. This version of ANA is free; please feel free to copy it.

0 HIGHLIGHTS

* Interpret binary data in structures YOU define.
  So that I can dump my object file in OBJECT FILE FORMAT.
* Rearrange data bytes before interpretation.
  Remember our byte swap problem?
* Report current machine data types.
  So I know my integer type is 4 bytes, not 2.
* Dump binary data in a very flexible format.
  So it won't scroll faster than I can read.
* Dump multiple files on the same screen.
  Don't you sometimes wish you could compare multiple files on the same screen?
* Same user interface across various platforms.
  I don't care what OS I'm running; ANA always treats me the same.
* Built-in calculator/converter.
  I want to dump the 0x549th byte after the current position.
  I want to know what 0x534634 is in decimal, and what 0x42 is in ASCII.
* Save output to disk for future use.
  I want a hardcopy so that I can make a paper plane.
* Search for patterns.
  I want to find the VP's name in the salary file and replace it with mine.
* Execute Operating System commands without exiting.
  Most popular games have a fake exit-to-OS mode, but this one is for real.
* And more......

1 INTER-OPERABILITY

Interoperability seems to be a hot buzzword these days, and this software delivers just that. Because it is extremely portable, versions can be built for almost any operating system that has a C compiler, and they all have exactly the same look and feel.
2 Introduction

ANA is a utility program to assist users (especially software developers) who are interested in ANAlyzing the binary contents of any file. Users who know C may find it easier, since the terminology used here is C-like.

Its main function is to display the binary contents of any file interactively. On top of that, many features are built in to make this utility more flexible and useful. These include dumping the display buffer into a file, setting the display length and base, packing the display, and searching for byte sequences.

A unique feature of this utility is perhaps its ability to analyze structures, which is especially geared toward software developers. Data files are often arrays of records, each containing fields of different types. For example, the data file may be the control file of a print queue, with one record for each file waiting to be printed. Each record in turn contains fields such as the name of the file to be printed, its priority, and so on. These fields may be of type character (1 byte), integer (2 or 4 bytes) or ASCII string. Using ANA, the user can create an ASCII file in which the structure is defined. ANA then maps the data file onto the structure and interprets it as a series of fields instead of a string of bytes.

3 How to invoke ANA

ANA can be invoked in any one of the following ways:

    1) ANA
    2) ANA
    3) ANA

4 Inputs

Inputs can be in hexadecimal, decimal or ASCII. Numerical inputs are interpreted according to the default base, which can be overridden by a prefix. Any input prefixed by 0x is always hex, no matter what the current default is, and any input prefixed by \ is always decimal. A single ASCII character must be enclosed in single quotes. An ASCII string, however, must be enclosed in a pair of delimiters, which can be any character. For example,
the command s strings is the same as the command s 'tring'; both search for "tring" (in the first form the delimiter is the letter s).

5 Report

Unpacked -

0x00000000: 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F |................|
0x00000010: 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F |................|

Packed -

000102030405060708090A0B0C0D0E0F
101112131415161718191A1B1C1D1E1F

The report format is fairly flexible. Report addresses and report data can each be in hexadecimal or decimal. The format above varies with a number of parameters, which are set by various commands. The following defaults apply unless overridden by the corresponding commands:

    Parameter       Default         Command
    Pack mode       Unpacked        p   (packed)
    Address base    Hexadecimal     b a (base)
    Data base       Hexadecimal     b d
    Buffer size     240 bytes       l   (length)
    Report width    16 bytes        w   (width)
    Start address   0               a   (address)

6 Command Descriptions

6.1 ? - Help

Display a brief description of the available commands. This is useful for reviewing commands.

6.2 ENTER - Display next buffer

Data is read from the file into the data buffer and displayed. The next starting address is then updated so that the next ENTER displays the following data.

6.3 l - Set new buffer length

Define the size of the data buffer for the next display.

6.4 a - Set new starting address

Define a new starting address in the data file, rather than continuing from the last display buffer.

6.5.1 b a - Toggle report address base

In unpacked mode, the address of the first byte of each report line is shown. This address can be in hexadecimal or decimal; this command toggles the base.

6.5.2 b d - Toggle report data base

Reported data can be in hexadecimal or decimal; this command toggles the base.

6.5.3 b i - Toggle input base

All numerical inputs are interpreted in the current default base; this command toggles that base. However, inputs prefixed with 0x are always interpreted as hex, and inputs prefixed with \ are always decimal.
6.6 c - Calculator functions

Sometimes it is necessary to do arithmetic on the data displayed, so ANA provides a simple set of arithmetic functions. Currently, the calculator does integer arithmetic only. It is limited to two operands and one operator; the one exception is conversion, which takes a single operand. The syntax is the command keyword, followed by the operation, followed by ENTER. The following are examples and descriptions of all available operations, where X and Y are integers:

    c X * Y     ( X multiplied by Y )
    c X / Y     ( X divided by Y )
    c X + Y     ( X plus Y )
    c X - Y     ( X minus Y )
    c X % Y     ( remainder of X divided by Y )
    c X & Y     ( X bitwise AND Y )
    c X | Y     ( X bitwise OR Y )
    c X ^ Y     ( X bitwise XOR Y )
    c X > Y     ( X right-shifted by Y bits )
    c X < Y     ( X left-shifted by Y bits )
    c X         ( conversion; X can be hex, decimal or ASCII )

6.7 d - Download structure description file

Each structure description file (SDF) maps only one structure, but sometimes it is desirable to map data to a different structure. This command loads another description file for the next mapping.

6.8 D - Continuously dump

The whole work file, starting at the current location, is displayed continuously until the end of the file.

6.9 i - Information desk

This command displays useful information, including:
- the sizes in bytes of the data types on the current machine;
- the current work file's name, number and size;
- the maximum number of work files allowed open and the number currently open;
- the input base, report data base and report address base;
- the mapping alignments (see the m command).

6.10 m - Map data to structure

Maps the data in the data buffer just displayed onto the structure described by the SDF. Mapping currently starts at the beginning of the data buffer, so the user may have to adjust the starting address before mapping.

Data types are normally aligned within a structure. For example, a 'short' after a 'char' will be placed on an even boundary, and the byte after the 'char' is meaningless.
This utility lets the user specify the alignment boundary for each type. The arguments are i for int, l for long, f for float and d for double. Their default values are displayed by the information desk ('i').

6.11 o - Open another work file

More than one file can be worked on at a time; this command opens another work file.

6.12 p - Toggle packed display mode

As discussed above, the report format can be either packed or unpacked; this command toggles between them.

6.13 q - Quit analyzer

Terminate and exit the program.

6.14 s - Search for a pattern

A pattern is searched for, starting at the current location. The pattern can be a series of hex or decimal numbers, or an ASCII string enclosed in a pair of delimiters.

6.15 t - Transfer data buffer to disk

This command stores the buffer just displayed in a disk file. The first time this command is executed, the user is prompted for the disk file name unless it is entered with the command. Any subsequent transfer is appended to the named file, and any file name entered on the command line is ignored.

6.16 u - Use a different work file

If multiple work files are open (see the o command), this command switches to a different work file.

6.17 V - Display current version

This command displays the current version of the software and the copyright message.

6.18 w - Set display row width

Especially after switching from unpacked to packed display format, it is usually desirable to display more data on one line. This command lets the user adjust the display width.

6.19 z - Zap old data with new data

Be careful when using this command: it replaces the old data at the current location with the new data, and there is no way to recover. As with the 's' command, the data can be hex, decimal or ASCII.

6.20 ! - OS escape

Run a regular operating system command.

6.21 0 - Redisplay buffer

Sometimes data scrolls off the screen; this command redisplays the data that was just displayed.
6.22 + - Report the next display buffer

6.23 - - Report the previous display buffer

7 Structure mapping

Structure mapping is one of the unique features of this software. Rather than just dumping the data file in bytes, the user can write a structure description file (hereafter called the SDF), through which the data can be interpreted in a more flexible way. The SDF is a pure ASCII file in which each line represents one data field, and the whole file together defines the structure to be mapped.

To use this feature:

    First, create the SDF with any editor. If the SDF is named "ana.fmt", it is read at program start; otherwise mapping is disabled until the user loads an SDF (d command).
    Second, display the beginning of the data structure by changing the start address and pressing ENTER.
    Finally, issue the mapping command (m) to map the data buffer.

Each line in the SDF represents one data field and has the following format:

    keyword user_defined_id [length/byte_arrangement]

For all types except "string", the optional third field specifies byte rearrangement. For type string, a length field must be given to indicate how many bytes are in the string. The user-defined name helps the user identify the field. Its content is arbitrary, but it is limited to 20 characters: longer names are truncated.

The keywords currently supported are:

    int       signed integer
    char      single character (byte)
    string    string of characters
    long      long signed integer
    short     short signed integer
    ulong     unsigned long integer
    ushort    unsigned short integer
    uint      unsigned integer
    float     floating point number
    double    double-precision floating point number

For more flexibility, a new SDF can also be loaded interactively, so that more than one structure can be analyzed in a data file.
EXAMPLE:

Suppose there is a file of employee records. Each record starts with an employee name of 10 characters, then an employee number of type long, followed by the salary, which is of type integer. Let's further assume that an integer is two bytes and a long integer is four bytes. Instead of just dumping the data file in bytes, it is more useful to display it in a more descriptive form: in this case a string, a long integer and an integer. The SDF should look like this:

    string employee_name 10
    long employee_no.
    int employee_salary

When ANA is executed, it detects the SDF, builds the internal structure and enables the mapping facility. The user can display the portion of data to be analyzed and then issue the mapping command. The display may look something like this:

    employee_name = (John Doe )
    employee_no. = 999999 (0xF423F)
    employee_salary = 5000 (0x1388)

Notice that the string is shown inside a pair of parentheses. The integers are shown in decimal, followed by their hexadecimal values.

Users should be aware that computers place data on boundaries that suit its type. For example, on a machine that uses two-byte integers, a structure like

    struct { char c; int i; }    /* holding the character 'A' and the integer 1 */

may be stored as follows:

    41 XX 00 01    or    41 00 01

In the first case, the character falls on an even boundary. Since the integer must also start on an even boundary, there is a garbage byte in between that does not mean anything. In the second case, the character falls on an odd boundary, so the integer can be put right after it. This is why it can be very confusing to look at the data byte by byte, and why structure mapping is the better tool.

Furthermore, different CPUs behave differently. Some align both integers and longs on even boundaries, while others align integers on even boundaries and longs on 4-byte boundaries.
This software lets the user customize the alignment: the command m [i/l/f/d] specifies the alignment of each data type used in structure mapping.

To make life harder, some CPUs swap bytes (our beloved 80x86 architecture) and some don't. The optional third field in the format file handles exactly that: it specifies the byte rearrangement sequence. For example, if a 4-byte integer is stored in a file as 0x11 0x22 0x33 0x44 (0x11223344), the format file line

    int sample

yields the output

    sample = 287454020 (0x11223344)

but if the format file line is written as

    int sample 4321

the output is

    sample = 1144201745 (0x44332211)

This feature is especially useful when, for example, someone dumps a file created on a 68000 machine using an 8086 machine.