Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!cwjcc!hal!nic.MR.NET!tank!uxc!deimos!ksuvax1!ncrwic!ncrlnk!uunet!mcvax!hp4nl!botter!star.cs.vu.nl!ast From: ast@cs.vu.nl (Andy Tanenbaum) Newsgroups: comp.os.minix Subject: EM Manual (Part 1 of 2) Keywords: EM, code generation, AAMP Message-ID: <1650@ast.cs.vu.nl> Date: 12 Nov 88 12:51:30 GMT References: <446@eecea.eece.ksu.edu> Reply-To: ast@cs.vu.nl (Andy Tanenbaum) Organization: VU Informatica, Amsterdam Lines: 1751 In article <446@eecea.eece.ksu.edu> hardin@eecea.UUCP (David Hardin) writes: >Does anyone have a complete description of the EM used in Minix? I have now prepared a version of the EM manual for public consumption. This message and one following it will contain the description of the EM instructions, along with an interpreter in Pascal that lets you see what the instructions actually do. I can email the complete manual to people who are seriously interested, but it is quite large. Andy Tanenbaum (ast@cs.vu.nl) ------------------------- Extract from EM manual, part 1 of 2 -------------- 10. EM MACHINE LANGUAGE The EM machine language is designed to make program text compact and to make decoding easy. Compact program text has many advantages: programs exe- cute faster, programs occupy less primary and secondary storage and loading programs into satellite processors is faster. The decoding of EM machine language is so simple, that it is feasible to use interpreters as long as EM hardware machines are not available. This chapter is irrelevant when back ends are used to produce executable target machine code. 10.1 Instruction encoding A design goal of EM is to make the program text as compact as possible. Decoding must be easy, however. The encoding is fully byte oriented, without any small bit fields. There are 256 primary opcodes, two of which are an escape to two groups of 256 secondary opcodes each. EM instructions without arguments have a single opcode assigned, possibly escaped: |--------------| | opcode | |--------------| or |--------------|--------------| | escape | opcode | |--------------|--------------| The encoding for instructions with an argument is more complex. Several in- structions have an address from the global data area as argument. Other in- structions have different opcodes for positive and negative arguments. There is always an opcode that takes the next two bytes as argument, high byte first: |--------------|--------------|--------------| | opcode | hibyte | lobyte | |--------------|--------------|--------------| or |--------------|--------------|--------------|--------------| | escape | opcode | hibyte | lobyte | |--------------|--------------|--------------|--------------| An extra escape is provided for instructions with four or eight byte argu- ments. 29 |--------------|--------------|--------------| |--------------| | ESCAPE | opcode | hibyte |...| lobyte | |--------------|--------------|--------------| |--------------| For most instructions some argument values predominate. The most frequent combinations of instruction and argument will be encoded in a single byte, called a mini: |---------------| |opcode+argument| (mini) |---------------| The number of minis is restricted, because only 254 primary opcodes are available. Many instructions have the bulk of their arguments fall in the range 0 to 255. Instructions that address global data have their arguments distributed over a wider range, but small values of the high byte are com- mon. For all these cases there is another encoding that combines the in- struction and the high byte of the argument into a single opcode. These op- codes are called shorties. Shorties may be escaped. |--------------|--------------| | opcode+high | lobyte | (shortie) |--------------|--------------| or |--------------|--------------|--------------| | escape | opcode+high | lobyte | |--------------|--------------|--------------| Escaped shorties are useless if the normal encoding has a primary opcode. Note that for some instruction-argument combinations several different en- codings are available. It is the task of the assembler to select the shor- test of these. The savings by these mini and shortie opcodes are consider- able, about 55%. Further improvements are possible: the arguments of many instructions are a multiple of the wordsize. Some do also not allow zero as an argument. If these arguments are divided by the wordsize and, when zero is not allowed, then decremented by 1, more of them can be encoded as shortie or mini. The arguments of some other instructions rarely or never assume the value 0, but start at 1. The value 1 is then encoded as 0, 2 as 1 and so on. Assigning opcodes to instructions by the assembler is completely table driven. For details see appendix B. 30 10.2 Procedure descriptors The procedure identifiers used in the interpreter are indices into a table of procedure descriptors. Each descriptor contains: 1. the number of bytes to be reserved for locals at each invoca- tion. This is a pointer-szied integer. 2. the start address of the procedure 10.3 Load format The EM machine language load format defines the interface between the EM assembler/loader and the EM machine itself. A load file consists of a header, the program text to be executed, a description of the global data area and the procedure descriptor table, in this order. All integers in the load file are presented with the least significant byte first. The header has two parts: the first half (eight 16-bit integers) aids in selecting the correct EM machine or interpreter. Some EM machines, for in- stance, may have hardware floating point instructions. The header entries are as follows (bit 0 is rightmost): 1: magic number (07255) 2: flag bits with the following meaning: bit 0 TEST; test for integer overflow etc. bit 1 PROFILE; for each source line: count the number of memory cycles executed. bit 2 FLOW; for each source line: set a bit in a bit map table if in- structions on that line are executed. bit 3 COUNT; for each source line: increment a counter if that line is entered. bit 4 REALS; set if a program uses floating point instructions. bit 5 EXTRA; more tests during compiler debugging. 3: number of unresolved references. 4: version number; used to detect obsolete EM load files. 5: wordsize ; the number of bytes in each machine word. 6: pointer size ; the number of bytes available for addressing. 7: unused 8: unused The second part of the header (eight entries, of pointer size bytes each) describes the load file itself: 1: NTEXT; the program text size in bytes. 2: NDATA; the number of load-file descriptors (see below). 3: NPROC; the number of entries in the procedure descriptor table. 4: ENTRY; procedure number of the procedure to start with. 5: NLINE; the maximum source line number. 6: SZDATA; the address of the lowest uninitialized data byte. 7: unused 8: unused The program text consists of NTEXT bytes. NTEXT is always a multiple of the wordsize. The first byte of the program text is the first byte of the instruction address space, i.e. it has address 0. Pointers into the program text are found in the procedure descriptor table where relocation is simple and in the global data area. The initialization of the global data area al- lows easy relocation of pointers into both address spaces. 31 The global data area is described by the NDATA descriptors. Each descrip- tor describes a number of consecutive words (of wordsize) and consists of a sequence of bytes. While reading the descriptors from the load file, one can initialize the global data area from low to high addresses. The size of the initialized data area is given by SZDATA, this number can be used to check the initialization. The header of each descriptor consists of a byte, describing the type, and a count. The number of bytes used for this (unsigned) count depends on the type of the descriptor and is either a pointer-sized integer or one byte. The meaning of the count depends on the descriptor type. At load time an interpreter can perform any conversion deemed necessary, such as reordering bytes in integers and pointers and adding base addresses to pointers. 32 In the following pictures we show a graphical notation of the initializ- ers. The leftmost rectangle represents the leading byte. Fields marked with n contain a pointer-sized integer used as a count m contain a one-byte integer used as a count b contain a one-byte integer w contain a wordsized integer p contain a data or instruction pointer s contain a null terminated ASCII string ------------------- | 0 | n | repeat last initialization n times ------------------- --------- | 1 | m | m uninitialized words --------- ____________ / bytes \ ----------------- ----- | 2 | m | b | b |...| b | m initialized bytes ----------------- ----- _________ / word \ ----------------------- | 3 | m | w |... m initialized wordsized integers ----------------------- _________ / pointer \ ----------------------- | 4 | m | p |... m initialized data pointers ----------------------- _________ / pointer \ ----------------------- | 5 | m | p |... m initialized instruction pointers ----------------------- ____________ / bytes \ ------------------------- | 6 | m | b | b |...| b | initialized integer of size m ------------------------- 33 ____________ / bytes \ ------------------------- | 7 | m | b | b |...| b | initialized unsigned of size m ------------------------- ____________ / string \ ------------------------- | 8 | m | s | initialized float of size m ------------------------- type 0: If the last initialization initialized k bytes starting at address a, do the same initialization again n times, starting at a+k, a+2*k, .... a+n*k. This is the only descriptor whose starting byte is fol- lowed by an integer with the size of a pointer, in all other descrip- tors the first byte is followed by a one-byte count. This descriptor must be preceded by a descriptor of another type. type 1: Reserve m words, not explicitly initialized (BSS and HOL). type 2: The m bytes following the descriptor header are initializers for the next m bytes of the global data area. m is divisible by the word- size. type 3: The m words following the header are initializers for the next m words of the global data area. type 4: The m data address space pointers following the header are initializ- ers for the next m data pointers in the global data area. Inter- preters that represent EM pointers by target machine addresses must relocate all data pointers. type 5: The m instruction address space pointers following the header are in- itializers for the next m instruction pointers in the global data area. Interpreters that represent EM instruction pointers by target machine addresses must relocate these pointers. type 6: The m bytes following the header form a signed integer number with a size of m bytes, which is an initializer for the next m bytes of the global data area. m is governed by the same restrictions as for transfer of objects to/from memory. type 7: The m bytes following the header form an unsigned integer number with a size of m bytes, which is an initializer for the next m bytes of 34 the global data area. m is governed by the same restrictions as for transfer of objects to/from memory. type 8: The header is followed by an ASCII string, null terminated, to ini- tialize, in global data, a floating point number with a size of m bytes. m is governed by the same restrictions as for transfer of ob- jects to/from memory. The ASCII string contains the notation of a real as used in the Pascal language. The NPROC procedure descriptors on the load file consist of an instruction space address (of pointer size) and an integer (of pointer size) specifying the number of bytes for locals. 35 11. EM ASSEMBLY LANGUAGE We use two representations for assembly language programs, one is in ASCII and the other is the compact assembly language. The latter needs less space than the first for the same program and therefore allows faster processing. Our only program accepting ASCII assembly language converts it to the com- pact form. All other programs expect compact assembly input. The first part of the chapter describes the ASCII assembly language and its semantics. The second part describes the syntax of the compact assembly language. The last part lists the EM instructions with the type of arguments allowed and an indication of the function. Appendix A gives a detailed description of the effect of all instructions in the form of a Pascal program. 11.1 ASCII assembly language An assembly language program consists of a series of lines, each line may be blank, contain one (pseudo)instruction or contain one label. Input to the assembler is in lower case. Upper case is used in this document merely to distinguish keywords from the surrounding prose. Comment is allowed at the end of each line and starts with a semicolon ";". This kind of comment does not exist in the compact form. Labels must be placed all by themselves on a line and start in column 1. There are two kinds of labels, instruction and data labels. Instruction la- bels are unsigned positive integers. The scope of an instruction label is its procedure. The pseudoinstructions CON, ROM and BSS may be preceded by a line contain- ing a 1-8 character data label, the first character of which is a letter, period or underscore. The period may only be followed by digits, the others may be followed by letters, digits and underscores. The use of the charac- ter "." followed by a constant, which must be in the range 1 to 32767 (e.g. ".40") is recommended for compiler generated programs. These labels are considered as a special case and handled more efficiently in compact assem- bly language (see below). Note that a data label on its own or two consecu- tive labels are not allowed. Each statement may contain an instruction mnemonic or pseudoinstruction. These must begin in column 2 or later (not column 1) and must be followed by a space, tab, semicolon or LF. Everything on the line following a semicolon is taken as a comment. Each input file contains one module. A module may contain many pro- cedures, which may be nested. A procedure consists of a PRO statement, a (possibly empty) collection of instructions and pseudoinstructions and fi- nally an END statement. Pseudoinstructions are also allowed between pro- cedures. They do not belong to a specific procedure. All constants in EM are interpreted in the decimal base. The ASCII assem- bly language accepts constant expressions wherever constants are allowed. The operators recognized are: +, -, *, % and / with the usual precedence order. Use of the parentheses ( and ) to alter the precedence order is al- lowed. 11.1.1 Instruction arguments Unlike many other assembly languages, the EM assembly language requires all arguments of normal and pseudoinstructions to be either a constant or an identifier, but not a combination of these two. There is one exception to 36 this rule: when a data label is used for initialization or as an instruction argument, expressions of the form 'label+constant' and 'label-constant' are allowed. This makes it possible to address, for example, the third word of a ten word BSS block directly. Thus LOE LABEL+4 is permitted and so is CON LABEL+3. The resulting address is must be in the same fragment as the la- bel. It is not allowed to add or subtract from instruction labels or pro- cedure identifiers, which certainly is not a severe restriction and greatly aids optimization. Instruction arguments can be constants, data labels, data labels offsetted by a constant, instruction labels and procedure identifiers. The range of integers allowed depends on the instruction. Most instructions allow only integers (signed or unsigned) that fit in a word. Arguments used as offsets to pointers should fit in a pointer-sized integer. Finally, arguments to LDC should fit in a double-word integer. Several instructions have two possible forms: with an explicit argument and with an implicit argument on top of the stack. The size of the implicit argument is the wordsize. The implicit argument is always popped before all other operands. For example: 'CMI 4' specifies that two four-byte signed integers on top of the stack are to be compared. 'CMI' without an argument expects a wordsized integer on top of the stack that specifies the size of the integers to be compared. Thus the following two sequences are equivalent: LDL -10 LDL -10 LDL -14 LDL -14 LOC 4 CMI 4 CMI ZEQ *1 ZEQ *1 Section 11.1.6 shows the arguments allowed for each instruction. 11.1.2 Pseudoinstruction arguments Pseudoinstruction arguments can be divided in two classes: Initializers and others. The following initializers are allowed: signed integer con- stants, unsigned integer constants, floating-point constants, strings, data labels, data labels offsetted by a constant, instruction labels and pro- cedure identifiers. Constant initializers in BSS, HOL, CON and ROM pseudoinstructions can be followed by a letter I, U or F. This indicator specifies the type of the initializer: Integer, Unsigned or Float. If no indicator is present I is assumed. The size of the initializer is the wordsize unless the indicator is followed by an integer specifying the initializer's size. This integer is governed by the same restrictions as for transfer of objects to/from memory. As in instruction arguments, initializers include expressions of the form: "LABEL+offset" and "LABEL-offset". The offset must be an unsigned decimal constant. The 'IUF' indicators cannot be used in the offsets. Data labels are referred to by their name. Strings are surrounded by double quotes ("). Semicolon's in string do not indicate the start of comment. In the ASCII representation the escape char- acter \ (backslash) alters the meaning of subsequent character(s). This feature allows inclusion of zeroes, graphic characters and the double quote in the string. The following escape sequences exist: 37 newline NL(LF) \n horizontal tab HT \t backspace BS \b carriage return CR \r form feed FF \f backslash \ \\ double quote " \" bit pattern ddd \ddd The escape \ddd consists of the backslash followed by 1, 2, or 3 octal di- gits specifing the value of the desired character. If the character follow- ing a backslash is not one of those specified, the backslash is ignored. Example: CON "hello\012\0". Each string element initializes a single byte. The ASCII character set is used to map characters onto values. Instruction labels are referred to as *1, *2, etc. in both branch in- structions and as initializers. The notation $procname means the identifier for the procedure with the specified name. This identifier has the size of a pointer. 11.1.3 Notation First, the notation used for the arguments, classes of instructions and pseudoinstructions. = integer constant (current range -2**31..2**31-1) = data label = or or + or - = integer constant, unsigned constant, floating-point constant = string constant (surrounded by double quotes), = instruction label '*' followed by an integer in the range 0..32767. = procedure number ('$' followed by a procedure name) = , , or . = or <...>* = zero or more of <...> <...>+ = one or more of <...> [...] = optional ... 11.1.4 Pseudoinstructions 11.1.4.1 Storage declaration Initialized global data is allocated by the pseudoinstruction CON, which needs at least one argument. Each argument is used to allocate and initial- ize a number of consequtive bytes in data memory. The number of bytes to be allocated and the alignment depend on the type of the argument. For each argument, an integral number of words, determined by the argument type, is allocated and initialized. The pseudoinstruction ROM is the same as CON, except that it guarantees that the initialized words will not change during the execution of the pro- gram. This information allows optimizers to do certain calculations such as array indexing and subrange checking at compile time instead of at run time. 38 The pseudoinstruction BSS allocates uninitialized global data or large blocks of data initialized by the same value. The first argument to this pseudo is the number of bytes required, which must be a multiple of the wordsize. The other arguments specify the value used for initialization and whether the initialization is only for convenience or a strict necessity. The pseudoinstruction HOL is similar to BSS in that it requests an (un)initialized global data block. Addressing of a HOL block, however, is quasi absolute. The first byte is addressed by 0, the second byte by 1 etc. in assembly language. The assembler/loader adds the base address of the HOL block to these numbers to obtain the absolute address in the machine language. The scope of a HOL block starts at the HOL pseudo and ends at the next HOL pseudo or at the end of a module whatever comes first. Each instruction falls in the scope of at most one HOL block, the current HOL block. It is not allowed to have more than one HOL block per procedure. The alignment restrictions are enforced by the pseudoinstructions. All initializers are aligned on a multiple of their size or the wordsize which- ever is smaller. Strings form an exception, they are to be seen as a se- quence of initializers each for one byte, i.e. strings are not padded with zero bytes. Switching to another type of fragment or placing a label forces word-alignment. There are three types of fragments in global data space: CON, ROM and BSS/HOL. BSS ,, Reserve bytes. is the value used to initialize the area. must be a multiple of the size of . is 0 if the initialization is not strictly necessary, 1 if it is. HOL ,, Idem, but all following absolute global data references will refer to this block. Only one HOL is allowed per procedure, it has to be placed before the first instruction. CON + Assemble global data words initialized with the constants. ROM + Idem, but the initialized data will never be changed by the program. 11.1.4.2 Partitioning Two pseudoinstructions partition the input into procedures: PRO [,] Start of procedure. is the procedure name. is the number of bytes for locals. The number of bytes for locals must be specified in the PRO or END pseudoinstruction. When specified in both, they must be identical. END [] End of Procedure. is the number of bytes for locals. The number of bytes for locals must be specified in either the PRO or END pseudoinstruction or both. 39 11.1.4.3 Visibility Names of data and procedures in an EM module can either be internal or external. External names are known outside the module and are used to link several pieces of a program. Internal names are not known outside the modules they are used in. Other modules will not 'see' an internal name. To reduce the number of passes needed, it must be known at the first oc- currence whether a name is internal or external. If the first occurrence of a name is in a definition, the name is considered to be internal. If the first occurrence of a name is a reference, the name is considered to be external. If the first occurrence is in one of the following pseudoinstruc- tions, the effect of the pseudo has precedence. EXA External name. is known, possibly defined, outside this module. Note that may be defined in the same module. EXP External procedure identifier. Note that may be defined in the same module. INA Internal name. is internal to this module and must be defined in this module. INP Internal procedure. is internal to this module and must be de- fined in this module. 11.1.4.4 Miscellaneous Two other pseudoinstructions provide miscellaneous features: EXC , Two blocks of instructions preceding this one are interchanged before being processed. gives the number of lines of the first block. gives the number of lines of the second one. Blank and pure comment lines do not count. MES [,]* A special type of comment. Used by compilers to communicate with the optimizer, assembler, etc. as follows: MES 0 An error has occurred, stop further processing. MES 1 Suppress optimization. MES 2,, Use wordsize and pointer size . MES 3,,,, Indicates that a local variable is never referenced indirectly. Used to indicate that a register may be used for a specific vari- able. is offset in bytes from AB if positive and offset from LB if negative. gives the size of the variable. indicates the class of the variable. The following values are currently recognized: 0 The variable can be used for anything. 40 1 The variable is used as a loopindex. 2 The variable is used as a pointer. 3 The variable is used as a floating point number. gives the priority of the variable, higher numbers indicate better candidates. MES 4,, Number of source lines in file (for profiler). MES 5 Floating point used. MES 6,* Comment. Used to provide comments in compact assembly language. MES 7,..... Reserved. MES 8,[,]... Library module. Indicates that the module may only be loaded if it is useful, that is, if it can satisfy any unresolved references dur- ing the loading process. May not be preceded by any other pseudo, except MES's. MES 9, Guarantees that no more than bytes of parameters are accessed, either directly or indirectly. MES 10,[,]* This message number is reserved for the global optimizer. It in- serts these messages in its output as hints to backends. in- dicates the type of hint. MES 11 Procedures containing this message are possible destinations of non-local goto's with the GTO instruction. Some backends keep lo- cals in registers, the locals in this procedure should not be kept in registers and all registers containing locals of other procedures should be saved upon entry to this procedure. Each backend is free to skip irrelevant MES pseudos. 11.2 The Compact Assembly Language The assembler accepts input in a highly encoded form. This form is in- tended to reduce the amount of file transport between the front ends, optim- izers and back ends, and also reduces the amount of storage required for storing libraries. Libraries are stored as archived compact assembly language, not machine language. When beginning to read the input, the assembler is in neutral state, and expects either a label or an instruction (including the pseudoinstructions). The meaning of the next byte(s) when in neutral state is as follows, where b1, b2 etc. represent the succeeding bytes. 0 Reserved for future use 1-129 Machine instructions, see Appendix A, alphabetical list 130-149 Reserved for future use 150-161 BSS,CON,END,EXA,EXC,EXP,HOL,INA,INP,MES,PRO,ROM 162-179 Reserved for future pseudoinstructions 180-239 Instruction labels 0 - 59 (180 is local label 0 etc.) 240-244 See the Common Table below 245-255 Not used 41 After a label, the assembler is back in neutral state; it can immediately accept another label or an instruction in the next byte. No linefeeds are used to separate lines. If an opcode expects no arguments, the assembler is back in neutral state after reading the one byte containing the instruction number. If it has one or more arguments (only pseudos have more than 1), the arguments follow directly, encoded as follows: 0-239 Offsets from -120 to 119 240-255 See the Common Table below Absence of an optional argument is indicated by a special byte. Common Table for Neutral State and Arguments class bytes description 240 b1 Instruction label b1 (Not used for branches) 241 b1 b2 16 bit instruction label (256*b2 + b1) 242 b1 Global label .0-.255, with b1 being the label 243 b1 b2 Global label .0-.32767 with 256*b2+b1 being the label 244 Global symbol not of the form .nnn 245 b1 b2 16 bit constant 246 b1 b2 b3 b4 32 bit constant 247 b1 .. b8 64 bit constant 248 Global label + (possibly negative) constant 249 Procedure name (not including $) 250 String used in CON or ROM (no quotes-no escapes) 251 Integer constant, size bytes 252 Unsigned constant, size bytes 253 Floating constant, size bytes 254 unused 255 Delimiter for argument lists or indicates absence of optional argument The bytes specifying the value of a 16, 32 or 64 bit constant are present- ed in two's complement notation, with the least significant byte first. For example: the value of a 32 bit constant is ((s4*256+b3)*256+b2)*256+b1, where s4 is b4-256 if b4 is greater than 128 else s4 takes the value of b4. A consists of a inmediatly followed by a sequence of bytes with length . The pseudoinstructions fall into several categories, depending on their arguments: Group 1 - EXC, BSS, HOL have a known number of arguments Group 2 - EXA, EXP, INA, INP have a string as argument Group 3 - CON, MES, ROM have a variable number of various things Group 4 - END, PRO have a trailing optional argument. Groups 1 and 2 use the encoding described above. Group 3 also uses the en- coding listed above, with an byte after the last argument to indicate the end of the list. Group 4 uses an byte if the trailing argument is not present. 42 Example ASCII Example compact (LOC = 69, BRA = 18 here): 2 182 1 181 LOC 10 69 130 LOC -10 69 110 LOC 300 69 245 44 1 BRA *19 18 139 300 241 44 1 .3 242 3 CON 4,9,*2,$foo 151 124 129 240 2 249 123 102 111 111 255 CON .35 151 242 35 255 11.3 Assembly language instruction list For each instruction in the list the range of argument values in the as- sembly language is given. The column headed assem contains the mnemonics defined in 11.1.3. The following column specifies restrictions of the argu- ment value. Addresses have to obey the restrictions mentioned in chapter 2. The classes of arguments are indicated by letters: assem constraints rationale c cst fits word constant d cst fits double word constant l cst local offset g arg >= 0 global offset f cst fragment offset n cst >= 0 counter s cst >0 , word multiple object size z cst >= 0 , zero or word multiple object size o cst > 0 , word multiple or fraction object size w cst > 0 , word multiple object size * p pro pro identifier b ilb >= 0 label number r cst 0,1,2 register number - no argument The * at the rationale for w indicates that the argument can either be given as argument or on top of the stack. If the argument is omitted, the argument is fetched from the stack; it is assumed to be a wordsized unsigned integer. Instructions that check for undefined integer or floating-point values and underflow or overflow are indicated below by (*). 43 GROUP 1 - LOAD LOC c : Load constant (i.e. push one word onto the stack) LDC d : Load double constant ( push two words ) LOL l : Load word at l-th local (l<0) or parameter (l>=0) LOE g : Load external word g LIL l : Load word pointed to by l-th local or parameter LOF f : Load offsetted (top of stack + f yield address) LAL l : Load address of local or parameter LAE g : Load address of external LXL n : Load lexical (address of LB n static levels back) LXA n : Load lexical (address of AB n static levels back) LOI o : Load indirect o bytes (address is popped from the stack) LOS w : Load indirect, w-byte integer on top of stack gives object size LDL l : Load double local or parameter (two consecutive words are stacked) LDE g : Load double external (two consecutive externals are stacked) LDF f : Load double offsetted (top of stack + f yield address) LPI p : Load procedure identifier GROUP 2 - STORE STL l : Store local or parameter STE g : Store external SIL l : Store into word pointed to by l-th local or parameter STF f : Store offsetted STI o : Store indirect o bytes (pop address, then data) STS w : Store indirect, w-byte integer on top of stack gives object size SDL l : Store double local or parameter SDE g : Store double external SDF f : Store double offsetted GROUP 3 - INTEGER ARITHMETIC ADI w : Addition (*) SBI w : Subtraction (*) MLI w : Multiplication (*) DVI w : Division (*) RMI w : Remainder (*) NGI w : Negate (two's complement) (*) SLI w : Shift left (*) SRI w : Shift right (*) GROUP 4 - UNSIGNED ARITHMETIC ADU w : Addition SBU w : Subtraction MLU w : Multiplication DVU w : Division RMU w : Remainder SLU w : Shift left SRU w : Shift right 44 GROUP 5 - FLOATING POINT ARITHMETIC ADF w : Floating add (*) SBF w : Floating subtract (*) MLF w : Floating multiply (*) DVF w : Floating divide (*) NGF w : Floating negate (*) FIF w : Floating multiply and split integer and fraction part (*) FEF w : Split floating number in exponent and fraction part (*) GROUP 6 - POINTER ARITHMETIC ADP f : Add f to pointer on top of stack ADS w : Add w-byte value and pointer SBS w : Subtract pointers in same fragment and push diff as size w integer GROUP 7 - INCREMENT/DECREMENT/ZERO INC - : Increment word on top of stack by 1 (*) INL l : Increment local or parameter (*) INE g : Increment external (*) DEC - : Decrement word on top of stack by 1 (*) DEL l : Decrement local or parameter (*) DEE g : Decrement external (*) ZRL l : Zero local or parameter ZRE g : Zero external ZRF w : Load a floating zero of size w ZER w : Load w zero bytes GROUP 8 - CONVERT (stack:source, source size, dest. size (top)) CII - : Convert integer to integer (*) CUI - : Convert unsigned to integer (*) CFI - : Convert floating to integer (*) CIF - : Convert integer to floating (*) CUF - : Convert unsigned to floating (*) CFF - : Convert floating to floating (*) CIU - : Convert integer to unsigned CUU - : Convert unsigned to unsigned CFU - : Convert floating to unsigned GROUP 9 - LOGICAL AND w : Boolean and on two groups of w bytes IOR w : Boolean inclusive or on two groups of w bytes XOR w : Boolean exclusive or on two groups of w bytes COM w : Complement (one's complement of top w bytes) ROL w : Rotate left a group of w bytes ROR w : Rotate right a group of w bytes 45 GROUP 10 - SETS INN w : Bit test on w byte set (bit number on top of stack) SET w : Create singleton w byte set with bit n on (n is top of stack) GROUP 11 - ARRAY LAR w : Load array element, descriptor contains integers of size w SAR w : Store array element AAR w : Load address of array element GROUP 12 - COMPARE CMI w : Compare w byte integers, Push negative, zero, positive for <, = or > CMF w : Compare w byte reals CMU w : Compare w byte unsigneds CMS w : Compare w byte values, can only be used for bit for bit equality test CMP - : Compare pointers TLT - : True if less, i.e. iff top of stack < 0 TLE - : True if less or equal, i.e. iff top of stack <= 0 TEQ - : True if equal, i.e. iff top of stack = 0 TNE - : True if not equal, i.e. iff top of stack non zero TGE - : True if greater or equal, i.e. iff top of stack >= 0 TGT - : True if greater, i.e. iff top of stack > 0 GROUP 13 - BRANCH BRA b : Branch unconditionally to label b BLT b : Branch less (pop 2 words, branch if top > second) BLE b : Branch less or equal BEQ b : Branch equal BNE b : Branch not equal BGE b : Branch greater or equal BGT b : Branch greater ZLT b : Branch less than zero (pop 1 word, branch negative) ZLE b : Branch less or equal to zero ZEQ b : Branch equal zero ZNE b : Branch not zero ZGE b : Branch greater or equal zero ZGT b : Branch greater than zero GROUP 14 - PROCEDURE CALL CAI - : Call procedure (procedure identifier on stack) CAL p : Call procedure (with identifier p) LFR s : Load function result RET z : Return (function result consists of top z bytes) 46 GROUP 15 - MISCELLANEOUS ASP f : Adjust the stack pointer by f ASS w : Adjust the stack pointer by w-byte integer BLM z : Block move z bytes; first pop destination addr, then source addr BLS w : Block move, size is in w-byte integer on top of stack CSA w : Case jump; address of jump table at top of stack CSB w : Table lookup jump; address of jump table at top of stack DCH - : Follow dynamic chain, convert LB to LB of caller DUP s : Duplicate top s bytes DUS w : Duplicate top w bytes EXG w : Exchange top w bytes FIL g : File name (external 4 := g) GTO g : Non-local goto, descriptor at g LIM - : Load 16 bit ignore mask LIN n : Line number (external 0 := n) LNI - : Line number increment LOR r : Load register (0=LB, 1=SP, 2=HP) LPB - : Convert local base to argument base MON - : Monitor call NOP - : No operation RCK w : Range check; trap on error RTT - : Return from trap SIG - : Trap errors to proc identifier on top of stack, -2 resets default SIM - : Store 16 bit ignore mask STR r : Store register (0=LB, 1=SP, 2=HP) TRP - : Cause trap to occur (Error number on stack) 47 A. EM INTERPRETER { This is an interpreter for EM. It serves as the official machine definition. This interpreter must run on a machine which supports arithmetic with words and memory offsets. Certain aspects of the definition are over specified. In particular: 1. The representation of an address on the stack need not be the numerical value of the memory location. 2. The state of the stack is not defined after a trap has aborted an instruction in the middle. For example, it is officially un- defined whether the second operand of an ADD instruction has been popped or not if the first one is undefined ( -32768 or unsigned 32768). 3. The memory layout is implementation dependent. Only the most basic checks are performed whenever memory is accessed. 4. The representation of an integer or set on the stack is not fixed in bit order. 5. The format and existence of the procedure descriptors depends on the implementation. 6. The result of the compare operators CMI etc. are -1, 0 and 1 here, but other negative and positive values will do and they need not be the same each time. 7. The shift count for SHL, SHR, ROL and ROR must be in the range 0 to object size in bits - 1. The effect of a count not in this range is undefined. } 48 {$i256} {$d+} program em(tables,prog,input,output); label 8888,9999; const t15 = 32768; { 2**15 } t15m1 = 32767; { 2**15 -1 } t16 = 65536; { 2**16 } t16m1 = 65535; { 2**16 -1 } t31m1 = 2147483647; { 2**31 -1 } wsize = 2; { number of bytes in a word } asize = 2; { number of bytes in an address } fsize = 4; { number of bytes in a floating point number } maxret =4; { number of words in the return value area } signbit = t15; { the power of two indicating the sign bit } negoff = t16; { the next power of two } maxsint = t15m1; { the maximum signed integer } maxuint = t16m1; { the maximum unsigned integer } maxdbl = t31m1; { the maximum double signed integer } maxadr = t16m1; { the maximum address } maxoffs = t15m1; { the maximum offset from an address } maxbitnr= 15; { the number of the highest bit } lineadr = 0; { address of the line number } fileadr = 4; { address of the file name } maxcode = 8191; { highest byte in code address space } maxdata = 8191; { highest byte in data address space } { format of status save area } statd = 4; { how far is static link from lb } dynd = 2; { how far is dynamic link from lb } reta = 0; { how far is the return address from lb } savsize = 4; { size of save area in bytes } { procedure descriptor format } pdlocs = 0; { offset for size of local variables in bytes } pdbase = asize; { offset for the procedure base } pdsize = 4; { size of procedure descriptor in bytes = 2*asize } { header words } NTEXT = 1; NDATA = 2; NPROC = 3; ENTRY = 4; NLINE = 5; SZDATA = 6; escape1 = 254; { escape to secondary opcodes } escape2 = 255; { escape to tertiary opcodes } undef = signbit; { the range of integers is -32767 to +32767 } { error codes } EARRAY = 0; ERANGE = 1; ESET = 2; EIOVFL = 3; EFOVFL = 4; EFUNFL = 5; EIDIVZ = 6; EFDIVZ = 7; EIUND = 8; EFUND = 9; ECONV = 10; ESTACK = 16; EHEAP = 17; EILLINS = 18; EODDZ = 19; ECASE = 20; EMEMFLT = 21; EBADPTR = 22; EBADPC = 23; EBADLAE = 24; 49 EBADMON = 25; EBADLIN = 26; EBADGTO = 27; 50 {---------------------------------------------------------------------------} { Declarations } {---------------------------------------------------------------------------} type bitval= 0..1; { one bit } bitnr= 0..maxbitnr; { bits in machine words are numbered 0 to 15 } byte= 0..255; { memory is an array of bytes } adr= {0..maxadr} long; { the range of addresses } word= {0..maxuint} long;{ the range of unsigned integers } offs= -maxoffs..maxoffs; { the range of signed offsets from addresses } size= 0..maxoffs; { the range of sizes is the positive offsets } sword= {-signbit..maxsint} long; { the range of signed integers } full= {-maxuint..maxuint} long; { intermediate results need this range } double={-maxdbl..maxdbl} long; { double precision range } bftype= (andf,iorf,xorf); { tells which boolean operator needed } insclass=(prim,second,tert); { tells which opcode table is in use } instype=(implic,explic); { does opcode have implicit or explicit operand } iflags= (mini,short,sbit,wbit,zbit,ibit); ifset= set of iflags; mnem = ( NON, AAR, ADF, ADI, ADP, ADS, ADU,XAND, ASP, ASS, BEQ, BGE, BGT, BLE, BLM, BLS, BLT, BNE, BRA, CAI, CAL, CFF, CFI, CFU, CIF, CII, CIU, CMF, CMI, CMP, CMS, CMU, COM, CSA, CSB, CUF, CUI, CUU, DCH, DEC, DEE, DEL, DUP, DUS, DVF, DVI, DVU, EXG, FEF, FIF, FIL, GTO, INC, INE, INL, INN, IOR, LAE, LAL, LAR, LDC, LDE, LDF, LDL, LFR, LIL, LIM, LIN, LNI, LOC, LOE, LOF, LOI, LOL, LOR, LOS, LPB, LPI, LXA, LXL, MLF, MLI, MLU, MON, NGF, NGI, NOP, RCK, RET, RMI, RMU, ROL, ROR, RTT, SAR, SBF, SBI, SBS, SBU, SDE, SDF, SDL,XSET, SIG, SIL, SIM, SLI, SLU, SRI, SRU, STE, STF, STI, STL, STR, STS, TEQ, TGE, TGT, TLE, TLT, TNE, TRP, XOR, ZEQ, ZER, ZGE, ZGT, ZLE, ZLT, ZNE, ZRE, ZRF, ZRL); dispatch = record iflag: ifset; instr: mnem; case instype of implic: (implicit:sword); explic: (ilength:byte); end; var code: packed array[0..maxcode] of byte; { code space } data: packed array[0..maxdata] of byte; { data space } retarea: array[1..maxret ] of word; { return area } pc,lb,sp,hp,pd: adr; { internal machine registers } i: integer; { integer scratch variable } s,t :word; { scratch variables } sz:size; { scratch variables } ss,st: sword; { scratch variables } k :double; { scratch variables } j:size; { scratch variable used as index } a,b:adr; { scratch variable used for addresses } dt,ds:double; { scratch variables for double precision } 51 rt,rs,x,y:real; { scratch variables for real } found:boolean; { scratch } opcode: byte; { holds the opcode during execution } iclass: insclass; { true for escaped opcodes } dispat: array[insclass,byte] of dispatch; retsize:size; { holds size of last LFR } insr: mnem; { holds the instructionnumber } halted: boolean; { normally false } exitstatus:word; { parameter of MON 1 } ignmask:word; { ignore mask for traps } uerrorproc:adr; { number of user defined error procedure } intrap:boolean; { Set when executing trap(), to catch recursive calls} trapval:byte; { Set to number of last trap } header: array[1..8] of adr; tables: text; { description of EM instructions } prog: file of byte; { program and initialized data } {---------------------------------------------------------------------------} { Various check routines } {---------------------------------------------------------------------------} { Only the most basic checks are performed. These routines are inherently implementation dependent. } procedure trap(n:byte); forward; procedure memadr(a:adr); begin if (a>maxdata) or ((a=hp)) then trap(EMEMFLT) end; procedure wordadr(a:adr); begin memadr(a); if (a mod wsize<>0) then trap(EBADPTR) end; procedure chkadr(a:adr; s:size); begin memadr(a); memadr(a+s-1); { assumption: size is ok } if s0 then trap(EBADPTR) end else if a mod wsize<>0 then trap(EBADPTR) end; procedure newpc(a:double); begin if (a<0) or (a>maxcode) then trap(EBADPC); pc:=a end; procedure newsp(a:adr); begin if (a>lb) or (a0) then trap(ESTACK); sp:=a end; procedure newlb(a:adr); begin if (a0) then trap(ESTACK); lb:=a end; procedure newhp(a:adr); begin if (a>sp) or (a>maxdata+1) or (a mod wsize<>0) then trap(EHEAP) else hp:=a end; function argc(a:double):sword; begin if (a<-signbit) or (a>maxsint) then trap(EILLINS); argc:=a end; 52 function argd(a:double):double; begin if (a<-maxdbl) or (a>maxdbl) then trap(EILLINS); argd:=a end; function argl(a:double):offs; begin if (a<-maxoffs) or (a>maxoffs) then trap(EILLINS); argl:=a end; function argg(k:double):adr; begin if (k<0) or (k>maxadr) then trap(EILLINS); argg:=k end; function argf(a:double):offs; begin if (a<-maxoffs) or (a>maxoffs) then trap(EILLINS); argf:=a end; function argn(a:double):word; begin if (a<0) or (a>maxuint) then trap(EILLINS); argn:=a end; function args(a:double):size; begin if (a<=0) or (a>maxoffs) then trap(EODDZ) else if (a mod wsize)<>0 then trap(EODDZ); args:=a ; end; function argz(a:double):size; begin if (a<0) or (a>maxoffs) then trap(EODDZ) else if (a mod wsize)<>0 then trap(EODDZ); argz:=a ; end; function argo(a:double):size; begin if (a<=0) or (a>maxoffs) then trap(EODDZ) else if (a mod wsize<>0) and (wsize mod a<>0) then trap(EODDZ); argo:=a ; end; function argw(a:double):size; begin if (a<=0) or (a>maxoffs) or (a>maxuint) then trap(EODDZ) else if (a mod wsize)<>0 then trap(EODDZ); argw:=a ; end; function argp(a:double):size; begin if (a<0) or (a>=header[NPROC]) then trap(EILLINS); argp:=a end; function argr(a:double):word; begin if (a<0) or (a>2) then trap(EILLINS); argr:=a end; procedure argwf(s:double); begin if argw(s)<>fsize then trap(EILLINS) end; function szindex(s:double):integer; begin s:=argw(s); if (s mod wsize <> 0) or (s>2*wsize) then trap(EILLINS); szindex:=s div wsize end; function locadr(l:double):adr; begin l:=argl(l); if l<0 then locadr:=lb+l else locadr:=lb+l+savsize end;