Path: utzoo!attcan!uunet!ncrlnk!rd1632!otto
From: otto@rd1632.Dayton.NCR.COM (Jerome A. Otto)
Newsgroups: comp.arch
Subject: COBOL Decimal Arithmetic
Message-ID: <943@rd1632.Dayton.NCR.COM>
Date: 22 Sep 89 17:15:49 GMT
Reply-To: otto@rd1632.UUCP (Jerome A. Otto)
Organization: NCR Research & Development, Dayton, Ohio
Lines: 108

A comment on COBOL decimal support in hardware...  I haven't looked at
this area in many years, but I doubt that much has changed.

There are many ways to implement COBOL decimal arithmetic (both ASCII
and packed decimal).  Common ways are:

(1) Implement all cases in one or more runtime libraries.

(2) Convert to binary (using hardware instructions if available), do
    the arithmetic in binary, and convert back from binary (again
    using hardware if available).

(3) Use hardware decimal instructions.

(4) Generate inline code that uses the "trick" of adding decimal
    operands with a binary adder.  The trick is well known, but maybe
    not obvious to those reading this newsgroup.

Real compilers might use more than one of the above methods, depending
on whether "special, high-occurring" cases are handled separately and
on the performance of decimal instructions on the target hardware.

(1) is very slow, but might be reasonable, since decimal arithmetic
might not contribute much to overall application performance.  The 10%
or so of a COBOL program that is decimal arithmetic might not seem
like much, but if poorly implemented it might require 60% of the
runtime cycles.

Very seldom, if ever, is (2) fastest, due to the cost of converting
from binary back to decimal.  This conversion requires N divides or
N+1 multiplies (N = number of digits).  In an optimizing compiler a
case might occur in which a converted result can be used many times
before a conversion back to decimal is required; in that case (2)
might be the fastest method.

Decimal instructions for COBOL have been implemented in many machine
architectures.  However, most implementations are done in microcode
and often are not very fast.  Part of the problem is that they try to
provide general decimal arithmetic instructions that can handle 1-18
digits, scaling, different signs, etc.  The most common cases in COBOL
are arithmetic on like items -- and these are known at compile time.
Decisions that could be made at compile time are pushed off to runtime
-- at the expense of performance.  The spirit of RISC would seem to
indicate that an operation should be broken down at compile time into
its simplest components, and only that code generated.

Fortunately for COBOL, the most frequent cases can be handled with
inline code using a binary adder, as in (4).  An example of (4), from
a 68K analysis I did in 1984:

Unsigned ASCII characters can be added using a 32-bit binary adder by
the following algorithm:

1. Add the constant 96969696 (base 16) to one of the operands.  This
   "biases" the operand so that carries will propagate from digit to
   digit.  If one of the operands is a constant, the constant can be
   pre-biased at compile time and this step omitted.

2. Add the other operand and the biased operand in binary.  Carries
   will propagate, but the resulting digits require correction.  If a
   carry out occurred from a digit, the result requires the addition
   of 30 (base 16).  If a carry out did not occur, the carry propagate
   bits must be removed before the 30 (base 16) is added.

3. Correct the result by masking out the carry propagate bits and
   ORing in 30303030 (base 16).

The operands are loaded, scaled, and padded depending on the result.
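Before the 68K code, the same three steps in portable C may make the
trick easier to follow.  This is a sketch only: the helper name
ascii_add4, the fixed four-character width, and the big-endian packing
are illustrative choices, and it assumes valid digits and a sum that
fits in four digits (a carry out of the leading digit is lost).

    #include <stdint.h>
    #include <stdio.h>

    /* Add two 4-character unsigned ASCII operands with one binary add.
     * Operands are packed big-endian in a uint32_t ("0047" = 0x30303437).
     * Sketch only: assumes valid digits and a 4-digit result. */
    static uint32_t ascii_add4(uint32_t op1, uint32_t op2)
    {
        uint32_t sum, prop;

        sum  = op1 + 0x96969696u;         /* step 1: bias one operand      */
        sum += op2;                       /* step 2: plain binary add      */

        prop = sum & 0xF0F0F0F0u;         /* carry propagate bits          */
        sum -= prop;                      /* keep the low (digit) nibbles  */
        sum -= (prop >> 3) & 0x06060606u; /* remove leftover bias of 6     */

        return sum | 0x30303030u;         /* step 3: OR in the ASCII zones */
    }

    int main(void)
    {
        uint32_t a = 0x30303437;          /* "0047" */
        uint32_t b = 0x30303338;          /* "0038" */
        printf("%08lX\n", (unsigned long)ascii_add4(a, b));
        /* prints 30303835, i.e. "0085" */
        return 0;
    }

The mask/subtract/shift lines correspond one-for-one to the AND/SUB/LSR
sequence in the 68K code that follows.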
For example, the 68K code to add a two-digit ASCII operand to a
two-digit ASCII operand, giving a four-digit result, would be:

        MOVE.L  D6,D0            . ASCII zeros to D0 (D6 holds $30303030)
        MOVE.W  op1(A5),D0       . Load first operand into the low word
        MOVE.L  D6,D1            . ASCII zeros to D1
        MOVE.W  op2(A5),D1       . Load second operand into the low word
        ADD.L   #$96969696,D0    . Bias op1
        ADD.L   D1,D0            . Binary add
        MOVE.L  D0,D1            . Copy the raw sum
        AND.L   #$F0F0F0F0,D1    . Isolate the carry propagate bits
        SUB.L   D1,D0            . Strip them, keeping the digit nibbles
        LSR.L   #3,D1            . Turn each $F0 propagate pattern...
        AND.L   #$06060606,D1    . ...into a 6 under the uncarried digit
        SUB.L   D1,D0            . Remove propagate bits
        OR.L    D6,D0            . Form ASCII digits
        MOVE.L  D0,op3(A5)       . Store result

The algorithm for packed decimal operands is similar, except that
computing the correction for the inner carries is more complicated (a
C sketch of this variant appears at the end of this article).  Some
architectures (NCR's, at least) implement a special instruction that
corrects the inner carries using the nibble carries saved from a
previous arithmetic instruction.  This one simple instruction makes
the packed decimal algorithm simpler and faster.

In general, using the compiler to determine the special cases at
compile time and generate the best code for each results in faster
code than supporting general decimal arithmetic instructions in
hardware.  Not having decimal instructions is not a 7-to-1 performance
penalty at all!  Using the decimal instructions as implemented on many
machines is a lose.  Aren't these just RISC ideas?
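For completeness, here is the packed decimal variant of the trick,
again as a C sketch rather than what any particular compiler emits.
The bias is 6 per nibble instead of $96 per byte, and the inner
carries are recovered with the identity carries = a ^ b ^ sum, which
is what makes the correction messier than the ASCII case.  The
function name and the fixed 8-digit width are illustrative, and a
decimal carry out of the leading digit is lost.

    #include <stdint.h>
    #include <stdio.h>

    /* Add two 8-digit packed decimal (BCD) operands with a binary adder.
     * Sketch only: assumes valid BCD digits.  64-bit intermediates keep
     * the top nibble's carry visible so it is corrected like the rest. */
    static uint32_t bcd_add8(uint32_t a, uint32_t b)
    {
        uint64_t biased  = (uint64_t)a + 0x66666666u;   /* bias each nibble by 6 */
        uint64_t sum     = biased + b;                  /* plain binary add      */
        uint64_t carries = biased ^ b ^ sum;            /* carry-in bit vector   */
        uint64_t nocarry = ~carries & 0x111111110ULL;   /* digits with no carry  */
        uint64_t fix     = (nocarry >> 2) | (nocarry >> 3); /* 6 under each one  */
        return (uint32_t)(sum - fix);                   /* remove leftover bias  */
    }

    int main(void)
    {
        printf("%08lX\n", (unsigned long)bcd_add8(0x00000047, 0x00000038));
        /* prints 00000085, i.e. packed decimal 47 + 38 = 85 */
        return 0;
    }

The nibble-carry instruction mentioned above would replace the xor and
mask lines: the hardware saves the carry vector for free during the
add, which is exactly the information this sketch has to reconstruct.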