Path: utzoo!attcan!uunet!ncrlnk!rd1632!otto
From: otto@rd1632.Dayton.NCR.COM (Jerome A. Otto)
Newsgroups: comp.arch
Subject: COBOL Decimal Arithmetic
Message-ID: <943@rd1632.Dayton.NCR.COM>
Date: 22 Sep 89 17:15:49 GMT
Reply-To: otto@rd1632.UUCP (Jerome A. Otto)
Organization: NCR Research & Development, Dayton, Ohio
Lines: 108

A comment on COBOL decimal support in hardware...  I haven't looked at
this area in many years, but I doubt that much has changed.

There are many ways to implement COBOL decimal arithmetic (both ASCII
and packed decimal).  Common ways are:

(1) Implement all cases in one or more runtime libraries.

(2) Convert to binary (using hardware instructions if available), do
    the arithmetic in binary, and convert back from binary (again
    using hardware if available).

(3) Use hardware decimal instructions.

(4) Generate inline code that uses the "trick" of adding decimal
    operands with a binary adder.  The trick is well known, but maybe
    not obvious to those reading this newsgroup.

Real compilers might use more than one of the above methods, depending
on whether "special, high-occurring" cases are handled separately and
on the performance of decimal instructions on the target hardware.

(1) is very slow, but might be reasonable, since decimal arithmetic
might not contribute much to overall application performance.  The 10%
or so of a COBOL program that is decimal arithmetic might not seem
like much, but if poorly implemented it might require 60% of the
runtime cycles.

Very seldom, if ever, is (2) fastest, due to the cost of converting
from binary back to decimal.  This conversion requires N divides or
N+1 multiplies (N = number of digits).  In an optimizing compiler a
case might occur in which a converted result can be used many times
before a conversion back to decimal is required; in that case (2)
might be the fastest method.

Decimal instructions for COBOL have been implemented in many machine
architectures.  However, most implementations are done in microcode
and often are not very fast.  Part of the problem is that they try to
provide general decimal arithmetic instructions that can handle 1-18
digits, scaling, different signs, etc.  The most common cases in COBOL
are arithmetic on like items -- and these are known at compile time.
Decisions that could be made at compile time are pushed off to runtime
-- at the expense of performance.  The spirit of RISC would seem to
indicate that an operation should be broken down at compile time into
its simplest components, and only that code generated.

Fortunately for COBOL, the most frequent cases can be handled with
inline code using a binary adder, as in (4).  An example of (4), from
a 68K analysis I did in 1984:

Unsigned ASCII characters can be added using a 32-bit binary adder by
the following algorithm:

1. Add the constant 96969696 (base 16) to one of the operands.  This
   "biases" the operand so that carries will propagate from digit to
   digit.  If one of the operands is a constant, the constant can be
   pre-biased at compile time and this step omitted.

2. Add the other operand and the biased operand in binary.  Carries
   will propagate, but the resulting digits require correction.  If a
   carry out occurred from a digit, the result requires the addition
   of 30 (base 16).  If a carry out did not occur, the carry propagate
   bits must be removed before the 30 (base 16) is added.

3. Correct the result by masking out the carry propagate bits and
   ORing in 30303030 (base 16).

The operands are loaded, scaled, and padded depending on the result.
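Before the 68K code, the same three steps in portable C may make the
trick easier to follow.  This is a sketch only: the helper name
ascii_add4, the fixed four-character width, and the big-endian packing
are illustrative choices, and it assumes valid digits and a sum that
fits in four digits (a carry out of the leading digit is lost).

    #include <stdint.h>
    #include <stdio.h>

    /* Add two 4-character unsigned ASCII operands with one binary add.
     * Operands are packed big-endian in a uint32_t ("0047" = 0x30303437).
     * Sketch only: assumes valid digits and a 4-digit result. */
    static uint32_t ascii_add4(uint32_t op1, uint32_t op2)
    {
        uint32_t sum, prop;

        sum  = op1 + 0x96969696u;         /* step 1: bias one operand      */
        sum += op2;                       /* step 2: plain binary add      */

        prop = sum & 0xF0F0F0F0u;         /* carry propagate bits          */
        sum -= prop;                      /* keep the low (digit) nibbles  */
        sum -= (prop >> 3) & 0x06060606u; /* remove leftover bias of 6     */

        return sum | 0x30303030u;         /* step 3: OR in the ASCII zones */
    }

    int main(void)
    {
        uint32_t a = 0x30303437;          /* "0047" */
        uint32_t b = 0x30303338;          /* "0038" */
        printf("%08lX\n", (unsigned long)ascii_add4(a, b));
        /* prints 30303835, i.e. "0085" */
        return 0;
    }

The mask/subtract/shift lines correspond one-for-one to the AND/SUB/LSR
sequence in the 68K code that follows.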
For example, the 68K code to add a two-digit ASCII operand to a
two-digit ASCII operand, giving a four-digit result, would be:

        MOVE.L  D6,D0            . ASCII zeros to D0 (D6 holds $30303030)
        MOVE.W  op1(A5),D0       . Load first operand into the low word
        MOVE.L  D6,D1            . ASCII zeros to D1
        MOVE.W  op2(A5),D1       . Load second operand into the low word
        ADD.L   #$96969696,D0    . Bias op1
        ADD.L   D1,D0            . Binary add
        MOVE.L  D0,D1            . Copy the raw sum
        AND.L   #$F0F0F0F0,D1    . Isolate the carry propagate bits
        SUB.L   D1,D0            . Strip them, keeping the digit nibbles
        LSR.L   #3,D1            . Turn each $F0 propagate pattern...
        AND.L   #$06060606,D1    . ...into a 6 under the uncarried digit
        SUB.L   D1,D0            . Remove propagate bits
        OR.L    D6,D0            . Form ASCII digits
        MOVE.L  D0,op3(A5)       . Store result

The algorithm for packed decimal operands is similar, except that
computing the correction for the inner carries is more complicated (a
C sketch of this variant appears at the end of this article).  Some
architectures (NCR's, at least) implement a special instruction that
corrects the inner carries using the nibble carries saved from a
previous arithmetic instruction.  This one simple instruction makes
the packed decimal algorithm simpler and faster.

In general, using the compiler to determine the special cases at
compile time and generate the best code for each results in faster
code than supporting general decimal arithmetic instructions in
hardware.  Not having decimal instructions is not a 7-to-1 performance
penalty at all!  Using the decimal instructions as implemented on many
machines is a lose.  Aren't these just RISC ideas?
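For completeness, here is the packed decimal variant of the trick,
again as a C sketch rather than what any particular compiler emits.
The bias is 6 per nibble instead of $96 per byte, and the inner
carries are recovered with the identity carries = a ^ b ^ sum, which
is what makes the correction messier than the ASCII case.  The
function name and the fixed 8-digit width are illustrative, and a
decimal carry out of the leading digit is lost.

    #include <stdint.h>
    #include <stdio.h>

    /* Add two 8-digit packed decimal (BCD) operands with a binary adder.
     * Sketch only: assumes valid BCD digits.  64-bit intermediates keep
     * the top nibble's carry visible so it is corrected like the rest. */
    static uint32_t bcd_add8(uint32_t a, uint32_t b)
    {
        uint64_t biased  = (uint64_t)a + 0x66666666u;   /* bias each nibble by 6 */
        uint64_t sum     = biased + b;                  /* plain binary add      */
        uint64_t carries = biased ^ b ^ sum;            /* carry-in bit vector   */
        uint64_t nocarry = ~carries & 0x111111110ULL;   /* digits with no carry  */
        uint64_t fix     = (nocarry >> 2) | (nocarry >> 3); /* 6 under each one  */
        return (uint32_t)(sum - fix);                   /* remove leftover bias  */
    }

    int main(void)
    {
        printf("%08lX\n", (unsigned long)bcd_add8(0x00000047, 0x00000038));
        /* prints 00000085, i.e. packed decimal 47 + 38 = 85 */
        return 0;
    }

The nibble-carry instruction mentioned above would replace the xor and
mask lines: the hardware saves the carry vector for free during the
add, which is exactly the information this sketch has to reconstruct.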