Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!iuvax!cica!tut.cis.ohio-state.edu!pt.cs.cmu.edu!a.gp.cs.cmu.edu!koopman
From: koopman@a.gp.cs.cmu.edu (Philip Koopman)
Newsgroups: comp.lang.forth
Subject: Re: Floating point stack
Summary: my $.02
Message-ID: <10340@pt.cs.cmu.edu>
Date: 28 Aug 90 16:59:19 GMT
References: <wmb@MITCH.ENG.SUN.COM> <9008281431.AA00691@ucbvax.Berkeley.EDU>
Organization: Carnegie-Mellon University, CS/RI
Lines: 64

In article <9008281431.AA00691@ucbvax.Berkeley.EDU>, ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman) writes:
>    Ok Phil, speak up.  We know you're out there.  Come out now and noone
> will get hurt.

This is a summary (to the extent that I can recall) of the
reasons I presented to the ANSI Forth meeting in Melbourne back
in May for allowing floating point data on the data stack.
That discussion appears
to have provided the impetus for the changes to the BASIS at the
latest meeting, but I have not been personally involved since May.

1) There is common practice for both a separate floating point
 stack and a unified data/floating point stack.  Historically,
 separate floating point stacks have come into use because of 
 implementation considerations on specific platforms (e.g. the 80287).
 Coprocessor stacks can have problems (such as handling stack overflows 
 when reals are passed as subroutine parameters).
 On some platforms, a separate floating point stack is very expensive,
 because there is no on-chip register available for use as a pointer.
 The fact that there is common practice for both separated and unified
 stacks is what creates the issue.

2) I do not know of any stack-based machines (sometimes called "Forth
 machines") that support separated floating point stacks.  When
 last I checked, the consensus seemed to be that separate stacks
 are not likely to be added, either.  Certainly, a floating point
 stack can be emulated in memory, but it will be very slow compared
 to the *single-cycle* floating point operations that are likely to be
 found on 32-bit hardware.  Therefore, it is quite likely that users
 of such machines will have strong incentive to use a unified stack
 approach.  Harris floating point software assumes a unified stack.
 I predict that users of stack machines will ignore any requirement
 for using a separate floating point stack.
 A separate on-chip stack is quite expensive not only in silicon real
 estate, but also in terms of increased context switching time.

3) As Mitch has pointed out, in a great many cases code can be written
 to be insensitive to the stack model.  In those cases where such code
 is extremely inefficient, portable code could use conditional compilation
 to provide two versions.  My guess is that such code is very limited
 in size when viewed in the context of an entire application (and, if
 speed is that important, it's probably in assembler anyway).
 Also, much code is written with the loop variables in local variables
 or on the return stack (so, the sequence OVER 1+ OVER 1- for image
 processing could just as easily be I 1+ J 1-).

4) One motivator for separate stacks is that 16-bit integers are not
 the same size as 32-bit reals.  On 32-bit hardware, this problem
 goes away: single precision reals and ints are both 32 bits, and
 double precision reals and ints are both 64 bits.
 (80-bit reals are brought to you courtesy of Intel, and are uncommon
  elsewhere).  If you are really serious about fast floating point
 (i.e., single-cycle F* and F+), you probably should be using a
 32-bit machine, so I do not weight this reason heavily.
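
A rough sketch of point 3, for illustration only.  It assumes the
conditional compilation words [IF] [ELSE] [THEN] and the
FLOATING-STACK environment query in the form later ANS Forth
documents describe them, plus an interpreted S" ; a given system
may differ.  FSQUARE, SCALE, FTEMP and SEPARATE-FSTACK? are
hypothetical names made up for this example.

```forth
\ Model-insensitive: only floating point items are touched, so it
\ does not matter which stack holds them.
: FSQUARE ( r -- r*r )  FDUP F* ;

\ Ask the system which model it uses.  FLOATING-STACK returns the
\ depth of a separate floating point stack, or 0 if reals share the
\ data stack.  Assumption: an unknown query is treated as unified.
S" FLOATING-STACK" ENVIRONMENT? 0= [IF] 0 [THEN]
0<> CONSTANT SEPARATE-FSTACK?

\ Model-sensitive: the integer n was pushed before the real r.
SEPARATE-FSTACK? [IF]
  \ n is alone on the data stack, r is on the floating point stack.
  : SCALE ( n -- ) ( F: r -- r*n )  S>D D>F F* ;
[ELSE]
  \ n is buried under r, so park r in memory to reach n.
  FVARIABLE FTEMP
  : SCALE ( n r -- r*n )  FTEMP F!  S>D D>F  FTEMP F@ F* ;
[THEN]
```

With either stack model the caller writes the same thing:
3 2E0 SCALE leaves the real 6, and 2E0 FSQUARE leaves the real 4.
Only the two-line body of SCALE differs between the versions.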

I do not know whether a separate or unified stack is "best".  One of
my criteria will be which one a C compiler can use best for stack machines
(but the jury is still out).  I requested that the standard not preclude
use of a unified stack.

  Phil Koopman                koopman@greyhound.ece.cmu.edu   Arpanet
  2525A Wexford Run Rd.
  Wexford, PA  15090
Senior scientist at Harris Semiconductor, and adjunct professor at CMU.
I don't speak for them, and they don't speak for me.