Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!wuarchive!zaphod.mps.ohio-state.edu!sdd.hp.com!ucsd!ucbvax!SCFVM.GSFC.NASA.GOV!ZMLEB
From: ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman)
Newsgroups: comp.lang.forth
Subject: Re: Floating point stack
Message-ID: <9008290355.AA19589@ucbvax.Berkeley.EDU>
Date: 29 Aug 90 03:45:25 GMT
References: <a.gp.cs.cmu.edu!koopman@PT.CS.CMU.EDU>
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 113


Phil Koopman <koopman@greyhound.ece.cmu.edu> writes:
>1) There is common practice for both using a separate floating
> point stack and a unified data/floating point stack.

   How common is the unified stack, I wonder?  Mitch mentioned that the
TC had pretty much settled on the separate stack idea before deciding to
allow both.  That says to me that separate stacks are more common than
unified.

> On some platforms, a separate floating point stack is very expensive,
> because there is no on-chip register available for use as a pointer.

   On many platforms in common use (VAX, 68000), there is an on-chip
register available.  Should the lack of one on a single current-generation
Forth chip hamstring the portability of floating point software on all the
others?  I think I can safely say that the vast majority of current working
FP code is implemented on conventional CPUs.

>2) I do not know of any stack-based machines (sometimes called "Forth
> machines") that support separated floating point stacks.  When
> last I checked, the consensus seemed to be that separate stacks
> are not likely to be added, either.  Certainly, a floating point
> stack can be emulated in memory, but it will be very slow compared
> to a *single-cycle* floating point operation that is likely to be
> found on 32-bit hardware.

   Are there any stack-based machines with on-chip floating point capability,
that is, floating point that executes basic instructions such as F+ and F* in
a single cycle?  If so, I can reluctantly accept your arguments (maybe).  If not,
and they are in the planning stage only, what is preventing the inclusion
of an FP stack register and associated stack memory?  Given the parallel
nature of stack-based machine architecture, I would think -- as a complete
novice when it comes to hardware design -- that it would be faster to perform
an FP addition and memory store operation in a single cycle, than to
perform an FP addition in one cycle, a swap in one cycle, and a store in
one cycle.
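
   To make the add/swap/store comparison concrete, here is a minimal sketch
in Forth.  It is an illustration under assumptions, not anyone's actual
implementation: it assumes one-cell floats on the unified system, it assumes
the words FVARIABLE, F! and F@ are provided, and SUM>SEP, SUM>UNI and RESULT
are hypothetical names.  Each definition is meant for its own kind of system
and will not behave correctly on the other.

```forth
\ Assumed words: FVARIABLE F+ F! F@ (only F+ is named in this thread).
FVARIABLE RESULT                 \ hypothetical place to store the sum

\ Separate FP stack: the address waits on the data stack while the
\ operands sit on the FP stack, so no reordering is needed.
: SUM>SEP  ( f-addr -- ) ( F: r1 r2 -- )
   F+ F! ;

\ Unified stack, one-cell floats (assumption): the address was pushed
\ first, so it lies beneath the sum and must be swapped to the top
\ before the store.
: SUM>UNI  ( f-addr r1 r2 -- )
   F+ SWAP F! ;

\ The calling phrase is the same on either system, e.g.
\   RESULT 1.5E0 2.5E0 SUM>SEP     ( separate-stack system )
\   RESULT 1.5E0 2.5E0 SUM>UNI     ( unified-stack system  )
```

The extra SWAP in the unified version is the step I am counting above;
whether it costs a full cycle depends on the hardware.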

> Harris floating point software assumes a unified stack.

   To reiterate, is Harris floating point software founded on hardware FP
operations or software "emulation" of floating point?
   If the former, I take as my example the VAX, where FP is provided in the
CPU itself.  The additional stack manipulations required by a unified stack
far outweigh the savings of a single register for holding the FP stack pointer.
Keeping FP values on a separate stack eliminates extraneous SWAPs, OVERs, etc.
A separate FP stack is a clear winner on this platform.
   If the latter, I take as my example the lowly 6502, for which I have
implemented a set of software floating point routines.  I found that amongst
all of the bit-manipulation gymnastics, accessing a memory variable to get
the FP stack pointer falls into the noise level when counting cycles.
   From my own experience, the separate stack becomes an advantage, or at
the very least, no disadvantage.  Coupled with the much greater ease in
writing code, I am a stone-cold-separate-floating-stack advocate.
   Perhaps I should stand before the TC to make my own "impassioned plea".
Nah, it's already too late.

>3) As Mitch has pointed out, in a great many cases code can be written
> to be insensitive to the stack model.

   Note the future tense used here.  In effect, the past has been discarded.
What I fear is that the effort to write stack-model-insensitive code will
be too great, resulting in "ANS Standard Floating Point" meaning nothing.
   I understand your arguments (I hope), but let me couple your predictions
of Forth-in-hardware developers ignoring floating point with a prediction
of my own:  I predict that *all* Forth developers will ignore the standard
and write their code for the system they use and disregard portability
considerations.  The end result will be that we will use the same words
(a major step forward in itself), but the words will mean different things.
   So much for Forth as a portable scientific application language.

>4) One motivator for separate stacks is that 16-bit integers are not
> the same size as 32-bit reals.  On 32-bit hardware, this problem
> goes away: single and double precision for reals and ints are the same size.
> (80-bit reals are brought to you courtesy of Intel, and are uncommon
>  elsewhere).  If you are really serious about fast floating point
> (i.e., single-cycle F* and F+), you probably should be using a
> 32-bit machine, so I do not weight this reason heavily.

    One motivator behind the ANS Standards effort was to break the "16-bit
barrier" of Forth-83.  This opens the door for vendors to develop systems
that are compatible across platforms with disparate word lengths.  Writing
FP code that is portable across 32-bit platforms is hard enough; it becomes
nearly impossible on 16-bit platforms using 32-bit FP numbers when the FP
values are stored on the data stack.  This problem goes away completely when a
separate stack is used for FP values.
    Whether I am "really serious" about floating point calculations or not,
being able to write FP code on a VAX and run it unchanged on an Apple //
(which I can easily do, using a separate stack) is a powerful indication
of the portability afforded by this scheme.
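
    As a sketch of why the unified stack ties code to the cell size, consider
bringing an integer up from beneath a 32-bit float.  The names N>TOP-32 and
N>TOP-16 are hypothetical, and the stack layouts are assumptions about how a
unified system would hold its floats.

```forth
\ Unified stack, 32-bit cells: a 32-bit float occupies one cell.
: N>TOP-32  ( n r -- r n )
   SWAP ;

\ Unified stack, 16-bit cells: the same float occupies two cells,
\ so the integer has to be rotated up past both of them.
: N>TOP-16  ( n r-lo r-hi -- r-lo r-hi n )
   ROT ;

\ Separate FP stack: n is already on top of the data stack and r is
\ on the FP stack, so no word is needed on either cell size.
```

Source that uses SWAP here breaks when moved to the 16-bit system, and
source that uses ROT breaks going the other way; with a separate stack
the question never comes up.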

>I do not know whether a separate or unified stack is "best".  One of
>my criteria will be which one a C compiler can use best for stack machines
>(but, the jury is still out).  I requested that the standard not preclude
>use of a unified stack.

    Oh, I thought we were talking about *Forth*.  Seriously, your point
about how a C compiler would implement FP operations is well taken.  Harris'
success could well be based on the efficiency of a C compiler on your chip.
But you admit that the jury is still out.  This is in direct conflict with
the common-practice argument.  Has Harris ever implemented a separate floating
point stack in order to make quantitative judgements?  Those judgements should
be made against other platforms performing the same operations, as opposed to
your own platform.  Could you still outperform the competition even with a
separate FP stack?  Perhaps Harris is so concerned with single-cycle
operations that multi-cycle operations are anathema?

>  Phil Koopman                koopman@greyhound.ece.cmu.edu   Arpanet

-- Lee Brotzman (FIGI-L Moderator)
-- BITNET:   ZMLEB@SCFVM          Internet: zmleb@scfvm.gsfc.nasa.gov
-- "Between an idea and implementation, is software." -- Curse from Hubble
-- Space Telescope engineer.