Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!sdd.hp.com!uakari.primate.wisc.edu!aplcen!haven!udel!rochester!pt.cs.cmu.edu!a.gp.cs.cmu.edu!koopman
From: koopman@a.gp.cs.cmu.edu (Philip Koopman)
Newsgroups: comp.lang.forth
Subject: Re: Floating point stack
Message-ID: <10343@pt.cs.cmu.edu>
Date: 29 Aug 90 14:43:02 GMT
References: <a.gp.cs.cmu.edu!koopman@PT.CS.CMU.EDU> <9008290355.AA19589@ucbvax.Berkeley.EDU>
Organization: Carnegie-Mellon University, CS/RI
Lines: 97

In article <9008290355.AA19589@ucbvax.Berkeley.EDU>, ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman) writes:
>    How common is the unified stack, I wonder?
It depends on what you mean by common.  Since most Forth programmers
these days probably run on systems with 80287 hardware stacks (or
80287 emulator software), the answer may be "uncommon".  My personal
experience has been almost uniformly with unified stacks.  The fact
that the ANS Forth folks passed the
resolution without my being at the meeting suggests that they now
believe it is common enough to warrant consideration.

>    On many platforms in common use (VAX, 68000), there is an on-chip
> register available.  The lack of one on one current generation of Forth
> chip should hamstring portability of floating point software on all?
> I think I can safely say that the vast majority of current working FP
> code is implemented on conventional CPUs.
Not all conventional CPUs have an available (or at least convenient)
register for the FP pointer.  The 80x86 could use BX or perhaps DI,
but either would need an SS: segment-override prefix for non-tiny memory
models.  First-generation micros don't have a spare register (6502,
8080/Z80, 6800, 68HC11?).  You may think that small/old micros don't
matter much, but other folks (especially in embedded control) think
they do.

>    Are there any stack-based machines with on-chip floating point capability;
> floating point that executes basic instructions such as F+ and F* in a single
> cycle? 
No announced products that I know of.  But, they will clearly come
some day.

> If so, I can reluctantly accept your arguments (maybe).  If not,
> and they are in the planning stage only, what is preventing the inclusion
> of an FP stack register and associated stack memory?
The memory takes a lot of chip area, which makes chips more expensive.
It also represents more context to be saved for context switches,
thus degrading real time performance.  Even a separate pointer register
is that much more context to save (especially since on many systems
there will probably be limit registers as well).
The realities of the marketplace are that C performance will drive most
design decisions (not ANS Forth compliance).  Stack architectures will
have a separate FP stack on-chip only if it *significantly* helps C
run times.  I believe that this will be true of companies besides
Harris in the future.  
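
As a rough illustration of the context-switch cost described above, here
is a minimal C sketch.  It assumes a hypothetical 16-bit stack machine;
the register names (pc, dsp, rsp, fsp, f_lo, f_hi) and the FP_CELLS depth
are invented for the example and do not describe any real chip.  Data and
return stack contents are assumed to cost the same in every case and are
left out.

```c
#include <stddef.h>
#include <stdint.h>

#define FP_CELLS 32   /* hypothetical on-chip FP stack depth, in 16-bit cells */

/* Unified stack: FP values share the data stack, so no extra FP state. */
typedef struct {
    uint16_t pc, dsp, rsp;
} UnifiedCtx;

/* Separate FP stack kept in external memory: one pointer register
   plus lower and upper limit registers must be saved. */
typedef struct {
    uint16_t pc, dsp, rsp;
    uint16_t fsp, f_lo, f_hi;
} SplitCtx;

/* Separate FP stack kept on-chip: its contents must be saved as well. */
typedef struct {
    uint16_t pc, dsp, rsp;
    uint16_t fsp, f_lo, f_hi;
    uint16_t fcells[FP_CELLS];
} SplitOnChipCtx;

/* Number of 16-bit words a context switch would have to save. */
#define CTX_WORDS(T) (sizeof(T) / sizeof(uint16_t))
```

Under these assumptions the pointer and limit registers alone double the
register context, and an on-chip FP stack buffer adds its full depth on
top of that.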

>  Given the parallel
> nature of stack-based machine architecture, I would think -- as a complete
> novice when it comes to hardware design -- that it would be faster to perform
> an FP addition and memory store operation in a single cycle, than to
> perform an FP addition in one cycle, a swap in one cycle, and a store in
> one cycle.
Inexpensive memory is slower than floating point operations.  The major
limit to supercomputers these days is not fast floating point, but rather
memory bandwidth.  I'll trade stack twiddling for fetches and stores
any day.  Perhaps this is not optimal on current hardware, but it is
the wise long-term path.  As CPU speeds increase, reducing demands on
memory will matter more and more.

>    To reiterate, is Harris floating point software founded on hardware FP
> operations or software "emulation" of floating point.
Both (at least in the planning stages).

>    If the latter, I take as my example the lowly 6502, for which I have
> implemented a set of software floating point routines.  I found that amongst
> all of the bit-manipulation gymnastics, accessing a memory variable to get
> the FP stack pointer falls into the noise level when counting cycles.
How about FDUP, FSWAP, etc.?  Here you are paying a proportionally larger
penalty for memory-based pointer manipulations.
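
To make the proportion concrete, here is a minimal C sketch of a separate
FP stack whose pointer lives in a memory variable.  Everything in it
(fstack, fsp_mem, ptr_accesses, and the function names) is hypothetical
and only models the bookkeeping; it is not taken from any particular
Forth system.  Each load or store of the pointer variable is counted.

```c
#include <assert.h>
#include <stddef.h>

#define FSTACK_DEPTH 16                 /* hypothetical stack depth */

static double fstack[FSTACK_DEPTH];
static size_t fsp_mem = 0;              /* FP stack pointer held in memory */
static unsigned long ptr_accesses = 0;  /* loads + stores of fsp_mem */

static size_t load_fsp(void)      { ptr_accesses++; return fsp_mem; }
static void   store_fsp(size_t v) { ptr_accesses++; fsp_mem = v; }

static void fpush(double x)
{
    size_t sp = load_fsp();
    assert(sp < FSTACK_DEPTH);          /* overflow check */
    fstack[sp] = x;
    store_fsp(sp + 1);
}

static double fpop(void)
{
    size_t sp = load_fsp();
    assert(sp >= 1);                    /* underflow check */
    store_fsp(sp - 1);
    return fstack[sp - 1];
}

/* FDUP ( F: r -- r r ): one cell copied, two pointer accesses. */
static void fdup(void)
{
    size_t sp = load_fsp();
    assert(sp >= 1 && sp < FSTACK_DEPTH);
    fstack[sp] = fstack[sp - 1];
    store_fsp(sp + 1);
}

/* FSWAP ( F: r1 r2 -- r2 r1 ): two cells exchanged, one pointer access. */
static void fswap(void)
{
    size_t sp = load_fsp();
    assert(sp >= 2);
    double t = fstack[sp - 1];
    fstack[sp - 1] = fstack[sp - 2];
    fstack[sp - 2] = t;
}
```

In this model FDUP does one cell copy but touches the pointer variable
twice, so the pointer traffic is comparable to the useful work, unlike
the case of a long software F+ or F* routine.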

>    From my own experience, the separate stack becomes an advantage, or at
> the very least, no disadvantage.  Coupled with the much greater ease in
> writing code, I am a stone-cold-separate-floating-stack advocate.
>    Perhaps I should stand before the TC to make my own "empassioned plea".
> Nah, it's already too late.
I didn't consider it an "empassioned plea" myself.  Just a statement of
fact.  Harris (and other stack machine vendors, to the best of my
knowledge) don't plan on supporting a separate hardware floating point
stack.  The other reasons I gave in my previous post were what I
perceive as the pro-unified stack point of view.  It is up to
the TC to sort out the facts and reach a wise decision.

>     Oh, I thought we were talking about *Forth*.  Seriously, your point
> about how a C compiler would implement FP operations is well taken.  Harris'
> success could well be based on the efficiency of a C compiler on your chip.
> But you admit that the jury is still out.  This is a direct conflict with
> the common-practice argument. 
No conflict at all.  The common practice argument should be restricted
to Forth.  The jury is out on C.  I have no performance numbers, and am
therefore unwilling to preclude future use of a unified stack (OR, a
split stack).  The issue is whether the TC wants to take into account
the very likely possibility (based on size, memory, and context switching
considerations) that unified stacks will be significantly more
efficient on future stack machines.  We all know that many Forth
programmers value efficiency above conformance to a standard.

  Phil Koopman                koopman@greyhound.ece.cmu.edu   Arpanet
  2525A Wexford Run Rd.
  Wexford, PA  15090
Senior scientist at Harris Semiconductor, and adjunct professor at CMU.
I don't speak for them, and they don't speak for me.