Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!wuarchive!zaphod.mps.ohio-state.edu!sdd.hp.com!ucsd!ucbvax!SCFVM.GSFC.NASA.GOV!ZMLEB
From: ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman)
Newsgroups: comp.lang.forth
Subject: Re: Floating point stack
Message-ID: <9008290355.AA19589@ucbvax.Berkeley.EDU>
Date: 29 Aug 90 03:45:25 GMT
References:
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 113

Phil Koopman writes:
>1) There is common practice for both using a separate floating
>   point stack and a unified data/floating point stack.

How common is the unified stack, I wonder? Mitch mentioned that the TC was pretty much settled on the separate stack idea before settling on allowing both. That says to me that separate stacks are more common than unified ones.

>   On some platforms, a separate floating point stack is very expensive,
>   because there is no on-chip register available for use as a pointer.

On many platforms in common use (VAX, 68000), there is an on-chip register available. Should the lack of one on a single current-generation Forth chip hamstring the portability of floating point software on all platforms? I think I can safely say that the vast majority of current working FP code is implemented on conventional CPUs.

>2) I do not know of any stack-based machines (sometimes called "Forth
>   machines") that support separated floating point stacks. When
>   last I checked, the consensus seemed to be that separate stacks
>   are not likely to be added, either. Certainly, a floating point
>   stack can be emulated in memory, but it will be very slow compared
>   to a *single-cycle* floating point operation that is likely to be
>   found on 32-bit hardware.

Are there any stack-based machines with on-chip floating point capability, that is, floating point hardware that executes basic instructions such as F+ and F* in a single cycle? If so, I can reluctantly accept your arguments (maybe).
If not, and they are only in the planning stage, what is preventing the inclusion of an FP stack register and associated stack memory? Given the parallel nature of stack-based machine architecture, I would think (as a complete novice when it comes to hardware design) that it would be faster to perform an FP addition and a memory store in a single cycle than to perform an FP addition in one cycle, a swap in one cycle, and a store in one cycle.

>   Harris floating point software assumes a unified stack.

To reiterate: is Harris floating point software founded on hardware FP operations or on software "emulation" of floating point?

If the former, I take as my example the VAX, where FP is provided in the CPU itself. The additional stack manipulations required by a unified stack far outweigh the savings of a single register for holding the FP stack pointer. Keeping FP values on a separate stack eliminates extraneous SWAPs, OVERs, etc. A separate FP stack is a clear winner on this platform.

If the latter, I take as my example the lowly 6502, for which I have implemented a set of software floating point routines. I found that amid all of the bit-manipulation gymnastics, accessing a memory variable to get the FP stack pointer falls into the noise when counting cycles. In my experience, the separate stack becomes an advantage or, at the very least, no disadvantage. Add the much greater ease of writing code, and I am a stone-cold separate-floating-stack advocate. Perhaps I should stand before the TC to make my own "impassioned plea". Nah, it's already too late.

>3) As Mitch has pointed out, in a great many cases code can be written
>   to be insensitive to the stack model.

Note the future tense used here. In effect, the past has been discarded. What I fear is that the effort to write stack-model-insensitive code will be too great, resulting in "ANS Standard Floating Point" meaning nothing.
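To make that worry concrete, here is a toy model in Python (not Forth, and not any real chip) of the add-then-store case described earlier: two reals are added and the sum is stored at an address that was pushed before them. It is a sketch under stated assumptions: 32-bit cells, so a real fits in one cell; Forth-style stack effects for F! (unified: ( r addr -- ); separate: ( addr -- ) ( F: r -- )); and it counts words executed, not machine cycles. The names run, UNIFIED and SEPARATE are hypothetical.

```python
# Toy model (hypothetical, for illustration only): the same "add two reals,
# store the sum at an address pushed first" task under the two stack models.
# Assumes 32-bit cells (one cell per real) and Forth-style stack effects:
#   unified stack:  F!  ( r addr -- )
#   separate stack: F!  ( addr -- ) ( F: r -- )


def run(program, separate):
    """Execute a tiny word list; return (memory, count of non-literal words)."""
    data, fp, memory = [], [], {}
    words = 0
    for item in program:
        if isinstance(item, tuple):  # ("lit", n) or ("flit", r) pushes a literal
            kind, value = item
            target = fp if (separate and kind == "flit") else data
            target.append(value)
            continue
        words += 1
        if item == "F+":
            stack = fp if separate else data
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif item == "SWAP":
            data[-1], data[-2] = data[-2], data[-1]
        elif item == "F!":
            addr = data.pop()  # the address is on the data stack in both models
            memory[addr] = fp.pop() if separate else data.pop()
        else:
            raise ValueError(f"unknown word: {item!r}")
    return memory, words


ADDR = 100
# Unified: ( addr r1 r2 ) F+ leaves ( addr r ), so SWAP is needed before F!.
UNIFIED = [("lit", ADDR), ("flit", 1.5), ("flit", 2.25), "F+", "SWAP", "F!"]
# Separate: addr stays on the data stack, reals on the FP stack; no shuffle.
SEPARATE = [("lit", ADDR), ("flit", 1.5), ("flit", 2.25), "F+", "F!"]

if __name__ == "__main__":
    print(run(UNIFIED, separate=False))   # ({100: 3.75}, 3)
    print(run(SEPARATE, separate=True))   # ({100: 3.75}, 2)
```

The two word lists differ by exactly the SWAP in question, and neither is correct under the other model: run against a unified stack, the SEPARATE list stores 100 at "address" 3.75. On a 16-bit system, where a 32-bit real occupies two cells, the unified version would need ROT rather than SWAP, which this one-cell model does not capture.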
I understand your arguments (I hope), but let me couple your predictions about Forth-in-hardware developers ignoring floating point with a prediction of my own: I predict that *all* Forth developers will ignore the standard, write their code for the system they use, and disregard portability considerations. The end result will be that we use the same words (a major step forward in itself), but the words will mean different things. So much for Forth as a portable scientific application language.

>4) One motivator for separate stacks is that 16-bit integers are not
>   the same size as 32-bit reals. On 32-bit hardware, this problem
>   goes away: single and double precision for reals and ints are the same size.
>   (80-bit reals are brought to you courtesy of Intel, and are uncommon
>   elsewhere). If you are really serious about fast floating point
>   (i.e., single-cycle F* and F+), you probably should be using a
>   32-bit machine, so I do not weight this reason heavily.

One motivator behind the ANS Standard effort was to remove the "16-bit barrier" in Forth-83. This opens the door for vendors to develop systems that are compatible across platforms with disparate word lengths. When FP values are stored on the data stack, FP code that is portable among 32-bit platforms becomes nearly impossible to carry over to 16-bit platforms using 32-bit FP numbers, because each real then occupies two cells instead of one and the stack manipulations around it change. This problem goes away completely when a separate stack is used for FP values. Whether or not I am "really serious" about floating point calculations, being able to write FP code on a VAX and run it unchanged on an Apple // (which I can easily do, using a separate stack) is a powerful indication of the portability this scheme affords.

>I do not know whether a separate or unified stack is "best". One of
>my criteria will be which one a C compiler can use best for stack machines
>(but, the jury is still out). I requested that the standard not preclude
>use of a unified stack.
Oh, I thought we were talking about *Forth*. Seriously, your point about how a C compiler would implement FP operations is well taken. Harris' success could well be based on the efficiency of a C compiler on your chip. But you admit that the jury is still out, which is in direct conflict with the common-practice argument. Has Harris ever implemented a separate floating point stack in order to make quantitative judgements? Those judgements should be made against other platforms performing the same operations, not just against your own platform. Could you still outperform the competition even with a separate FP stack? Or is Harris so concerned with single-cycle operations that multi-cycle operations are too much of an anathema?

> Phil Koopman   koopman@greyhound.ece.cmu.edu   Arpanet

-- Lee Brotzman (FIGI-L Moderator)
-- BITNET: ZMLEB@SCFVM    Internet: zmleb@scfvm.gsfc.nasa.gov
-- "Between an idea and implementation, is software."
--        Curse from Hubble
--        Space Telescope engineer.