Path: utzoo!utgpu!news-server.csri.toronto.edu!mailrus!cs.utexas.edu!sdd.hp.com!uakari.primate.wisc.edu!aplcen!haven!udel!rochester!pt.cs.cmu.edu!a.gp.cs.cmu.edu!koopman
From: koopman@a.gp.cs.cmu.edu (Philip Koopman)
Newsgroups: comp.lang.forth
Subject: Re: Floating point stack
Message-ID: <10343@pt.cs.cmu.edu>
Date: 29 Aug 90 14:43:02 GMT
References: <9008290355.AA19589@ucbvax.Berkeley.EDU>
Organization: Carnegie-Mellon University, CS/RI
Lines: 97

In article <9008290355.AA19589@ucbvax.Berkeley.EDU>, ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman) writes:
> How common is the unified stack, I wonder?

It depends on what you mean by common. Since most Forth programmers
these days probably run on systems with 80287 hardware stacks (or
80287 emulator software), the answer may be "uncommon". My personal
experience has been almost uniformly with unified stacks. The fact
that the ANS Forth folks passed the resolution without my being at
the meeting suggests that they now believe it is common enough to
warrant consideration.

> On many platforms in common use (VAX, 68000), there is an on-chip
> register available. The lack of one on one current generation of Forth
> chip should hamstring portability of floating point software on all?
> I think I can safely say that the vast majority of current working FP
> code is implemented on conventional CPUs.

Not all conventional CPUs have an available (or at least convenient)
register for the FP pointer. The 80x86 could use BX or perhaps DI,
but requires an SS: override prefix for non-tiny memory models.
First-generation micros don't have a spare register (6502, 8080/Z80,
6800, 68HC11?). You may think that small/old micros don't matter
much, but other folks (especially in embedded control) think they do.

> Are there any stack-based machines with on-chip floating point capability;
> floating point that executes basic instructions such as F+ and F* in a single
> cycle?

No announced products that I know of. But they will clearly
come some day.

> If so, I can reluctantly accept your arguments (maybe). If not,
> and they are in the planning stage only, what is preventing the inclusion
> of an FP stack register and associated stack memory?

The memory takes a lot of chip area, which makes chips more
expensive. It also represents more context to be saved for
context switches, thus degrading real-time performance. Even
a separate pointer register is that much more context to save
(especially since on many systems there will probably be limit
registers as well).

The realities of the marketplace are that C performance will drive
most design decisions (not ANS Forth compliance). Stack
architectures will have a separate FP stack on-chip only if it
*significantly* helps C run times. I believe that this will be
true of companies besides Harris in the future.

> Given the parallel
> nature of stack-based machine architecture, I would think -- as a complete
> novice when it comes to hardware design -- that it would be faster to perform
> an FP addition and memory store operation in a single cycle, than to
> perform an FP addition in one cycle, a swap in one cycle, and a store in
> one cycle.

Inexpensive memory is slower than floating point operations. The
major limit to supercomputers these days is not fast floating point,
but rather memory bandwidth. I'll take stack twiddling over fetches
and stores any day. Perhaps this is not optimal on current hardware,
but it is the wise long-term path. As CPU speeds increase, the
importance of reducing demands on memory will increase too.

> To reiterate, is Harris floating point software founded on hardware FP
> operations or software "emulation" of floating point.

Both (at least in the planning stages).

> If the latter, I take as my example the lowly 6502, for which I have
> implemented a set of software floating point routines.
> I found that amongst
> all of the bit-manipulation gymnastics, accessing a memory variable to get
> the FP stack pointer falls into the noise level when counting cycles.

How about FDUP, FSWAP, etc.? These words do almost no work beyond
moving stack items, so here you are paying a proportionally larger
penalty for memory-based pointer manipulations.

> From my own experience, the separate stack becomes an advantage, or at
> the very least, no disadvantage. Coupled with the much greater ease in
> writing code, I am a stone-cold-separate-floating-stack advocate.
> Perhaps I should stand before the TC to make my own "empassioned plea".
> Nah, it's already too late.

I didn't consider it an "empassioned plea" myself. Just a statement
of fact. Harris (and other stack machine vendors, to the best of my
knowledge) don't plan on supporting a separate hardware floating
point stack. The other reasons I gave in my previous post were what
I perceive as the pro-unified stack point of view. It is up to the
TC to sort out the facts and reach a wise decision.

> Oh, I thought we were talking about *Forth*. Seriously, your point
> about how a C compiler would implement FP operations is well taken. Harris'
> success could well be based on the efficiency of a C compiler on your chip.
> But you admit that the jury is still out. This is a direct conflict with
> the common-practice argument.

No conflict at all. The common practice argument should be
restricted to Forth. The jury is out on C. I have no performance
numbers, and am therefore unwilling to preclude future use of a
unified stack (OR a split stack).

The issue is whether the TC wants to take into account the very
likely possibility (based on size, memory, and context switching
considerations) that unified stacks will be significantly more
efficient on future stack machines. We all know that many Forth
programmers value efficiency above conformance to a standard.

  Phil Koopman                koopman@greyhound.ece.cmu.edu   Arpanet
  2525A Wexford Run Rd.
  Wexford, PA  15090
  Senior scientist at Harris Semiconductor, and adjunct professor at CMU.
  I don't speak for them, and they don't speak for me.
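
The FDUP/FSWAP point above can be made concrete with a rough
memory-access count. What follows is only an illustrative sketch in
Python, not anything from Harris or from an actual Forth system. The
names (Memory, FSP_ADDR, fdup_pointer_in_memory,
fdup_pointer_in_register) are hypothetical, and it assumes one memory
cell per float and an upward-growing stack. It compares FDUP when the
FP stack pointer lives in a memory variable against FDUP when the
pointer sits in a register.

```python
# Hypothetical cost model: count memory accesses made by one FDUP.
# Assumptions (not from the original post): one cell per float,
# stack grows upward, pointer variable stored at address FSP_ADDR.

class Memory:
    """RAM that counts every read and write."""

    def __init__(self, size):
        self.cells = [0.0] * size
        self.accesses = 0

    def read(self, addr):
        self.accesses += 1
        return self.cells[addr]

    def write(self, addr, value):
        self.accesses += 1
        self.cells[addr] = value


FSP_ADDR = 0  # assumed RAM location of the FP stack pointer variable


def fdup_pointer_in_memory(mem):
    """FDUP when the FP stack pointer is a memory variable."""
    sp = int(mem.read(FSP_ADDR))  # fetch the pointer
    top = mem.read(sp)            # fetch the top float
    sp += 1
    mem.write(sp, top)            # push the copy
    mem.write(FSP_ADDR, sp)       # store the updated pointer


def fdup_pointer_in_register(mem, sp):
    """FDUP when the pointer is held in a register (a local here)."""
    top = mem.read(sp)            # fetch the top float
    sp += 1
    mem.write(sp, top)            # push the copy
    return sp                     # new register value


def measure_memory_pointer():
    mem = Memory(16)
    mem.cells[FSP_ADDR] = 4       # top of stack is cell 4
    mem.cells[4] = 3.14
    fdup_pointer_in_memory(mem)
    return mem.accesses


def measure_register_pointer():
    mem = Memory(16)
    mem.cells[4] = 3.14
    fdup_pointer_in_register(mem, 4)
    return mem.accesses


if __name__ == "__main__":
    print("pointer in memory:  ", measure_memory_pointer(), "accesses")
    print("pointer in register:", measure_register_pointer(), "accesses")
```

Under these assumptions the memory-resident pointer costs 4 accesses
per FDUP against 2 with a register, so the pointer handling doubles
the memory traffic. In a software F+ that already makes many accesses
for its bit manipulation, the same two extra accesses would be a much
smaller fraction of the total.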