Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!uakari.primate.wisc.edu!aplcen!mef
From: mef@aplcen.apl.jhu.edu (Marty Fraeman)
Newsgroups: comp.lang.forth
Subject: Re: FPGA Forth engines
Message-ID: <1990Dec11.181204.10500@aplcen.apl.jhu.edu>
Date: 11 Dec 90 18:12:04 GMT
References: <9012061501.AA20109@ucbvax.Berkeley.EDU> <1990Dec6.223103.5766@cbnewse.att.com> <1990Dec7.143245.29515@aplcen.apl.jhu.edu> <ADYER.90Dec10180623@milo.wyse.com>
Reply-To: mef@aplcen (Marty Fraeman)
Distribution: na
Organization: Johns Hopkins University
Lines: 54

In article <ADYER.90Dec10180623@milo.wyse.com> adyer@milo.wyse.com (Andrew Dyer x2446) writes:
>I don't think your comments are necessarily true. Several vendors have
>arrays with approx. 2000 2-input NAND-equivalent gates, which will run

Well, let's see now.  Both the SC32 and RTX2000 family basically have
three separate address spaces that can be accessed each cycle:  main
memory for instructions and data, parameter stack memory, and return
stack memory.  My belief is that this is the key feature needed to make
a high speed Forth engine.  Both Koopman and Hayes have shown that the
stack memories should be at least 16 words deep before overflow
mechanism overhead becomes negligible.  So a 16 bit machine should have
at least 2*16*16 = 512 bits of memory tightly coupled to the CPU.  A
single bit of memory takes at least two 2-input NAND gates, so that is
about 1K gates just for the stacks, or at least half of your FPGA.  If
you take the stacks off the FPGA and put them in static RAM like the
Novix chip did, then you take a big speed hit.  For proof look at the
top speed of the Novix vs the RTX2000.  Both were using around 2u
technology (although the Novix was a gate array and the RTX is a
standard cell) yet the RTX is more than twice as fast.
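
The gate budget above can be checked with a few lines of Python.  This
is only a back-of-the-envelope sketch: the function name and its
parameters are made up for illustration, and the 2000-gate array size
is the figure from the quoted article, not a specific part.

```python
# Rough gate count for on-chip stack storage (illustrative only).
def stack_gate_estimate(word_bits=16, depth=16, stacks=2, gates_per_bit=2):
    """Return (storage bits, NAND-equivalent gates) for the stacks alone."""
    bits = stacks * depth * word_bits      # 2 * 16 * 16 = 512
    gates = bits * gates_per_bit           # at least 2 gates per bit
    return bits, gates

bits, gates = stack_gate_estimate()
fpga_gates = 2000                          # array size from the quoted post
print(bits, gates, gates / fpga_gates)
```

With the defaults this gives 512 bits and 1024 gates, a little over
half of a 2000-gate array, before any ALU, decode, or control logic.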

>at toggle rates of 70 MHz.(Xilinx and AMD) I wouldn't trust I/O rates
>to be more than 50 Mhz tho. Assuming a 50 MHz clock, and 6 clock
>cycles/instruction you get 8.33 MHz cycle rate.  Not too shabby.
>
Yes, for one flip flop maybe, but what happens when you finish routing
a real circuit?
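
To make the point concrete, here is a small sketch of how the quoted
instruction rate scales with the clock you actually achieve after
routing.  The 50 MHz and 6 cycles/instruction figures come from the
quoted article; the lower clock values are hypothetical placeholders,
not measurements of any real part.

```python
# Instruction rate = clock rate / cycles per instruction (illustrative).
def instruction_rate_mhz(clock_mhz, cycles_per_instruction):
    return clock_mhz / cycles_per_instruction

# Figure from the quoted article: 50 MHz clock, 6 cycles/instruction.
print(round(instruction_rate_mhz(50, 6), 2))

# Hypothetical clocks after routing delays are accounted for.
for clock in (40, 30, 20):
    print(clock, round(instruction_rate_mhz(clock, 6), 2))
```

The rate falls in direct proportion to the clock, so the best-case
toggle rate of a single flip flop says little about the finished
design.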

>The other problem is that FPGAs are expensive, and it would take
>several of them if they were the only components. For a one shot
>system that's o.k., but if it's to be ``public domain'' hardware, then
>it should be a bit simpler (IMHO).
>
>Rather than FPGA's exclusively, I would be inclined to use a mixture
>of LSI type parts like register files, dual port memories, ALU's and
>some FPGA logic for ``glue''.
>
>If one chose the correct parts, the design could be easily migrated to
>a standard cell or gate array library. (2900 series bit slice
>components, for example, are available from at least one vendor.)
Yes, you could do this, and Phil Koopman already did.  In fact Phil
migrated his WISC 32 from TTL to a standard cell design while at Harris.
Perhaps he could comment on the performance of the discrete vs
integrated implementations.


	Marty Fraeman

	mef@aplcen.apl.jhu.edu
	301-953-5000, x8360

	Room 13-s587
	Johns Hopkins University/Applied Physics Laboratory
	Johns Hopkins Road
	Laurel, Md. 20723