Path: utzoo!utgpu!water!watmath!clyde!rutgers!cmcl2!husc6!mailrus!umix!uunet!mfci!root
From: root@mfci.UUCP (SuperUser)
Newsgroups: comp.arch
Subject: Re: Performance increase - a suggestion
Keywords: VLIW
Message-ID: <286@m2.mfci.UUCP>
Date: 20 Feb 88 16:42:07 GMT
References: <235@unicom.UUCP> <28200088@ccvaxa> <3536@batcomputer.tn.cornell.edu> <231@m2.mfci.UUCP> <1988Feb8.121200.370@mntgfx.mentor.com>
Reply-To: odonnell@m3.UUCP (John O'Donnell)
Organization: Multiflow Computer Inc., Branford Ct. 06405
Lines: 48

>>
>> Under that definition of VLIW, every microcoded machine ever made
>> qualifies.  It's never been that big a trick to make a machine with
>> many parallel functional units.  The trick is to be able to keep them
>> busy without hand coding, and to provide the right interface so that
>> the compiler can express the parallelism that it finds.
>>
>
>1) Contrary to popular impression, FPS has been compiling directly from
>FORTRAN to their 64-bit "LIW" 164 machine since the early '80s.  The code is
>of very high quality, and schedules operations onto the multi-field
>instructions across a whole basic block.  For example, the memory system
>in the FPS 164 is pipelined, and software accounts for the pipeline length,
>not a hardware scoreboarder.  The FPS compiler made full use of this pipeline,
>scheduling appropriately.  It also has a very effective loop pipeliner.
>In fact, I believe Josh Fisher & Co. used 164 hardware for their early work at
>Yale that grew into Multiflow.

The above comments are well taken, but the 164 had one basic flaw.
The 164 architecture was designed years before anybody contemplated
writing a compiler for it.  Largely as a result, there are problems
with the architecture that make it of limited suitability as a
compiler target.  For example, the machine has only half the register
bandwidth required to support the functional units' requirements for
operands and results.
If the functional units are to be used at full speed, the pipelined
output of one must be used "live" as an input to the other.  This
causes great difficulty for a scheduler; the machine cannot be modeled
as independent functional units, and scheduling cannot proceed based
only on data precedence and the availability of the required
functional unit.  Symptoms of these limitations show up in the wide
disparity between performance achieved by FPS' compilers and by hand
coders, and the resulting commitment by FPS to the production of large
mathematical libraries written in hand code.  John Ellis' thesis
discusses these issues in greater depth.

>
>2) Please give us a reference for Ellis' thesis.
>

John R. Ellis, "Bulldog: A Compiler for VLIW Architectures".  1986.
ACM Doctoral Dissertation Award Series, MIT Press, Cambridge, MA.

See also Colwell et al., "A VLIW Architecture for a Trace Scheduling
Compiler", in Proceedings, ASPLOS-II (Computer Architecture News,
OS Review, or SIGPLAN Notices, October 1987) for an update on our
architectural approach.
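
For readers who want the scheduling model above made concrete, here is
a minimal sketch of the simple case: a scheduler that treats the
functional units as independent and places each operation using only
data precedence and the availability of the required unit.  This is
not the FPS or Multiflow compiler.  The function name, the unit names
and the latencies are all assumptions made up for illustration, and
the sketch deliberately ignores register bandwidth, which is exactly
the constraint that the text says breaks this model on the 164.

```python
def list_schedule(ops, units):
    """Greedy cycle-by-cycle scheduler for one basic block (illustrative only).

    ops:   dict name -> (unit_kind, latency, [predecessor names]).
           Assumed to form a DAG, with every latency >= 1.
    units: dict unit_kind -> number of operations that kind can start per cycle.
           Units are assumed fully pipelined and independent of each other.
    Returns dict name -> issue cycle.
    """
    issue = {}
    remaining = set(ops)
    max_latency = max((lat for _, lat, _ in ops.values()), default=0)
    bound = len(ops) * (max_latency + 1)  # safe upper bound on schedule length
    cycle = 0
    while remaining:
        if cycle > bound:
            raise ValueError("no progress: cyclic dependences or missing unit")
        free = dict(units)  # issue slots available in this cycle
        for name in sorted(remaining):
            kind, _, preds = ops[name]
            # Data precedence: every predecessor's result must be available.
            ready = all(p in issue and issue[p] + ops[p][1] <= cycle
                        for p in preds)
            # Unit availability: the required kind has a free slot.
            if ready and free.get(kind, 0) > 0:
                issue[name] = cycle
                free[kind] -= 1
        remaining -= set(issue)
        cycle += 1
    return issue


# Hypothetical block: two loads feeding a multiply, then an add.
example_ops = {
    "a_load": ("mem", 3, []),
    "b_load": ("mem", 3, []),
    "c_mul": ("fmul", 2, ["a_load", "b_load"]),
    "d_add": ("fadd", 2, ["c_mul"]),
}
example_units = {"mem": 1, "fmul": 1, "fadd": 1}
schedule = list_schedule(example_ops, example_units)
```

A machine that requires the output of one unit to be consumed "live"
by another adds a coupling between units that this loop has no way to
express; the readiness test would have to pin the consumer to one
specific cycle rather than "any cycle after the result is available".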