Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!uunet!timbuk!cs.umn.edu!uc!noc.MR.NET!gacvx2.gac.edu!gacvx2.gac.edu!scott
From: scott@mcs-server.gac.edu (Scott Hess)
Newsgroups: comp.sys.next
Subject: Re: Postscript Imaging Speed w/040
Message-ID:
Date: 11 Jan 91 20:31:39 GMT
References: <1991Jan11.191116.2685@noose.ecn.purdue.edu>
Distribution: usa
Organization: Gustavus Adolphus College
Lines: 83
Nntp-Posting-Host: mcs-server.gac.edu
In-reply-to: songer@orchestra.ecn.purdue.edu's message of 11 Jan 91 19:11:16 GMT

In article <1991Jan11.191116.2685@noose.ecn.purdue.edu>
songer@orchestra.ecn.purdue.edu (Christopher M Songer) writes:

> I was wondering if anyone could speak to the effects of the 040 on
> Postscript imaging. I'm sure it will speed it. I'm curious by how
> much. I currently have a PS program which takes about 34500 ms. to
> run. (Assuming Yap is telling me the truth.) The program is likely
> to grow and I'm curious how much faster it is going to be on an 040.
> (If, after the 040 shift, it takes much more than 10 seconds to
> image, it'll have to be optimized....) Anyway... if anyone knows
> what the speed up factor is, I'd be interested to know.

Now _here's_ a question :-). This is probably one of the tougher
questions to answer, because, like the benchmark debates, there is
_no_ set answer. You have to try out what you're doing to find out
the speedup.

That out of the way, I'll give some practical impressions . . .

One of my tests for Stuart speed is to run time cat /etc/termcap. If
the machine isn't running much else (and make sure it's not), this
gives fairly consistent results on a given machine. On the '030
machines, this takes about 43 seconds (your mileage may vary) in an
80x72 window using Ohlfs-10. Even though the shipping 2.0 changes
Ohlfs just enough that you can only get 66 lines on the screen
(something I'll fix RSN), I was lucky enough to get in some tests on
a pre-release 2.0 without that problem, running on a NextStation.
The time: 21 seconds.

Now, for disclaimers. Another test was to run the same command again,
this time with the scrollbar pushed to the top of the document.
Stuart has optimizations to simply skip all drawing in this case. The
old machines then took 7 seconds. The new ones take an amazing 2
seconds. Given the timing errors possible with this method, that's
anywhere from a 2x to 3x speedup on the calculations without display.

Explanations. The internal Stuart stuff is made up of mostly integer
calculations, with a fair amount of overhead for system calls and the
like. This means that, though it gets quite a speedup, the speedup is
not as great as a calculation-only program would see. Programs with
long/tight loops (long as in 1 million executions, tight as in small
code size) will achieve a greater speedup, as measured by many people
on the net. Programs using floating point will, too. But programs
with lots of trig functions, while achieving a good speedup, will not
gain as much (because of changes in how those functions are
calculated on-chip, to save space).

The display stuff in Stuart, meanwhile, exercises three things:
composite moves, rectangle clears, and text drawing. While I spent a
lot of time optimizing the calls for order and compactness, these
still are probably some of the best tests of the windowserver. See,
they don't do anything that is "hard". Bit blasts, rectangle clears,
and text drawing are all going to be heavily dependent on the
microprocessor, but won't get a whole lot of advantage from the
floating point improvements, relying more on byte moves. So, I'd
suspect that a program which does drawing of diagrams and the like
will achieve a more than 2x speedup, though I've not fully tested
this. So far as I'm aware, the NextStep 2.0 windowserver has not been
improved in raw speed.
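
For what it's worth, the arithmetic behind the timings quoted above
(43 s vs 21 s with display, 7 s vs 2 s with drawing skipped) can be
sketched in a few lines of Python. The 1-second uncertainty per
measurement is purely an assumption (no error figure is given for
these runs), and speedup_range is a hypothetical helper name, not
anything from Stuart.

```python
def speedup_range(old_s, new_s, err_s=1.0):
    """Return (low, nominal, high) speedup for two wall-clock timings.

    err_s is an ASSUMED per-measurement uncertainty in seconds; the
    post itself does not state one.
    """
    if new_s - err_s <= 0:
        raise ValueError("uncertainty is as large as the new timing")
    low = (old_s - err_s) / (new_s + err_s)    # old ran fast, new ran slow
    nominal = old_s / new_s                    # timings taken at face value
    high = (old_s + err_s) / (new_s - err_s)   # old ran slow, new ran fast
    return low, nominal, high


# Timings quoted in the post, in seconds ('030 first, '040 second).
with_display = speedup_range(43, 21)
display_skipped = speedup_range(7, 2)

for label, (low, nominal, high) in (
    ("with display", with_display),
    ("display skipped", display_skipped),
):
    print(f"{label}: {nominal:.2f}x (between {low:.2f}x and {high:.2f}x)")
```

Under that assumed 1-second error, the 43 s vs 21 s run is pinned
down fairly tightly around 2x, while the 7 s vs 2 s run could be
anywhere from 2x to 8x. Short runs like the second one are simply too
brief to give a precise ratio, so treat any single figure from them
with suspicion.
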
Running both 1.0 and 2.0 on the same machine (well, at different
times) indicates that, though many calls to the windowserver are
rearranged for speed (invisibly, in the appkit), the windowserver
itself is not faster. I think this will be true pretty much across
the board. So, as always, it is much more important to arrange the
code correctly than to upgrade CPUs (the changes from 1.0 Stuart to
2.0 Stuart gave more speed than the upgrade from 2.0 Stuart on '030
to 2.0 Stuart on '040).

A corollary of the apparent windowserver speedup (2x plus maybe a
little more) as compared to the apparent application-side speedup (3x
plus probably a little more for calculation-intensive apps not using
trig) is that moving code from the PostScript side of your program to
the C side is more important on an '040, especially such constructs
as ifelse and for.

If you have any questions about any of this, feel free to drop me a
line. My shop is always open . . .

--
scott hess                      scott@gac.edu
Independent NeXT Developer      GAC Undergrad
"Tried anarchy, once. Found it had too many constraints . . ."
"Buy `Sweat 'n wit '2 Live Crew'`, a new weight loss program by
Richard Simmons . . ."