Path: utzoo!utgpu!news-server.csri.toronto.edu!rutgers!cs.utexas.edu!uunet!timbuk!cs.umn.edu!uc!noc.MR.NET!gacvx2.gac.edu!gacvx2.gac.edu!scott
From: scott@mcs-server.gac.edu (Scott Hess)
Newsgroups: comp.sys.next
Subject: Re: Postscript Imaging Speed w/040
Message-ID: <SCOTT.91Jan11143139@mcs-server.gac.edu>
Date: 11 Jan 91 20:31:39 GMT
References: <1991Jan11.191116.2685@noose.ecn.purdue.edu>
Distribution: usa
Organization: Gustavus Adolphus College
Lines: 83
Nntp-Posting-Host: mcs-server.gac.edu
In-reply-to: songer@orchestra.ecn.purdue.edu's message of 11 Jan 91 19:11:16 GMT

In article <1991Jan11.191116.2685@noose.ecn.purdue.edu> songer@orchestra.ecn.purdue.edu (Christopher M Songer) writes:
	I was wondering if anyone could speak to the effects of the 040
   on Postscript imaging. I'm sure it will speed it. I'm curious by how
   much. I currently have a PS program which takes about 34500 ms. to run.
   (Assuming Yap is telling me the truth.) The program is likely to
   grow and I'm curious how much faster it is going to be on an 040. (If,
   after the 040 shift, it takes much more than 10 seconds to image, it'll 
   have to be optimized....) Anyway... if anyone knows what the speed up 
   factor is, I'd be interested to know. 

Now _here's_ a question :-).  This is probably one of the tougher questions
to answer, because, like the benchmark debates, there is _no_ set answer.
You have to try out what you're doing to find out the speedup.

That out of the way, I'll give some practical impressions . . .

One of my tests for Stuart speed is to run "time cat /etc/termcap".  If the
machine isn't running much else (and make sure it's not), this will give
fairly consistent results on a machine.  On the '030 machines, this
takes about 43 seconds (your mileage may vary) in an 80x72 window
using Ohlfs-10.  Even though the shipping 2.0 changes Ohlfs just enough
that you can only get 66 lines on the screen (something I'll fix RSN),
I was lucky enough to get in some tests on a pre-release 2.0 without
that problem, running on a NextStation.  The time: 21 seconds.

Now, for disclaimers.  Another test was to run the same command again,
this time with the scrollbar pushed to the top of the document.
Stuart has optimizations to simply skip all drawing in this case.
The old machines then took 7 seconds.  The new ones take an amazing
2 seconds.  Given the timing errors possible with this method, that's
anywhere from a 2x to 3x speedup on the calculations without display.
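
For anyone who wants to redo the arithmetic with their own timings, here
is a minimal sketch in C.  The function names are made up for
illustration, and the second function rests on an assumption that is not
in the measurements themselves: that total time is simply the no-draw
time plus the drawing time.

```c
#include <assert.h>

/* Speedup factor: how many times faster the new timing is than the old. */
double speedup(double old_secs, double new_secs)
{
    assert(new_secs > 0.0);
    return old_secs / new_secs;
}

/* Assumption (hypothetical model): total time = no-draw time + drawing
   time, so the drawing share is whatever is left after subtracting the
   no-draw run from the full run. */
double drawing_secs(double total_secs, double nodraw_secs)
{
    assert(total_secs >= nodraw_secs);
    return total_secs - nodraw_secs;
}
```

With the numbers above, speedup(43, 21) comes out a little over 2, and
speedup(7, 2) is 3.5 on paper.  A 2-second reading is short enough that
the second ratio is very rough.  Under the additive assumption,
speedup(drawing_secs(43, 7), drawing_secs(21, 2)) is 36/19, or a bit
under 2 for the drawing alone.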

Explanations.

The internal Stuart stuff is made up of mostly integer calculations,
with a fair amount of overhead for system calls and the like.  This
means that, though it gets quite a speedup, it's not as great as
a calculation-only program's would be.  Programs with long/tight loops
(long as in 1 million executions, tight as in small code size) will
achieve a greater speedup, as measured by many people on the net.
Programs using floating point will, too.  But programs with lots
of trig functions, while achieving good speedup, will not do as well
(because the '040 drops the trig instructions from its on-chip FPU to
save space, so they are emulated in software).

The display stuff in Stuart, meanwhile, exercises three things:
composite moves, rectangle clears, and text drawing.  While I
spent a lot of time optimizing the calls for order and compactness,
these still are probably some of the best tests of the windowserver.
See, they don't do anything that is "hard".  Bit blasts, rectangle
clears, and text drawing are all going to be heavily dependent on
the microprocessor, but won't get a whole lot of advantage from
the floating point improvements, relying more on byte moves.
So, I'd suspect that a program which does drawing of diagrams and
the like will achieve a more than 2x speedup, though I've not
fully tested this.

So far as I'm aware, the NextStep2.0 windowserver has not been
improved in raw speed.  Running both 1.0 and 2.0 on the same
machine (well, different times) indicates that, though many
calls to the windowserver are rearranged for speed (invisibly,
in the appkit), the windowserver itself is not faster.  I think
this will be true pretty much across-the-board.  So, as always,
it is much more important to arrange the code correctly than
to upgrade CPUs (the changes from 1.0 Stuart to 2.0 Stuart gave
more speed than the upgrade from 2.0 Stuart on '030 to 2.0
Stuart on '040).

A corollary to the apparent windowserver speedup (2x plus maybe a little
more) as compared to the apparent application-side speedup (3x plus probably
a little more for calculation-intensive apps not using trig functions) is
that moving code from the PostScript side of your program to the C
side is more important on an '040, especially such constructs
as ifelse and for.
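
To make that concrete, here is a rough sketch of what "moving a for to
the C side" can look like.  This is not how Stuart does it.  The function
name emit_rules and the idea of building the PostScript as text in a
buffer are assumptions for illustration (a real app would more likely go
through pswrap or the client library).  The point is only that the loop
counter and the multiply run in compiled C, and the interpreter sees
nothing but plain drawing operators.

```c
#include <assert.h>
#include <stdio.h>

/* PostScript-side version, for comparison (loop runs in the interpreter):
 *     0 1 9 { 10 mul 0 moveto 0 100 rlineto } for stroke
 *
 * C-side version: emit_rules (hypothetical helper) unrolls the loop in C
 * and writes n vertical rules, `step` points apart and `height` points
 * tall, as straight-line PostScript text.
 * Returns the number of characters written, or -1 if buf is too small. */
int emit_rules(char *buf, size_t size, int n, int step, int height)
{
    size_t used = 0;
    int i, w;

    assert(buf != NULL);
    for (i = 0; i < n; i++) {
        /* The multiply that "10 mul" did in PostScript happens here. */
        w = snprintf(buf + used, size - used,
                     "%d 0 moveto 0 %d rlineto\n", i * step, height);
        if (w < 0 || (size_t)w >= size - used)
            return -1;
        used += (size_t)w;
    }
    w = snprintf(buf + used, size - used, "stroke\n");
    if (w < 0 || (size_t)w >= size - used)
        return -1;
    used += (size_t)w;
    return (int)used;
}
```

Calling emit_rules(buf, sizeof buf, 10, 10, 100) fills buf with ten
moveto/rlineto lines and a final stroke, which draws the same thing as
the one-line for loop in the comment.  It is more bytes to send, so
whether it wins in a given program is something to measure, not assume.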

If you have any questions about any of this, feel free to drop
me a line.  My shop is always open . . .
--
scott hess                      scott@gac.edu
Independent NeXT Developer	GAC Undergrad
<I still speak for nobody>
"Tried anarchy, once.  Found it had too many constraints . . ."
"Buy `Sweat 'n wit '2 Live Crew'`, a new weight loss program by
Richard Simmons . . ."