Path: utzoo!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!usc!apple!ames!uhccux!munnari.oz.au!bruce!goanna!minyos!otto!athos
From: athos@otto.bf.rmit.oz.au (David Burren [Athos])
Newsgroups: comp.graphics
Subject: Mandelbrot/Julia optimizations (was Re: Excluding Mandelbrot set)
Message-ID: <601@otto.bf.rmit.oz.au>
Date: 1 Dec 89 00:52:42 GMT
References: <7106@ficc.uu.net> <3544@quanta.eng.ohio-state.edu> <234@xochitl.UUCP>
Organization: RMIT, Vic., Australia
Lines: 86

In article <234@xochitl.UUCP>, cheeks@edsr.eds.com (Mark Costlow) writes:
>In article <7106@ficc.uu.net>, peter@ficc.uu.net (Peter da Silva) writes:
>> One thing that can be done to speed up calculations of the mandelbrot set
>> is to use memory. After you have determined that a point is in the set,
>> you know that all the other points visited in determining that are in the
>> set as well, so you can mark them black. Similarly, if you determine a
>> point is not in the set you know that all the other points you visited
>> are not in the set as well, so you can mark them white (or whatever
>> colors you're using). This is a pretty safe optimisation.
>
> Hmmm ... maybe I'm missing something here, but that doesn't seem to be a
> valid algorithm. Since the Mandelbrot algorithm is Z(n+1) = Z(n)^2 + C,
> then each iteration for a given C defines a "path" through the plane,
> stopping at several (possibly infinite) discrete points. But, isn't the
> path different for each different C? I mean, just because you saw point P
> when iterating with C1, doesn't mean that if you see point P when
> iterating on C2 that the C2 iteration will behave identically to the C1
> iteration.

The term "path" seems to have misled you a little. The expression
Z(n+1) = Z(n)^2 + C says nothing about Z(n-1): if you reach point X from
any previous point, you WILL proceed to the same next point. Of course,
this is subject, like everything else, to rounding errors.
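That memorylessness is easiest to see in code. Here is a minimal escape-time sketch in C (the function and variable names are mine, not from any of the programs discussed here): each step reads only the current Z and C, never any earlier point of the orbit.

```c
/* Minimal escape-time iteration for Z(n+1) = Z(n)^2 + C.
 * Returns the iteration count at which |Z| exceeded 2, or `limit` if
 * the orbit never escaped (the point is presumed in the set).
 * Each step depends only on the current (zr, zi) and on (cr, ci);
 * the iteration has no memory of how it arrived there. */
static int dwell(double cr, double ci, int limit)
{
    double zr = 0.0, zi = 0.0;
    int n;
    for (n = 0; n < limit; n++) {
        double zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0)          /* |Z|^2 > 4  <=>  |Z| > 2 */
            return n;
        zi = 2.0 * zr * zi + ci;      /* imaginary part of Z^2 + C */
        zr = zr2 - zi2 + cr;          /* real part of Z^2 + C */
    }
    return limit;
}
```

For example, dwell(0.0, 0.0, 256) returns 256 (the origin never escapes), while a point well outside the set such as C = 2+2i escapes after a single step.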
The many-to-few property of the expression follows from the fact that every
complex number has two square roots. However, the search time needed to
determine whether a point has been visited before makes Peter's scheme sound
rather daunting, though this would depend on the memory architecture.

My own work on blotting (several years ago someone here at RMIT compared a
Mandelbrot image to an ink blot, and the name stuck :-) has largely centred
on using parallel hardware and optimising the code for specific machines.
Things like cache make an enormous difference to performance. I try to avoid
memory accesses of any sort in the inner per-point loop, keeping data in
registers and instructions in cache. Machines with 8 FP registers can keep
all the data on-chip, although brain-dead compilers that refuse to optimise,
or that insist on keeping one register free for intermediate expression
values, often do their best to impede such programs.

People often seem to ignore the performance gains they can achieve with
relatively little effort. Playing with compiler options for a Pascal program
running on a Cyber 170 got the inner loop to sit (just) within the cache,
with a performance jump of close to an order of magnitude. Even algorithms
presented in books such as "The Science of Fractal Images" have (what seem
to me now) glaring inefficiencies, such as calculating parts of expressions
twice.

I've seen many fancy "optimizations" suggested over the years, but although
they reduce the "number of calculations", the "work per calculation" seems
to increase too much to make them worthwhile. Comments, anyone?

Call me a hacker, call me obsessive, but optimising for specific
architectures (which can take some time with some of the weirder ones) is
worth every hour of saved run-time. It takes the exploration of the
Mandelbrot and Julia sets out of the realm of batch jobs and into that of
user interaction.
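To make the redundant-expression point concrete, here is an illustrative sketch (my own, not code from the book or from my routines) of the most common case: the squares zr*zr and zi*zi are needed both for the escape test and for the update, and the naive form computes each of them twice per iteration.

```c
/* Naive inner loop: zr*zr and zi*zi are each computed twice per
 * iteration, once in the loop test and once again in the update. */
static int dwell_naive(double cr, double ci, int limit)
{
    double zr = 0.0, zi = 0.0;
    int n;
    for (n = 0; n < limit && zr * zr + zi * zi <= 4.0; n++) {
        double t = zr * zr - zi * zi + cr;   /* squares recomputed */
        zi = 2.0 * zr * zi + ci;
        zr = t;
    }
    return n;
}

/* Hoisted version: each square is computed exactly once and kept in a
 * local that a decent compiler will map to an FP register. */
static int dwell_hoisted(double cr, double ci, int limit)
{
    double zr = 0.0, zi = 0.0;
    double zr2 = 0.0, zi2 = 0.0;      /* squares of the current zr, zi */
    int n;
    for (n = 0; n < limit && zr2 + zi2 <= 4.0; n++) {
        zi = 2.0 * zr * zi + ci;      /* uses the old zr */
        zr = zr2 - zi2 + cr;          /* uses the saved squares */
        zr2 = zr * zr;                /* each square computed once */
        zi2 = zi * zi;
    }
    return n;
}
```

Both functions perform the same arithmetic on the same values in the same order, so they return identical dwell counts; the hoisted form simply does fewer multiplies per iteration, which matters when the loop runs hundreds of times per pixel.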
Even simple HLL code optimisations can make significant improvements. I
ended up with a set of C routines for Unix machines, with #define's for
various architectural features. The code is well optimised by standard C
compilers for HP-9000/300s, Sun-3s, Cyber-180s, VAXen, Multimaxes, and
Iris 4Ds. There are essentially two versions: one for machines with
two-register instructions, such as the Motorola and NatSemi FPUs, and one
for three-register-instruction beasts such as VAXen and MIPS boxes.

The Cyber-170 work produced a function written in COMPASS (assembly),
callable from Pascal, that calculates single points. The code fits in less
than half of the 170-760's cache and takes advantage of the multiple
functional units for parallelism. It takes just over two minutes to
calculate a 1280x1024 image of the full set (-2,-1.25 .. 0.5,1.25) with a
dwell limit of 256, using 60-bit FP numbers. This compares with several
hours for the older Pascal programs.

The reason the code was written to be optimal for a wide variety of
machines is that I have an unfinished distributed "fractal server" to
exploit the coarse-grained parallelism inherent in the Internet. I intend
to rewrite it to use RPC. I saw a mention in comp.sys.sun recently of an
RPC-based Mandelbrot program on the 1987 SUG tape. Can anyone here tell me
an easy way to get hold of it?

These programs have been untouched for several months and are in dire need
of cleaning up, which I intend to do very soon. Anyone interested in the
code can get it from me, although I'd prefer to work on it a bit more
first.

_______________________________________________________________________________
David Burren (Athos),                          ACSnet: athos@otto.rmit.oz
Survey Computing Consultants, Hawthorn, Vic.