Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!unido!tub!fauern!tumuc!guug!pcsbst!me
From: me@karl.pcs.com (Michael Elbel)
Newsgroups: comp.graphics
Subject: Re: Mandelbrot/Julia optimizations (was Re: Excluding Mandelbrot set)
Message-ID: <ME.89Dec13172132@karl.pcs.com>
Date: 13 Dec 89 17:21:32 GMT
References: <7106@ficc.uu.net> <3544@quanta.eng.ohio-state.edu>
	<234@xochitl.UUCP> <601@otto.bf.rmit.oz.au>
Sender: news@pcsbst.UUCP
Organization: PCS Computer Systems, GmbH
Lines: 56
In-reply-to: athos@otto.bf.rmit.oz.au's message of 1 Dec 89 00:52:42 GMT

In article <601@otto.bf.rmit.oz.au> athos@otto.bf.rmit.oz.au (David Burren [Athos]) writes:

   Call me a hacker, call me obsessive, but optimising for specific architectures
   (which can take some time with some of the weirder ones) is worth every hour
   of saved run-time. It takes the exploration of the Mandelbrot and Julia sets
   out of the realm of batch jobs and into that of user interaction.
   Even simple HLL code optimizations can make significant improvements.

   I ended up with a set of C routines for Unix machines, with #define's for
   various architectural features. The code is well optimised by standard C
   compilers for HP-9000/300s, Sun-3s, Cyber-180s, VAXen, Multimaxes, Iris 4Ds.
   There are essentially two versions: one for 2-register instruction machines
   such as Motorola and NatSemi FPUs, and one for 3-register-instruction beasts
   such as VAXen and MIPS boxes.

   The Cyber-170 work produced a function written in COMPASS (assembly) callable
   from Pascal to calculate single points. The code fits within less than half
   the 170-760's cache and takes advantage of the multiple functional units for
   parallelism. It takes just over two minutes to calculate a 1280x1024 image of
   the full set (-2,-1.25 .. 0.5,1.25) with a dwell limit of 256, and using 60-bit
   FP numbers. This compares with several hours for the older Pascal programs.

I'd like to ask whether anyone out there has timed Mandelbrot programs on
larger screens.

We are developing a box with an Intel i860 here at PCS. For a demo we
ported a sample program from Intel to the 1280 x 1024 pixel x 8 bit
framebuffer we have attached to the beast.

The kernel routine to calculate the points is written in assembly language
to take advantage of the 'multiple-instructions-per-clock-cycle' mode of
the i860.
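
For readers who want something to time on their own machine, here is a
minimal sketch in plain C of the per-point computation such a kernel
performs. It is not the i860 assembly routine and not Intel's sample
program; the function name dwell and its parameters are hypothetical, and
the escape radius of 2 is the usual textbook choice rather than anything
taken from our code.

```c
/* Hypothetical helper, not the i860 kernel: iterate z = z*z + c from z = 0
   and return how many iterations ran before |z| exceeded 2, capped at
   max_depth. */
static int dwell(double cx, double cy, int max_depth)
{
    double x = 0.0, y = 0.0;
    int n = 0;

    while (n < max_depth) {
        double x2 = x * x;
        double y2 = y * y;

        if (x2 + y2 > 4.0)          /* |z|^2 > 4 means |z| > 2: escaped */
            break;

        y = 2.0 * x * y + cy;       /* imaginary part of z*z + c */
        x = x2 - y2 + cx;           /* real part of z*z + c */
        n++;
    }
    return n;
}
```

A point that never escapes, such as c = 0, returns max_depth; a point far
outside the set returns after one or two iterations.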

I wonder how the results I got compare to other machines.

Here's timing for some interesting areas of the plane.


    X0     |    Y0     |    X1     |    Y1     | max. depth |   time
===============================================================================
 -2.25     | -1.5      |  0.75     |  1.5      |    156     | 0 min 11.7 sec
 -0.1992   |  1.0148   | -0.12954  |  1.06707  |    256     | 0 min  8.7 sec
 -0.713    |  0.49216  | -0.4082   |  0.71429  |    256     | 0 min 21.6 sec
 -0.75104  |  0.10511  | -0.7408   |  0.11536  |   1024     | 1 min 35.0 sec
 -0.74758  |  0.10671  | -0.74624  |  0.10779  |   2048     | 2 min 30.0 sec
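
To reproduce one of these areas on another machine, the corners have to be
mapped onto the pixel grid. The sketch below shows one way to do that in C.
It assumes that (X0, Y0) lands on the first pixel and (X1, Y1) on the last
one; the struct and function names are hypothetical, and the orientation of
the Y axis on screen is an assumption as well.

```c
/* Hypothetical types and names, for illustration only. */
typedef struct {
    double x0, y0;   /* corner mapped to pixel (0, 0) */
    double x1, y1;   /* corner mapped to pixel (width-1, height-1) */
} region;

/* Linearly interpolate pixel (px, py) into the complex plane.
   Requires width > 1 and height > 1. */
static void pixel_to_plane(const region *r, int px, int py,
                           int width, int height,
                           double *cx, double *cy)
{
    *cx = r->x0 + (r->x1 - r->x0) * px / (width - 1);
    *cy = r->y0 + (r->y1 - r->y0) * py / (height - 1);
}
```

With the first row of the table and a 1280 x 1024 grid, pixel (0, 0) maps to
(-2.25, -1.5) and pixel (1279, 1023) maps to (0.75, 1.5).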

The times include output to the framebuffer. The program was running
standalone, and the processor was clocked at 33 MHz.
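
Since other people's screens will have different sizes, pixels per second is
probably the easiest figure to compare. Here is a small C sketch of that
arithmetic; the function name is hypothetical. Note that it only normalizes
for screen size: the area and the maximum depth still have to match for two
numbers to be comparable.

```c
/* Hypothetical helper: average pixel throughput for a full-screen run. */
static double pixels_per_second(int width, int height, double seconds)
{
    return (double)width * (double)height / seconds;
}
```

For the first row of the table, 1280 x 1024 pixels in 11.7 seconds comes to
roughly 112,000 pixels per second.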

Unfortunately I don't have a version for Unix that is even somewhat
optimized (it's faster than this tricky program I have for the ST, although
there the resolution is only 320 x 200 and the depth 30 :-) ).

Hoping, Michael
--
Michael (X) Elbel - me@dude.PCS.COM for the World, me@dude.PCS.DE for Europe