Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!clyde.concordia.ca!uunet!mcsun!unido!tub!fauern!tumuc!guug!pcsbst!me
From: me@karl.pcs.com (Michael Elbel)
Newsgroups: comp.graphics
Subject: Re: Mandelbrot/Julia optimizations (was Re: Excluding Mandelbrot set)
Message-ID:
Date: 13 Dec 89 17:21:32 GMT
References: <7106@ficc.uu.net> <3544@quanta.eng.ohio-state.edu> <234@xochitl.UUCP> <601@otto.bf.rmit.oz.au>
Sender: news@pcsbst.UUCP
Organization: PCS Computer Systems, GmbH
Lines: 56
In-reply-to: athos@otto.bf.rmit.oz.au's message of 1 Dec 89 00:52:42 GMT

In article <601@otto.bf.rmit.oz.au> athos@otto.bf.rmit.oz.au (David Burren [Athos]) writes:

   Call me a hacker, call me obsessive, but optimising for specific
   architectures (which can take some time with some of the weirder
   ones) is worth every hour of saved run-time. It takes the
   exploration of the Mandelbrot and Julia sets out of the realm of
   batch jobs and into that of user interaction. Even simple HLL code
   optimizations can make significant improvements.

   I ended up with a set of C routines for Unix machines, with
   #define's for various architectural features. The code is well
   optimised by standard C compilers for HP-9000/300s, Sun-3s,
   Cyber-180s, VAXen, Multimaxes, Iris 4Ds. There are essentially two
   versions: one for 2-register instruction machines such as Motorola
   and NatSemi FPUs, and one for 3-register-instruction beasts such as
   VAXen and MIPS boxes.

   The Cyber-170 work produced a function written in COMPASS
   (assembly) callable from Pascal to calculate single points. The
   code fits within less than half the 170-760's cache and takes
   advantage of the multiple functional units for parallelism. It
   takes just over two minutes to calculate a 1280x1024 image of the
   full set (-2,-1.25 .. 0.5,1.25) with a dwell limit of 256, and
   using 60-bit FP numbers. This compares with several hours for the
   older Pascal programs.

I'd like to ask whether anyone out there has measured Mandelbrot
programs on larger screens.
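For anyone comparing numbers, the per-point calculation these timings
refer to is the usual escape-time loop. Below is a minimal C sketch of
it, assuming the common bailout test |z|^2 > 4; the name dwell() and
its parameters are made up for illustration. It is not the COMPASS
routine from the quoted article, nor the i860 assembly kernel, neither
of which is shown in this thread.

```c
#include <assert.h>

/*
 * Hypothetical helper (illustration only): number of iterations of
 * z <- z*z + c, starting from z = 0, before |z|^2 exceeds 4.
 * Returns max_dwell if the point has not escaped by then.
 */
static int dwell(double cx, double cy, int max_dwell)
{
    double x = 0.0, y = 0.0;   /* real and imaginary parts of z */
    int n = 0;

    assert(max_dwell > 0);

    while (n < max_dwell) {
        double x2 = x * x;
        double y2 = y * y;

        if (x2 + y2 > 4.0)     /* assumed bailout: |z| > 2 */
            break;

        y = 2.0 * x * y + cy;  /* imaginary part of z*z + c */
        x = x2 - y2 + cx;      /* real part of z*z + c */
        n++;
    }
    return n;
}
```

A full image calls this once per pixel, so a 1280 x 1024 picture
needs 1,310,720 calls, and the points inside the set each run to the
full dwell limit.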
We are developing this box with an Intel i860 here at PCS. For a demo
we ported a sample program from Intel to the 1280 x 1024 pixel x 8 bit
framebuffer we have attached to the beast. The kernel routine to
calculate the points is written in assembly language to take advantage
of the 'multiple-instructions-per-clock-cycle' mode of the i860.

I wonder how the results I got compare to other machines. Here are
timings for some interesting areas of the plane.

   X0       |   Y0     |   X1      |   Y1     | max. depth | time
===============================================================================
  -2.25     | -1.5     |  0.75     |  1.5     |    156     | 0 min 11.7 sec
  -0.1992   |  1.0148  | -0.12954  |  1.06707 |    256     | 0 min  8.7 sec
  -0.713    |  0.49216 | -0.4082   |  0.71429 |    256     | 0 min 21.6 sec
  -0.75104  |  0.10511 | -0.7408   |  0.11536 |   1024     | 1 min 35.0 sec
  -0.74758  |  0.10671 | -0.74624  |  0.10779 |   2048     | 2 min 30.0 sec

The times include output to the framebuffer; the program was running
standalone and the processor was clocked at 33 MHz.

Unfortunately I don't have a version for Unix that is at least
somewhat optimized (it's faster than this tricky program I have for
the ST, although there the resolution is only 320 x 200 and the
depth 30 :-) ).

Hoping,

Michael
-- 
Michael (X) Elbel - me@dude.PCS.COM for the World, me@dude.PCS.DE for Europe