Path: utzoo!utgpu!utstat!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!pasteur!columbine.Berkeley.EDU!dean
From: dean@columbine.Berkeley.EDU (R. Drew Dean)
Newsgroups: comp.arch
Subject: Re: Complex Instructions
Message-ID: <13396@pasteur.Berkeley.EDU>
Date: 7 May 89 06:22:48 GMT
References: <57252@yale-celray.yale.UUCP> <4101@tolerant.UUCP> <134@dg.dg.com> <2253@pembina.UUCP> <1287@l.cc.purdue.edu>
Sender: news@pasteur.Berkeley.EDU
Reply-To: dean@columbine.Berkeley.EDU (R. Drew Dean)
Organization: University of California at Berkeley
Lines: 43

In article <1287@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>I agree.  But semi-portable and close to efficient is not impossible.  You 
>may have to abandon an algorithm completely on some machines.  In some cases
>the algorithm may have to be entirely abandoned.  This is especially the case
>in generating random variables, where it is the method which must be correct,
>NOT the answer.  The CRAY-1 is so bad I consider it a basket case for
>vectorization.  I would suggest connecting a mini, or even a micro, to
>do this part of the job.  The algorithm I would use on the 205 differs 
>from what I would use on the IBM 3090; the 205 algorithm could be implemented
>on the 3090, but would be decidedly more costly.  Implementing the 3090
>algorithm on the 205 would require breaking up the vector into short pieces,
>which is required on vector register machines such as the 3090 and CRAY, but
>is not required on vector stream machines such as the 205.  But to me, these
>considerations are immediate and obvious.
>
>-- 
>Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907

Can you please provide examples of the different algorithms you use on these
different machines ?
I refer you to the ACM Turing Lecture book, where I paraphrase from (I forget
whose lecture this was, the book is in the library where I'm not, but if
my memory hasn't dropped too many refresh cycles it's McCarthy's...)
	Look at generating Fibonacci numbers.  The obvious algorithm
	F(n) = F(n-1) + F(n-2) generates a process with exponential
	running time, but this can be transformed into a process with
	linear running time...[new algorithm omitted]  I don't care
	how great your "optimizing" compiler is -- a compiler which
	sees this transformation (reducing the order of growth), generating
	not necessarily perfect code, will trounce your compiler....
Note: Ok, this would be a really difficult compiler to write....I don't know
of anybody doing it yet.  But it's the principle that matters here:
It's _not_ how micro-efficient your code is that matters.  If it's
too slow, rethink the algorithm.  That's where the real difference is --
if you reduce the order of growth, wonderful things start happening.

If your random variable generator uses an algorithm that executes in the
least known time, then please forgive this posting....

Drew Dean
Internet: dean@xcssun.berkeley.edu
UUCP: ...!ucbvax!xcssun!dean
FROM Disclaimers IMPORT StandardDisclaimer;