Path: utzoo!utgpu!watserv1!watmath!att!pacbell!pacbell.com!ucsd!usc!snorkelwacker!husc6!husc9!cleary From: cleary@husc9.harvard.edu (Kenneth Cleary) Newsgroups: comp.arch Subject: What about hardware implementations of optimized algorithms? Summary: As opposed to optimized circuits. Message-ID: <4047@husc6.harvard.edu> Date: 31 Aug 90 03:12:38 GMT Sender: news@husc6.harvard.edu Reply-To: cleary@husc9.UUCP (Kenneth Cleary) Followup-To: comp.arch Organization: Harvard University Science Center Cambridge, MA Lines: 46 While I find discussions of hardware optimizations achieved through improvement of the electronic circuitry (such as GaAs, and higher-density VLSI) and design improvements like pipelining or instruction caching, I would like to hear about examples of optimized algorithms implemented as digital circuits. It is not difficult to find a few books on optimized algorithms implemented as software, but I'm having difficulty finding out about hardware aspects. (I'm not specifically referring to digital circuit optimizations like gate- count reduction, though that also interests me.) Most hardware optimizat- ions I've seen assume a fixed algorithm which is to be implemented as efficiently as possible, rather than looking at desired input and desired output, and attempting to find the most efficient algorithm. SOFTWARE CASE IN POINT: Donald E. Knuth, The Art of Computer Programming, vol.2, Seminumerical Algorithms page 278, section 4.3.3. How Fast Can We Multiply? Knuth talks about getting faster multiplication than order-n**2 behavior. He describes some methods of getting behavior of n**(lg3) or n**1.585 but this only starts to provide a signifigant boost as the size of the numbers being multiplied becomes quite large. (i.e. it might not be worth it for 8-bit & 16-bit numbers, but may be competitive for 32-bit & 64-bit.) HEAVY PARAPHRASING OF KNUTH BEGINS... If we have two n-bit [let's say 32-bit] numbers {underscores for subscripts} u=(u_31,u_30,...,u_1,u_0) and v=(v_31,v_30,...,v_1,v_0) {individual bits} we can write u=(2**16)U_1 + U_0 v=(2**16)V_1 + V_0 basically breaking each number into its most-sig half & least-sig half, so uv= ((2**32)+(2**16))(U_1)(V_1) + (2**16)((U_1)-(U_0))((V_0)-(V_1))+((2**16)+1)(U_0)(V_0) "This formula reduces the problem of multiplying [32-bit] numbers to three multiplications of [16-bit] numbers... plus some simple shifting and adding operations." This breaking up into two parts is done recursively, breaking the number into smaller and smaller chunks. {reminds me of quicksort} PARAPHRASING ENDS. {My apologies for typos} So, while there may not have been demand for this sort of approach with smaller 8-bit chunks, it may provide so much of a boost at 64-bits, as to make it more even more worthwhile to build 64-bit machines for handling 64-bit numbers, instead of chopping them into two 32-bit chunks to be handled by a 32-bit processor. (wait a minute... "breaking into two chunks" sounds familiar...) Oh Well! It's late, and I think I should stop typing right n.............