Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!cs.utexas.edu!usc!snorkelwacker!spdcc!ima!haddock!news From: news@haddock.ima.isc.com (overhead) Newsgroups: comp.arch Subject: Re: Uses for unusual instructions Message-ID: <15300@haddock.ima.isc.com> Date: 29 Nov 89 21:40:19 GMT References: <1743@l.cc.purdue.edu> <1989Nov27.173555.19673@fxgrp.fx.com> <5446@internal.Apple.COM> Reply-To: suitti@anchovy.UUCP (Stephen Uitti) Organization: Interactive Systems Co Lines: 82 >cik@l.cc.purdue.edu (Herman Rubin) writes: >Soem time ago, I posted to this group certain instructions which are cheap >in hardware and expensive in software. The idea of adding hardware to your machine is practical if you have access to your machine. Putting a device into your ETA-10 (Did Purdue get theirs?) might be difficult. Adding it to your PC clone when you could just use a supercomputer won't probably be a win. Of course if portability is a concern... The examples I've seen so far are finding or counting bits in a word. In assembly: / word bits to count in r0 clr r1 / count = 0 loop: inc r1 / count++ shrr r0 / shift word right, into carry jnc loop / continue if bit is zero / r1 has a bit count This is three instructions per loop. Given a 32 bit word size, it looks like 96 instructions may be executed (more than that if the above loop is executed when all bits are zero). About a decade ago, I was involved with a project that did scheduling. It used a bit per time slot, and there were enough bits in a word to handle the number of time slots required. Packing bits seemed like the data structure because boolean logic let you quickly compute if there were any posibilities remaining. However, finding out which slot was available (which bit was still on) required a loop as above. When the program was run, it tended to spend a large fraction of its time in the loop. The machine, a PDP-10, had an instruction, jffo (jump if find first one), which had strange semantics, but which let us do what we wanted. When the loop was changed to use the instruction, the overall execution speed increased noticably, and the program no longer seemed to spend all its time in the one routine. Was the application of general use? Yes. Was the appliaction portable? No - it was written in assembly. Further, the data structures were highly dependant on word length (36 bits). Could this task be done efficiently without the instruction? Probably. I've since seen some constant time algorythms for bit counting. (Anyone care to post the C macro version?) I haven't run any tests to determine the real speed of these beasts, however. The nice thing about portability is that when the next machine comes out, you can often simply get a speed up for your application. The PDP-10 from nine years ago cost a million dollars. Two years ago, I bought a Mac II for home, which is faster, and cost under $10K. These systems are quite comparible: Dec System 20 Macintosh II RAM 2 Megawords= ~10 MB 5 MB (max 8 MB, for additional $400) Disk 40 Millisec access 20 Millisec access 500 MB 300 MB User Interface Good line oriented Good graphics oriented OS Multi user Single user/multitask Printer Uppercase band printer 300 DPI laser printer 6 pages/minute 6 pages/minute Best utility EMACS MS Word 4.0 (or is 3.02 better?) Best game VT Trek Crystal Quest II User media DECtape 800K floppies Misc 40 terminals two serial ports two graphics screens graphics digitizer sound digitizer stereo sound out MIDI controller Cost $1M $10K Alright, so its apples and orange colored things... The moral? Algorythmic research is important. Portability is often achievable. In fact, maybe it is time to rewrite the scheduler for the Mac. There's probably a market. Stephen. suitti@haddock.ima.isc.com Brought to you by Super Global Mega Corp .com