Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!sdd.hp.com!spool.mu.edu!news.nd.edu!mentor.cc.purdue.edu!pop.stat.purdue.edu!hrubin From: hrubin@pop.stat.purdue.edu (Herman Rubin) Newsgroups: comp.arch Subject: Re: Compilers, architecture, and efficiency Summary: Some specific answers Message-ID: <11720@mentor.cc.purdue.edu> Date: 1 May 91 19:34:15 GMT Article-I.D.: mentor.11720 References: <1991Apr29.155945.29907@rice.edu> <1991Apr30.025904.13028@rice.edu> <3429@charon.cwi.nl> Sender: news@mentor.cc.purdue.edu Lines: 110 In article <3429@charon.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes: > In article <11609@mentor.cc.purdue.edu> hrubin@pop.stat.purdue.edu (Herman Rubin) writes: ....................... > > One of the reasons why, in the late 50s and early 60s, binary was settled on > > (there are many others) was that multiplication by powers of 2 was rather > > common. > I have heard many reasons, but never this one. Can you give a reference > please? An explicit place where I saw this was in a paper discussing the design of the MANIAC 2. > > Only a few machines have this very important operation in floating > > hardware, and instead have to bring in the multiplier and use the relatively > > slow multiplication. It is true that with pipelined hardware this is not so > > much of a problem, but I know of only a few machines which had it before then. > Mm, some 360's had it I think, but that was completely micro code, and if my > memory is right, later versions did it through a straight multiplication. > There must be some reason. Also, multiplication by 2 is *not* easy on IEEE > conformant machines (think gradual underflow). Kahan had some (good) reasons > to include gradual underflow. But IEEE machines might use the 'adjust exponent' > primitive. On the other hand, IEEE says nothing about what has to be done in > hardware and what can be done in software. And on your favorite machine (Cyber > 205) DP multiplication by 2.0 need not even be exact! On the 360 this would have been difficult, because the machine was base 16 rather than base 2. The machine has a floating instruction to halve. One machine which had this instruction was the CDC 3600. The instruction was ADX, add to exponent, and it managed overflows and underflows exactly the way that they would be for other floating operations. Clearly this can be a fast operation. I did not state that the 205 was a really well-designed machine, just that it is the most versatile vector machine I know. I am even inclined to agree that the virtual memory is not all that great. Not only may DP multiplication by 2.0 fail to be exact, even by 1.0. But compare this to the total nuisance to do DP operations on a CRAY, for example! In fact, it is not common for DP operations to be exact in general on most machines. ......................... > As has been asked before: give examples. How would you like to write in your > favorite language your favorite examples: > 1. Check bit in a bit stream and mark as seen. > 2. Calculate length of a run of 0's or 1's in a bit stream and mark them as > seen. > HINTS: > They could be done along the lines of: > bitstream b; > bit b1; > int count; > /* another gripe of you */ > external end_of_bits(); > trap_at_bits_end(b, end_of_bits); > b1 = nextbit(b); /* I hear you about infix operators */ > count = countbits(b); /* should it count the bit just consumed? */ This is the best I have been able to do for 1. In the envisioned applications, I would not be counting the bits, but transferring on them. Pardon the somewhat clumsier notation, as I am not used to trap pragmas. I am assuming registers are 32 bits. initialize pointers, etc. double int *end_of_bits, *loc; int left; registerpair r; if (left -= 1 < 0){ if(loc >= end_of_bits){ refill(); loc = start_of_bits;} left = 64; r = *loc++;} else r <<= 1; if (r < 0) goto ... For 2, I have not done as well even on scalar machines. On vector machines, allowing collapse of vectors, it can be done fairly easily by having the input bitstream act as a selection for consecutive integers, with appropriate handling of ends. It could be done by counting here, but generally there are faster methods, although they are tricky. I doubt that they would be recognized even if these are. > Next question: how do you envisage an *efficient* hardware implementation? Have hardware hold bitstreams, and have faults on end_of_bits, just as there are now page faults. The operations are not at all difficult from a hardware viewpoint. > If there is one, compilers are quite able to translate the above code to > that implementation. If you know of that *efficient* hardware implementation > and if it is present on a current architecture, just convince your compiler > vendor to put that optimization in your compiler. If not, convince your > hardware vendor to put the operations required in the hardware (and show > how it can be done efficiently), and next convince your compiler vendor. Do you really see a compiler writer recognizing that sequence of code, or one of its alternatives, as this operation? Especially since I, knowing that tests are bad, might use a slightly different code for such situations where I have some control on how many times the operation will be used between tests. This would enable the use of left -= 1; if (r<<=1 < 0) goto ... and the reloading of registers becomes much more complicated. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)