Path: utzoo!attcan!uunet!fernwood!apple!usc!zaphod.mps.ohio-state.edu!ub!rutgers!gatech!purdue!mentor.cc.purdue.edu!l.cc.purdue.edu!cik From: cik@l.cc.purdue.edu (Herman Rubin) Newsgroups: comp.arch Subject: Re: Compilers taking advantage of architectural enhancements Message-ID: <2709@l.cc.purdue.edu> Date: 6 Nov 90 12:51:26 GMT References: <1990Oct9> <3300194@m.cs.uiuc.edu> <11922@ganymede.inmos.co.uk> <1990Oct31.203932.26325@cs.cmu.edu> Organization: Purdue University Statistics Department Lines: 88 In article <1990Oct31.203932.26325@cs.cmu.edu>, spot@WOOZLE.GRAPHICS.CS.CMU.EDU (Scott Draves) writes: > cik@l.cc.purdue.edu (Herman Rubin) writes > |> In many cases, the operation is very simple, not complex. Do not > assume that > |> the current operations on the computers are even an attempt at the logically > |> basic ones. This was the case in the early computers, when those designing > |> them had a good understanding of mathematics. It is not the case now. > > what a completely bigoted and infantile statement. your true colors are > showing. > so tell us, what are the "logically basic operations"? probably those > you wish you had the last time you wrote some code, right? > > |> > |> [ a few things herman wishes were in hardware ] > |> > |> In some cases, the difference would not be between 32 and 75 cycles, but > |> between 2 and 75. > > name it. and then show how it can be implemented on a modern processor. > and then show that those transistors couldn't have been better spent on > something else. keep in mind the application mix that uProcs are > actually used for. Where did I confine myself to microprocessors? And what are microprocessors not used for? Bob Silverman was complaining about the lack of 32x32 -> 64 multiplication; he was using microprocessors as part of a coordinated effort to factor large numbers. Today's microprocessors can outperform the mainframes of not too many years ago. A modern processor is what the brightest designers and engineers of today can put together. Our students and faculty are expected to do most of their computer work on our departmental machines as far as possible; except for an old mini, being replaced, they are usually considered microprocessors. Some of these involve computer simulations which could not be afforded as recently as 20 years ago. As to whether the transistors could not be used better, another chip would not be all that expensive. In fact, it would be cheap. Now for some instructions. There are algorithms, doing all their computations on a few bits (usually < 10) which use these instructions. As these are algorithms for generating non-uniform random variables, they are likely to be used thoudands or millions of times. Find the distance to the next 1 in a bit stream. Place the stream pointer past that 1. It is necessary to take into account that the stream could be exhausted before a 1 is found. Current solution, and even this is generally not available: Keep as many bits as possible for the subsequent operations in registers. Keep needed counts and pointers in registers. Find next 1, if there is one. Do appropriate shifts and count updates. Put count in desired register If there is not one, store number of 0's reload: Check if there are items left in the memory buffer. If not, refill and update pointers Get information into registers and update pointers. Find next 1, if there is one If not, augment 0 count by number of 0's and go to reload Do appropriate shifts and count updates Put count plus number of 0's in desired register Now I can do better on some vector machines. But even there the time needed to produce a vector of these is 3 times the "hardware" time. A problem which may even be worse in the same context is to do a conditional branch on a bit, and move the pointer. Another wanted instruction on vector machines is to replace the 0's in a bit stream with bits from another stream, i.e., the k-th 0 is to be replaced by the k-th bit. The estimated vector time on a CYBER 205 of the best method I have come up with is about 40 times what the string operation would take if it were in hardware, and would take 4 startups instead of 1. Other things, such as reading buffers, etc., have already been mentioned. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet) {purdue,pur-ee}!l.cc!hrubin(UUCP)