Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!pyramid!voder!apple!bcase
From: bcase@apple.UUCP (Brian Case)
Newsgroups: comp.arch
Subject: Re: Condition Codes in General Registers
Message-ID: <7463@apple.UUCP>
Date: 22 Feb 88 19:10:29 GMT
References: <6834@sol.ARPA> <780004@otter.hple.hp.com> <17031@watmath.waterloo.edu>
Reply-To: bcase@apple.UUCP (Brian Case)
Organization: Apple Computer Inc., Cupertino, USA
Lines: 49

In article <17031@watmath.waterloo.edu> ccplumb@watmath.waterloo.edu (Colin Plumb) writes:
>Well, something similar can be done with result flags in general-purpose
>registers.  For example, if the Am29000 used -1 for true instead of
>80000000, you could do that in three instructions:
>
>cplt temp, x, #0 ; temp = (x < 0)
>xor x, x, temp ; x ^= temp;
>sub x, x, temp ; x -= temp
>
>If temp is 0 after the test, this has no effect on x, but if
>it's -1, this inverts x and adds one, i.e. negates it.
>
>The 29000 doesn't quite do things this way, but isn't there a processor
>out there somewhere that does?

This brings up an interesting architectural point: Why didn't the 29000
use -1 for true? The answer has two parts: (1) It would cost more to
force 32 bus lines to one than to force only one and sink the other 31
lines. Since the computation of the one-bit boolean is probably the
most critical path in the ALU part of the machine, it seemed silly to
increase this path length. (2) Setting all 32 lines to one or zero based
on the outcome of a comparison gives no more information than setting
only one.

Thus, the desired -1 or 0 in the cases mentioned here can be created by
an arithmetic right shift by 31 places after the boolean is computed.
Yes, this adds one cycle to the above sequences, which is significant
here, but at least the same *algorithm* can be used with its attendant
jump elimination.

>Things like z = (x > y) ? a : b; become three instructions:
>cpgt z, x, y ; z = (x > y) /* z now holds 0 or -1 */
>and z, z, #a-b ; z &= (a-b) /* z now holds 0 or a-b */
>add z, z, #b ; z += b /* z now holds b or a */

So make the above sequence:

	cpgt z,x,y
	sra  z,z,31
	and  z,z,#a-b
	add  z,z,#b

(except that if a and b are constants, the 29000 requires more
instructions to form them into registers, since only 8-bit constants can
be named for arithmetic instructions. If a and b are in registers
already, then only one additional instruction is needed to form a-b.)

>In this case, the ARM can do no better, although it seems that it can
>in the general case.  As pipelines grow longer, we may see skip
>instructions come back.  They may "waste" a cycle on a no-op, but they
>keep the pipeline full.

Skip instructions are already back! See the HP Spectrum. The idea of
conditional execution (as on ARM) is really just skip in disguise.
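
For anyone who wants to experiment with the shift-to-mask idea discussed above without a 29000 at hand, here is a rough C sketch. It is only a model under stated assumptions, not 29000 code: the helper names (cmp_gt, cmp_lt, to_mask, select_gt, abs_branchless) are made up for illustration, the compare helpers simply imitate a result of 80000000 (hex) for true and 0 for false, and the mask is formed with unsigned arithmetic rather than a signed right shift, because C leaves right shifts of negative values implementation-defined.

```c
#include <stdint.h>

/* Hypothetical models of a compare that drives only the top bit:
   0x80000000 for true, 0 for false. */
static uint32_t cmp_gt(int32_t x, int32_t y) {
    return (x > y) ? UINT32_C(0x80000000) : UINT32_C(0);
}

static uint32_t cmp_lt(int32_t x, int32_t y) {
    return (x < y) ? UINT32_C(0x80000000) : UINT32_C(0);
}

/* Stand-in for "sra z,z,31": spread the top bit across the word.
   Written as 0 - (b >> 31) so the result is well defined in C:
   it yields 0xFFFFFFFF when the top bit is set, 0 otherwise. */
static uint32_t to_mask(uint32_t b) {
    return (uint32_t)(UINT32_C(0) - (b >> 31));
}

/* z = (x > y) ? a : b, with no branch after the compare:
   mask & (a - b) is either a - b or 0; adding b gives a or b.
   The arithmetic is done in uint32_t so it wraps modulo 2^32;
   the final conversion back to int32_t assumes two's complement. */
static int32_t select_gt(int32_t x, int32_t y, int32_t a, int32_t b) {
    uint32_t mask = to_mask(cmp_gt(x, y));
    uint32_t diff = (uint32_t)a - (uint32_t)b;
    return (int32_t)((mask & diff) + (uint32_t)b);
}

/* The quoted xor/sub sequence: with mask == all ones, (x ^ mask) - mask
   is ~x + 1, i.e. -x; with mask == 0 it leaves x unchanged. */
static int32_t abs_branchless(int32_t x) {
    uint32_t mask = to_mask(cmp_lt(x, 0));
    uint32_t ux = (uint32_t)x;
    return (int32_t)((ux ^ mask) - mask);
}
```

Calling select_gt(5, 3, 10, 20) should give 10 and select_gt(3, 5, 10, 20) should give 20; abs_branchless(-7) should give 7. Whether a compiler actually emits branch-free code for the compare helpers is up to the compiler; the point is only to show the mask arithmetic.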