Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!ucbvax!hoptoad!gnu
From: gnu@hoptoad.uucp (John Gilmore)
Newsgroups: comp.arch
Subject: Re: delayed branch
Message-ID: <8266@hoptoad.uucp>
Date: 11 Aug 89 10:09:48 GMT
References: <828@eutrc3.urc.tue.nl> <26667@amdcad.AMD.COM> <26676@amdcad.AMD.COM>
Organization: Grasshopper Group in San Francisco
Lines: 48

tim@cayman.amd.com (Tim Olson) wrote:
> Does anyone else know of other processors with such restrictions?

I'm surprised that nobody mentioned the SPARC.  It has restrictions on
which types of branches can sit in the delay slot of which other types.
I think in the first draft of the architecture I was the one who noticed
that the intended "return from interrupt" sequence was one of the
invalid ones!

I don't have my SPARC manual handy but as I recall the invalid
combinations are defined to "keep you executing in the same address
space but otherwise jump to an undefined location"...

I found this quite a botch for a CPU architecture but I'm not a chip
designer -- I got into this business via software.  Then again, it's
been on the market for a few years and nobody seems to be screaming
about it.

A case where this bit me came up in the observation below:

While examining the function block profiler code (cc -a) I noticed an
interesting thing.  If you do:

        bcond,a foo             [,a means annul]
        instruction
    foo:

What you have is a "skip on not condition" instruction.  If the
condition is true, it does a delayed branch to foo, executing the
instruction in the delay slot.  If the condition is false, it falls
thru, but annuls the instruction.  In either case, the execution time
is the same (two cycles) and you end up at foo.

This reminds me of the "skip" instructions on the old DG Nova and
Eclipse.  Quite nice on machines with a single size instruction.

You can also think of it as a "conditionally execute one instruction"
instruction; in this case you don't have to mentally reverse the
condition.  E.g.

        blt,a foo; insn; foo:

executes insn if less than.

There's a serious catch to it on the SPARC: the second instruction
cannot be a delayed control transfer [i.e. a branch with a delay slot].
If it is, what the CPU does is undefined!  I was hoping to use this to
shorten the block profiling code, but it doesn't work because the
second instruction is a CALL.  Still, there are probably places where
the optimizers can use it.
-- 
John Gilmore      {sun,pacbell,uunet,pyramid}!hoptoad!gnu      gnu@toad.com
  "And if there's danger don't you try to overlook it,
   Because you knew the job was dangerous when you took it"
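
As a rough illustration of the "bcond,a foo; insn; foo:" idiom described
above, here is a minimal Python sketch of the annul rule for a
conditional branch.  It is only a toy model, not anything taken from the
SPARC manual or the profiler code: the names delay_slot_executes and
run_idiom are made up for this example, and it ignores timing and the
undefined case where the delay slot itself holds a delayed control
transfer.

```python
def delay_slot_executes(taken, annul):
    """Annul rule for a *conditional* branch (toy model, hypothetical name).

    With the annul bit clear, the delay slot always runs.
    With the annul bit set, the delay slot runs only if the branch is taken.
    """
    return taken or not annul


def run_idiom(cond):
    """Trace `bcond,a foo; insn; foo:` and return the steps that execute."""
    trace = ["bcond,a"]
    # The branch is taken exactly when the condition holds.
    if delay_slot_executes(taken=cond, annul=True):
        trace.append("insn")
    # Taken or not, control ends up at foo, since foo directly follows insn.
    trace.append("foo")
    return trace


if __name__ == "__main__":
    print(run_idiom(True))   # condition true: insn runs
    print(run_idiom(False))  # condition false: insn is annulled
```

Under this model, insn appears in the trace only when the condition is
true, which is the "conditionally execute one instruction" reading.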