Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!uunet!auspex!guy
From: guy@auspex.auspex.com (Guy Harris)
Newsgroups: comp.arch
Subject: Re: Compilers and efficiency
Message-ID: <7184@auspex.auspex.com>
Date: 15 Apr 91 23:44:25 GMT
References: <9782@mentor.cc.purdue.edu> <7117@auspex.auspex.com> <1406@ncis.tis.llnl.gov>
Organization: Auspex Systems, Santa Clara
Lines: 39

>Guy Harris's example (if I remember correctly) was that no language he
>knew of had semantics for retrieving both the integer quotient and the
>remainder from a floating divide, the point being that both values are
>usually available from any implementation of floating point divide, but
>that the language stands in the way of getting them at the same time.

I don't think you remember correctly; if *I* remember correctly, I
didn't give any particular example, and I certainly didn't give *that*
example.  *Herman Rubin* made the claim that no language he knew of had
those semantics, and at least two other people jumped in to claim that
Common LISP *did* have those semantics.
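
For concreteness, here is a minimal sketch of what "those semantics" look
like: one operation that hands back both the integer quotient and the
remainder of a floating-point division.  Python is used purely as a
stand-in language; it is not part of the original discussion, and the
helper name quotient_and_remainder is made up for illustration.  Common
LISP's FLOOR and TRUNCATE return the two results as multiple values in a
broadly similar way.

```python
def quotient_and_remainder(x, y):
    """Return (integer quotient, remainder) of x / y, rounding toward -infinity."""
    # The built-in divmod computes both results in a single call.
    q, r = divmod(x, y)
    return int(q), r


q, r = quotient_and_remainder(7.5, 2.0)
print(q, r)  # quotient and remainder satisfy x == q * y + r
```

Whether a compiler maps such an operation onto a single hardware divide
is a separate question from whether the language can express it.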

I was mainly thinking of, indeed, such things as the added addressing
modes; they may increase code density, but do they complicate instruction
decoding and slow the machine down there?  And what about some of the
more elaborate procedure-calling, procedure-entry, or procedure-exit
instructions?

Admittedly, to some extent, the problems with the more elaborate
features may come from either 1) sloppy implementations of them or
2) using the elaborate features even when inappropriate, e.g. not
making use of simplifications that can be done at code-generation time,
such as better management of registers.  Both seem to come, at least in
part, from the notion that "well, it's *one instruction*, that means it
*has* to be fast!" - i.e., since it's a single instruction, the
implementors didn't worry about making it fast, or making the common
case fast, and/or the compiler writers assumed it was *always* the
right thing to do to use that instruction.

As an example of 1), the CCI Power 6/32, as I remember, did a much better
job of implementing a VAX-like CALLS instruction than did the
VAX-11/780; one trick I remember them doing was to generate the fields
of the stack frame in order, so that the stores that built the stack
frame would work well with interleaved memory.  They also stored
*decoded* instructions, rather than code bytes, in the instruction
cache, as a way of avoiding the overhead of decoding VAX-style
instructions.
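
The decoded-instruction-cache idea can be sketched with a toy
interpreter.  Everything below is an assumption made for illustration:
the three-opcode encoding, the names decode and run, and the dictionary
standing in for the cache are invented, and none of it reflects the
actual Power 6/32 or VAX instruction formats.  The only point is that a
loop body gets decoded once rather than on every trip around the loop.

```python
# Toy encoding, invented for this sketch: one opcode byte, optional operand byte.
OPCODES = {0x00: "HALT", 0x01: "ADDI", 0x02: "DEC_JNZ"}


def decode(code, pc):
    """Turn the raw code bytes at pc into (mnemonic, operand, length)."""
    op = OPCODES[code[pc]]
    if op == "HALT":
        return (op, 0, 1)
    return (op, code[pc + 1], 2)


def run(code, counter, use_decoded_cache):
    """Execute code; return (accumulator, number of decode operations)."""
    acc, pc, decodes = 0, 0, 0
    cache = {}  # pc -> already-decoded instruction
    while True:
        insn = cache.get(pc) if use_decoded_cache else None
        if insn is None:
            insn = decode(code, pc)  # the expensive step on a real machine
            decodes += 1
            if use_decoded_cache:
                cache[pc] = insn
        op, operand, length = insn
        if op == "HALT":
            return acc, decodes
        if op == "ADDI":
            acc += operand
            pc += length
        elif op == "DEC_JNZ":
            # Decrement the loop counter; jump to operand unless it reached zero.
            counter -= 1
            pc = operand if counter != 0 else pc + length


# acc += 3; decrement counter and jump back to address 0 until it hits zero.
PROGRAM = bytes([0x01, 3, 0x02, 0, 0x00])

print(run(PROGRAM, 10, use_decoded_cache=False))
print(run(PROGRAM, 10, use_decoded_cache=True))
```

With the cache enabled, the number of decode operations depends on how
many distinct instructions are executed, not on how many times the loop
runs.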

As an example of 2), consider, say, treating leaf procedures differently
- can you get away with doing less than a full procedure entry?