Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site rochester.UUCP
Path: utzoo!watmath!clyde!cbosgd!ihnp4!qantel!dual!lll-crg!gymble!umcp-cs!seismo!rochester!mayer
From: mayer@rochester.UUCP (Jim Mayer)
Newsgroups: net.arch
Subject: Re: What I miss in micro-processors (fairly long)
Message-ID: <11462@rochester.UUCP>
Date: Sun, 8-Sep-85 11:59:22 EDT
Article-I.D.: rocheste.11462
Posted: Sun Sep  8 11:59:22 1985
Date-Received: Wed, 11-Sep-85 06:09:09 EDT
References: <796@kuling.UUCP> <1713@orca.UUCP>
Reply-To: mayer@rochester.UUCP (Jim Mayer)
Organization: U. of Rochester, CS Dept.
Lines: 86
Keywords: compilers, optimization, procedures
Summary: Fast calling sequences, inline expansion.

In article <1713@orca.UUCP> jans@orca.UUCP (Jan Steinman) writes:
>In article <796@kuling.UUCP> grzm@kuling.UUCP (Gunnar Blomberg/ADB) writes:
>>... there are a couple of features I've been missing rather badly...
>>...
>>2. A quick subroutine call.
>>
>>  What I would really like to see here is something like the PDP-10 JSP
>>instruction, something that jumps to a subroutine and puts the return
>>address in a register instead of on the stack.
>
>I hope you never try anything recursive this way!  Current popular recursive-
>capable languages don't have any way to guarantee a given routine will never
>be used recursively, so compilers for such languages (Pascal, C, Ada, etc.)
>would never use such an instruction.  As for non-recursive assembly code, I
>would look carefully at re-designing your entire calling scheme.  If time-
>optimal, macrofy and in-line the calls (with the added advantage of being able
>to use arbitrary registers).  If stack space-optimal, write a macro that
>sticks the PC in a register, then jumps.
>
>Guess what I'm really trying to say is that I don't see enough use for either
>of these things to justify the effort needed to put them in.
>....
I would also like to see a simple, register based, calling sequence on
microprocessors.  Here are my reasons:

1. Simple compilers for most languages usually start off procedures
   by saving a bunch of registers, and finish by restoring the same
   set.  The trend for CISC machines (vax, 68000, etc) is to provide
   some variant on the "push multiple registers"/"save according to
   register mask" theme.  Given this kind of prelude, it doesn't really
   matter whether the PC is pushed on the stack at the JSR, or at the
   MOVEM.  Recursion still works fine.  And since simple compilers
   usually don't do to well at allocating registers across procedure
   calls, the cost of the tied up register is not to great either.
   This assumes, of course, a register intensive architecture.

2. It is quite easy to detect a large class of routines in ANY LANGUAGE
   that are not called recursively.  In the most general case, just
   build a call graph and look for cycles.  Given a more traditional
   compile&link compiler architecture it would still pay to look for
   routines that don't call any other functions (or any functions
   outside of the current module/source file).

3. Finally, to throw a new twist into the time optimal/space optimal
   tradeoff, consider the effect of compiler generated specialized
   calling sequences.  Given some run time statistics a compiler can
   identify calls that would benefit from inline or specialized calling
   sequences.  The same techniques can be used to propagate information
   about register usage.  In fact, one can view inline expansion as
   nothing more than one extreme of calling sequence optimization.  The
   optimizations are driven by execution time profiling information and
   estimates of the potential for further traditional optimizations
   constrained by a space bound.  For more information see Gene Ball's
   Ph.D.  dissertation:

	John Eugene Ball
	Program Improvement by the Selective Integration of Procedure Calls
	Ph.D. Thesis
	University of Rochester, 1982.

   Also, for a simpler approach that only handles inline expansion and
   does not consider the effects of further optimizations see:

	Robert W. Scheifler
	An Analysis of Inline Substitution for a Structured Programming Language
	CACM, Volume 20, Number 9
	September 1977, pages 647-654.

4. I guess that where I differ from the original poster is that I don't
   want a stack based calling sequence at all!  Stack based calling
   sequences on machines with lots of registers seem to me like a
   perfect example of a premature optimization.  My general feeling is
   that this kind of "low level" high level language support (eg. stack
   based calls, index instructions, bounds checking) is usually not
   worth adding since an optimizing compiler can do a better job of
   speeding things up anyway.  On the other hand, specialized high
   level microcode/hardware support for things like checksums, context
   switching, message passing, and FFT's can be big wins.  I'm
   especially fond of writable control stores.

Boy is this long.

-- Jim Mayer					University of Rochester
(arpa) mayer@Rochester.ARPA			Department of Computer Science
(uucp) rochester!mayer				Ray P. Hylan Building
       (via allegra, decvax, or seismo)		Rochester, New York 14627