Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!ucbvax!ucbarpa.Berkeley.EDU!melvin
From: melvin@ucbarpa.Berkeley.EDU (Steve Melvin)
Newsgroups: comp.arch
Subject: Re: Context switching on RISC chips
Summary: Scheduling Basic Blocks and Variability in Memory Latency
Keywords: Context Switching, Interrupts, MIMD Architecture, HEP
Message-ID: <33452@ucbvax.BERKELEY.EDU>
Date: 3 Jan 90 10:34:11 GMT
References: <3167@iitmax.IIT.EDU> <14007@pur-ee.UUCP>
Sender: usenet@ucbvax.BERKELEY.EDU
Reply-To: melvin@ucbarpa.Berkeley.EDU.UUCP (Steve Melvin)
Organization: University of California, Berkeley
Lines: 56

In article <14007@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes:
>First of all, it isn't just the register file which has gotten big -- it's
>the complete localized process state.  This includes registers, caches, even
>process-specific page tables and disk buffers.  Second, it has NOTHING TO DO
>WITH BEING RISC -- chips are fast, talking with other chips is slow, talking
>with other boards is even slower, so ANY high-performance architecture
>naturally tends toward a larger, longer lived, localized process state.
>

Your point that the localized process state includes more than just registers
is well taken, but I'd like to take issue with your second statement.  There
are indeed architectural features which affect the size of the process state,
independent of the speed differential between the processor and the rest
of the world.  The key is that temporary results which are computed and used
within a basic block need not be stored in "named" registers.  That is, they
may be internal registers but NOT part of the process state if, for example,
context switches are never initiated except on basic block boundaries
(an interrupt then either waits for the boundary, at the cost of longer
latency, or forces the work done so far in the block to be discarded).

Of course, how much the state can be reduced while preserving the advantages
of the same number of internal registers depends on the application and the
compiler.  The point is that by increasing the size of the unit of work which
the processor considers "atomic" (I call this an Execution Atomic Unit),
fewer *architectural* registers are required.  The main reason one would want
to do this is to allow more parallelism to be exploited, but that's
another issue.
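
To make the named-versus-internal distinction concrete, here is a rough
sketch in Python.  The instruction format (dest, src1, src2), the register
names, and the function classify_block are all made up for illustration;
this is not how any particular machine or compiler represents a block.

```python
def classify_block(block, live_out):
    """Split the values a basic block writes into state and temporaries.

    block    -- list of (dest, src1, src2) tuples in program order
                (a hypothetical three-address format)
    live_out -- set of names that are read after the block ends
    """
    written = set()
    live_in = set()
    for dest, *srcs in block:
        for s in srcs:
            if s not in written:
                live_in.add(s)        # read before any write in this block
        written.add(dest)
    architectural = written & set(live_out)  # must survive a boundary switch
    temporaries = written - architectural    # produced and consumed inside
    return live_in, architectural, temporaries


block = [
    ("t1", "a", "b"),     # t1 = a + b
    ("t2", "t1", "c"),    # t2 = t1 * c
    ("t3", "t2", "a"),    # t3 = t2 - a
    ("x", "t3", "t1"),    # x  = t3 + t1
]
live_in, architectural, temporaries = classify_block(block, live_out={"x"})
print(sorted(live_in), sorted(architectural), sorted(temporaries))
```

If switches happen only at the block boundary, then of the four values the
block writes only x has to be part of the saved state; t1 through t3 can sit
in internal registers.  If a switch could land between any two instructions,
all four would potentially be state.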

>
>BTW, you might say that processes can require context switches for
>synchronous events (e.g., loading a value from memory which is far away),
>but IMHO the use of a context switch is usually overkill in such cases
>(sorry, Burton ;-).  This is because, with the right architecture,
>synchronous delay events can be hidden using static (compile-time)
>scheduling (e.g., code motions to hide delayed loads).
>

Well, first of all, the use of the term "context switch" is a little
misleading as applied to the HEP-1.  That machine allows the contexts of
up to 50 processes to be present in the processor simultaneously, eight
of which are active in the pipeline at any time.  When a process needs to
wait for memory, it is simply not returned to the group (of 50) which are
candidates for being scheduled.
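
A toy model of that arrangement, as described above, might look like the
following.  It is a sketch under my own simplifying assumptions, not a
description of the actual hardware: one issue slot per cycle, first-come
selection from the candidate group, and a made-up memory latency of 40
cycles.  The names run, PIPE_DEPTH and MEM_LATENCY are hypothetical.

```python
import heapq
from collections import deque

PIPE_DEPTH = 8     # pipeline stages; one instruction per process in flight
MEM_LATENCY = 40   # cycles for a memory reference; an assumed figure


def run(programs, cycles):
    """Count issue slots that get filled over the given number of cycles.

    programs -- one list of "alu"/"mem" ops per process, repeated forever
    """
    ready = deque(range(len(programs)))  # candidates for being scheduled
    pending = []                         # heap of (wake_cycle, pid)
    pc = [0] * len(programs)
    issued = 0
    for now in range(cycles):
        # A process rejoins the candidates once its instruction is done.
        while pending and pending[0][0] <= now:
            ready.append(heapq.heappop(pending)[1])
        if not ready:
            continue                     # bubble: nothing to issue
        pid = ready.popleft()
        op = programs[pid][pc[pid] % len(programs[pid])]
        pc[pid] += 1
        issued += 1
        # A process waiting on memory is simply kept out of the group longer.
        delay = MEM_LATENCY if op == "mem" else PIPE_DEPTH
        heapq.heappush(pending, (now + delay, pid))
    return issued


mix = ["alu", "alu", "alu", "mem"]
for nproc in (8, 50):
    print(nproc, "processes:", run([mix] * nproc, 1000), "of 1000 slots used")
```

Nothing is saved or restored anywhere in this loop, which is why "context
switch" is a misleading name for it.  In this model 50 processes keep every
issue slot full, since fewer than 50 can be in flight or waiting at once;
with only 8, the memory waits show up as bubbles.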

Secondly, I disagree that with the "right architecture" synchronous delay
events can be hidden by static scheduling.  If there is no variance in
memory latency (i.e., if you have no cache or if you have a 100% hit rate),
then the compiler can do a pretty good job (branch predictability, static
vs. dynamic, is also important here, but that's another issue).  However,
as memory latency becomes more variable, dynamic scheduling becomes more
important (and the effect grows as the number of parallel operations
increases).
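
Here is a small experiment along those lines, again only a sketch.  It
compares a single-issue in-order machine running a schedule tuned for the
hit latency against a machine that may issue the oldest ready instruction
from a small window.  The latencies (3 and 20 cycles), the 90% hit rate,
the window of 8, and all function names are assumptions of mine.

```python
import random

HIT, MISS = 3, 20   # load latencies in cycles; assumed figures


def pipelined_order(n, distance):
    """Program order with use i placed `distance` loads after load i."""
    order = []
    for k in range(n + distance):
        if k < n:
            order.append(("load", k))
        if k >= distance:
            order.append(("use", k - distance))
    return order


def in_order(order, latency):
    """In-order issue: stall at a use until its load has returned."""
    ready_at, cycle = {}, 0
    for kind, i in order:
        if kind == "load":
            ready_at[i] = cycle + latency[i]
        else:
            cycle = max(cycle, ready_at[i])   # stall
        cycle += 1
    return cycle


def dynamic(order, latency, window):
    """Issue the oldest ready instruction among the first `window` pending."""
    ready_at, pending, cycle = {}, list(order), 0
    while pending:
        for k, (kind, i) in enumerate(pending[:window]):
            if kind == "load":
                ready_at[i] = cycle + latency[i]
            elif ready_at.get(i, float("inf")) > cycle:
                continue                      # operand not back yet
            del pending[k]
            break
        cycle += 1
    return cycle


n = 1000
order = pipelined_order(n, 1)
rng = random.Random(0)
cases = {
    "all hits": [HIT] * n,
    "90% hits": [HIT if rng.random() < 0.9 else MISS for _ in range(n)],
}
for name, latency in cases.items():
    print(name, in_order(order, latency), dynamic(order, latency, 8))
```

With a window of 1 the two machines are identical.  In the all-hit case the
cycle counts come out nearly the same, so the compiler's schedule is about
as good as anything the hardware could do.  Once misses appear, the in-order
machine pays for each one nearly in full, while the windowed machine overlaps
them with later loads.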

-------
Steve Melvin
University of California, Berkeley
-------