Path: utzoo!attcan!uunet!seismo!sundc!pitstop!sun!aeras!elxsi!beatnix!robert
From: robert@beatnix.UUCP (Robert Olson)
Newsgroups: comp.arch
Subject: Re: Are all RISCs the same?
Message-ID: <903@elxsi.UUCP>
Date: 9 Sep 88 16:30:27 GMT
References: <58@zeno.MN.ORG> <6903@aw.sei.cmu.edu> <22860@amdcad.AMD.COM> <6930@aw.sei.cmu.edu>
Sender: news@elxsi.UUCP
Reply-To: robert@beatnix.UUCP (Robert Olson)
Organization: ELXSI Super Computers, San Jose
Lines: 117

ELXSI sells a high end multiprocessor into the realtime marketplace.  By high
end I mean VAX MIPS performance from 7 MIPS to 250 MIPS, up to 2 GB memory
and so forth.  By realtime I mean event driven, with frame times of perhaps
as little as 250 microseconds, although most customers are running frame times
of 5 milliseconds to 20 milliseconds.  Many of the issues you raise in your
note are ones which we encounter with our customers.

In article <6930@aw.sei.cmu.edu> firth@bd.sei.cmu.edu (Robert Firth) writes:
>In article <6903@aw.sei.cmu.edu> firth@bd.sei.cmu.edu I wrote:
>
> (a) Some have register window systems.  This is a disastrous design
>     error that will ultimately doom them.  In particular, the greatly
>     increased context-switch time, and the unpredictability in the
>     cost of a simple procedure call, make register-window machines
>     unsuitable for hard real time applications.
Predictability of response times (that is, low jitter) is crucial for most of
the applications we run.  In general the computer is running some mathematical
approximation of the real world.  The application developers generally
make their codes consume 90% - 95% of the cycles in the frame.  Any jitter
comes out of the cycles available to the application, so in realtime design
you budget for the worst case jitter, even if it only happens once an hour or
so.  Those (mostly) wasted cycles give the application developer heartburn.
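
To put rough numbers on that budget, here is a minimal sketch.  The 5
millisecond frame and the 95% figure come from the ranges mentioned above; the
300 microsecond worst case jitter and the function names are assumptions made
up purely for illustration, not measurements of any real system.

```python
def usable_fraction(frame_us, worst_case_jitter_us):
    """Fraction of the frame left once worst case jitter is reserved."""
    if not 0 <= worst_case_jitter_us <= frame_us:
        raise ValueError("jitter must be between 0 and the frame time")
    return (frame_us - worst_case_jitter_us) / frame_us


def fits_in_frame(frame_us, worst_case_jitter_us, app_time_us):
    """True if the application's compute time fits in what jitter leaves."""
    return app_time_us <= frame_us - worst_case_jitter_us


FRAME_US = 5000    # 5 ms frame, from the range quoted above
JITTER_US = 300    # hypothetical worst case jitter
APP_US = 4750      # application tuned to use 95% of the frame

print(usable_fraction(FRAME_US, JITTER_US))
print(fits_in_frame(FRAME_US, JITTER_US, APP_US))
```

With these assumed numbers the application no longer fits: it wants 95% of
the frame, but the worst case jitter leaves only 94%, even though that worst
case may almost never occur.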

>
>In article <22860@amdcad.AMD.COM> tim@delirun.amd.com (Tim Olson) writes:
>
>  Oh, I suppose that by the same reasoning, any machine with caches,
>  virtual memory, or even "page-mode" RAMs is also doomed.  Sigh.  I guess
>  it's back to the old TMS9900 architecture with no registers to get in
>  the way of that fast context switch and predictability.  ;-)
>
>Yes, machines with caches do indeed cause problems in implementing hard
>real time systems; this was brought out in some of the reports of the
>MIPS assessment funded by RADC.  Virtual memory is hardly an issue, since
>the majority of real time systems do not use it (wisely, in my view).

In the ELXSI architecture there is only virtual memory, in the sense that 
the instruction set only allows memory references relative to your process's
page map.  We do allow you to freeze down pieces of your address space in
main memory and, for that matter, in the cache.  The cache on the 6460
CPU is 1 MB and can be partitioned among several processes in a static
fashion, although the default is for all processes to share the cache.
While some of our crustier users find virtual memory concepts disturbingly
avant-garde, the ability to freeze things in the cache and main memory
makes them feel better.  There are substantial advantages to the protection
from unplanned "interprocess communication" (i.e., wild writes into 
unintentionally shared memory).  I speak for the company when I say that our
customers do very time critical applications while using virtual memory.  Like
any tool, you need to understand the implications of using it and the ways to
overcome the negative side effects for your application.

>
>The TI9900 is indeed an example worth studying.  It had a context switch
>time of less than 10usec using early 1970s technology.  Last month I
>attended a presentation of a new "RISC" machine with a 20 MHz clock that
>couldn't do half as well.

On the 6460, the context switch time is about 3 microseconds.  Total response
time to an external interrupt, including a context switch, is about 10 
microseconds.  If you mutter the right incantations, that can be a guaranteed
response time, even with timesharing going on in other CPUs.  One of the
secrets (actually, not so secret) is the use of sixteen process context
register sets on the CPU.  A simple, strict priority driven scheduler
manages those register sets, unconditionally running the highest priority
task.  A context switch involves running the scheduler, settling the state of
the CPU from the current process, and selecting the other set of registers.
Needless to say, we are pretty proud of these numbers in a large scale system.
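
The general idea of strict priority scheduling over a fixed pool of register
sets can be sketched as a toy model.  Only the count of sixteen comes from the
description above; the class, its method names, the "larger number means
higher priority" convention and the tie-breaking rule are all assumptions for
illustration, and none of this is the actual 6460 scheduler.

```python
NUM_REGISTER_SETS = 16  # from the description above; the rest is assumed


class RegisterSetScheduler:
    """Toy model: one resident process per register set, strict priority."""

    def __init__(self):
        # Each slot holds a dict describing a process, or None if it is free.
        self.slots = [None] * NUM_REGISTER_SETS
        self.current = None  # index of the register set now selected

    def load(self, name, priority):
        """Place a process in a free register set and return its slot index."""
        slot = self.slots.index(None)  # raises ValueError if all are in use
        self.slots[slot] = {"name": name, "priority": priority,
                            "runnable": False}
        return slot

    def set_runnable(self, slot, runnable):
        """Mark a process runnable or blocked, then rerun the scheduler."""
        self.slots[slot]["runnable"] = runnable
        return self.schedule()

    def schedule(self):
        """Select the register set of the highest priority runnable process."""
        best = None
        for i, proc in enumerate(self.slots):
            if proc is None or not proc["runnable"]:
                continue
            # Assumed convention: larger number wins, lowest slot wins ties.
            if best is None or proc["priority"] > self.slots[best]["priority"]:
                best = i
        self.current = best  # the "switch" is just choosing another set
        return best


sched = RegisterSetScheduler()
background = sched.load("timeshare_job", priority=1)
handler = sched.load("interrupt_handler", priority=10)
print(sched.set_runnable(background, True))   # background job runs
print(sched.set_runnable(handler, True))      # handler preempts it
print(sched.set_runnable(handler, False))     # back to the background job
```

The point of the model is that nothing is copied to memory on a switch: every
resident process keeps its registers, and the scheduler only changes which
set is selected.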

>
>Tim continues:
>
>  How did you measure this "greatly increased context switch time?" There
>  is typically a whole lot more going on during a true context switch than
>  dumping and restoring register contents.  In addition, many times it is
>  interrupt latency, not context switch time, that is important.  Here,
>  many "register window RISCS" like the Am29000, SPARC, and 80960 have an
>  advantage, in that typically there is a window or reserved register area
>  for the interrupt handler to run in without saving *any* registers. 
>
Virtually all of our customers run multiprocess simulations.  Many of them are
doing flight simulators.  One development team will simulate the engines, one
group will interface to the cockpit controls, one group will simulate the 
flight computer(s), and so forth.  Sometimes the black boxes are real ones, 
hooked up over 1553 or similar external busses, sometimes they are software 
simulations.  Efficient context switching is essential to their applications.
Every cycle counts, and we look for ways to avoid saving anything that doesn't
absolutely need saving.

>
>And in response:
>
>There is NOT a whole lot more going on during a context switch than the
>register save and restore.  Setting up the dynamic environment for a
>high-level language task normally implies just changing the registers
>and restoring any condition codes.  A few machines really blow it by
>having a lot of FPU state (eg the MC68000) or by requiring tasks to
>use different memory maps (1750a), but on clean machines the major part
>of the work is the save and restore of the on-chip registers.  The more
>there are, the longer this takes.
>

I agree with this statement.  (Incidentally, we do not have condition codes,
although there is a status word to be saved.)


It is possible for realtime users both to have a modern computer and to get
their job done.  We offer access to realtime from Unix, we support virtual
memory, the operating system is message driven rather than built on shared
memory, and people program in Pascal, Fortran, C, Ada and so forth.  What you
have to do
is give the realtime user the ability to guarantee certain attributes of his
environment, such as memory access times, device access times and so forth.
While there are things we still have to do to improve our abilities in this
area, I think the number of successful applications which have been built using
our equipment is proof that important, demanding applications can take 
advantage of many of the advances readers of this group have developed in the
last decade.