Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!princeton!caip!sri-spam!mordor!lll-crg!seismo!rochester!ur-tut!tuba
From: tuba@ur-tut.UUCP (Jon Krueger)
Newsgroups: net.arch,net.micro.att
Subject: Re: AT&T MIPS claim [really task-switching]
Message-ID: <415@ur-tut.UUCP>
Date: Mon, 16-Jun-86 15:46:56 EDT
Article-I.D.: ur-tut.415
Posted: Mon Jun 16 15:46:56 1986
Date-Received: Wed, 18-Jun-86 03:26:06 EDT
References: <577@scirtp.UUCP> <124@bakerst.UUCP> <583@scirtp.UUCP> <585@scirtp.UUCP> <206@njitcccc.UUCP> <4138@sun.uucp> <506@mips.UUCP>
Reply-To: tuba@ur-tut.UUCP (Jon Krueger)
Distribution: net
Organization: Univ. of Rochester Computing Center
Lines: 67
Xref: watmath net.arch:3469 net.micro.att:1297

In article <506@mips.UUCP> mash@mips.UUCP (John Mashey) writes:
> . . .
>3) Let's try some back-of-the-envelope numbers:
>	a) At 60 cs/second (typical) and 700 usec/cs, the VAX would spend
>	60*700 = 42,000 usecs, or about 4.2% of the time doing context
>	switches.
>	b) Supposing that 10% of this time is actually in save/restore, 
>	about .4% of the machine might be spent in save/restore
>	(SVPCTX/LDPCTX).  Of course, they might be used for other things also.
>4) Now, let's try published data: Clark & Levy, "Measurement and Analysis of
>	Instruction Use in the VAX 11/780", 9th Ann. Symp. on Comp. Arch,
>	April 1982.
>	a) LDPCTX and SVPCTX aren't on the top 25 in usage of CPU time,
>	even in VMS Kernel mode. The top 25 instructions use 62% of the
>	total kernel time, and the smallest shown is REMQUE with 1.31%.
>	This was for multi-user workloads.
>	b) MTPR (Move to Processor Register) used 5.27% of the kernel time,
>	and 1.15% of the total CPU time for all processor modes.  From this,
>	I infer that the kernel was using 21% of the CPU (1.15/5.27).
>	Hence, the most time-consuming of LDPCTX/SVPCTX could be consuming
>	no more than 1.31% of the kernel, or .27% of the total CPU.  Even
>	both together could account for no more than .54% of the total CPU.
>5) All of this is consistent in bounding the problem: for time-sharing
>systems like VAXen, the special context save/restore instructions contribute
>at most half a percent to performance. . . .

Thanks for the numbers and calculations.  I can't argue with your numbers,
but I arrive at different conclusions.

I agree that the VAX architecture, as implemented on the 780, including the
presence and performance of those instructions, limits overhead due to
context switching to about 5 percent of processor time.  So the performance
increase attainable by decreasing this overhead is only 5 percent.  The
numbers you present don't tell us how much of that 5 percent is spent
actually executing LDPCTX/SVPCTX.  So we can only estimate the performance
gain from increasing their speed.  I accept your estimate of at most half a
percent processor time spent, so we can only save about half a percent.
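
For reference, the quoted arithmetic can be redone as a short Python sketch.
The variable names are mine, and the inputs are only the figures quoted
above, not new measurements.  Using the unrounded kernel share (1.15/5.27,
about 21.8% rather than 21%) gives slightly higher bounds than the quoted
.27% and .54%, but still about half a percent for both instructions.

```python
# Inputs are the figures quoted from the earlier article; names are illustrative.
SWITCHES_PER_SEC = 60        # context switches per second ("typical")
USEC_PER_SWITCH = 700        # microseconds per context switch on the VAX
SAVE_RESTORE_SHARE = 0.10    # supposed share of switch time in SVPCTX/LDPCTX

# Fraction of each second spent context switching, and in save/restore.
switch_fraction = SWITCHES_PER_SEC * USEC_PER_SWITCH / 1_000_000
save_restore_fraction = switch_fraction * SAVE_RESTORE_SHARE

# Clark & Levy, as quoted: MTPR is 5.27% of kernel time and 1.15% of all
# CPU time, so the kernel's share of the CPU is their ratio.
kernel_fraction = 1.15 / 5.27

# REMQUE (1.31% of kernel time) is the smallest top-25 entry, so LDPCTX and
# SVPCTX must each fall below it.
per_instruction_bound = 0.0131 * kernel_fraction
both_bound = 2 * per_instruction_bound

print(f"context switching:     {switch_fraction:.2%} of CPU")
print(f"save/restore estimate: {save_restore_fraction:.2%} of CPU")
print(f"kernel share:          {kernel_fraction:.1%} of CPU")
print(f"bound, one instruction: {per_instruction_bound:.2%} of CPU")
print(f"bound, both:            {both_bound:.2%} of CPU")
```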

What we can't say is how high context switching overhead would rise if
the instructions didn't exist.  For instance, if the functionality
implemented in the microcode of LDPCTX/SVPCTX were performed by a system
routine, overhead might be 90% of processor time at 60 switches per second.
In this case, we could say that the instructions contribute about 85% to
system performance.  Similarly, if hardware on the 780 autosaved
and restored registers as needed by processor modes and subroutine
instructions, overhead might be 0% of processor time, but cycles
would take longer.
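
The distinction can be put as a small Python sketch.  The function name is
hypothetical, and the 90% figure is only the made-up example from this
paragraph, not a measurement; the point is that the contribution is the
difference between overhead without and with the instructions, not the time
spent executing them.

```python
def counterfactual_contribution(overhead_without, overhead_with):
    """Processor-time fraction saved by having the instructions.

    Both arguments are fractions of processor time spent on context
    switching: one in a hypothetical system lacking the instructions,
    one in the measured system that has them.
    """
    return overhead_without - overhead_with


measured_overhead = 0.05      # about 5% on the 780, per the quoted numbers
hypothetical_overhead = 0.90  # invented "what if" value, not measured

saved = counterfactual_contribution(hypothetical_overhead, measured_overhead)

# Processor time left for useful work in each case.
useful_with = 1.0 - measured_overhead
useful_without = 1.0 - hypothetical_overhead

print(f"processor time saved by the instructions: {saved:.0%}")
print(f"useful time with them:    {useful_with:.0%}")
print(f"useful time without them: {useful_without:.0%}")
```

None of this says what the overhead without the instructions would really
be; that is exactly the number we don't have.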

In other words, I think the numbers you present prove that only about half a
percent performance increase can be attained by tweaking the special
instructions.  They don't prove that the special instructions contribute
only 10% to context switching or only half a percent to system performance
related to context switching.  Suppose 50 percent of system time was spent
executing them.  Would you conclude that they contribute 50 percent to
performance?  I would conclude that they subtract 50 percent from
performance.

In other other words, you look at measurements of context switching on 780's
and since the special instructions represent so little processor time, you
conclude they don't contribute much to performance.  I wonder how much more
processor time would be spent achieving the same functionality in different
ways if the instructions didn't exist and didn't execute at their measured
speeds.  I conclude that we don't know enough to assess the contribution of
the special instructions to a 780's ability to keep context switching
overhead down to about 5 percent.  Therefore, we don't know how important
the special instructions are to timesharing, or how clever it is to put them
into your architecture.