Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!princeton!caip!sri-spam!mordor!lll-crg!seismo!rochester!ur-tut!tuba
From: tuba@ur-tut.UUCP (Jon Krueger)
Newsgroups: net.arch,net.micro.att
Subject: Re: AT&T MIPS claim [really task-switching]
Message-ID: <415@ur-tut.UUCP>
Date: Mon, 16-Jun-86 15:46:56 EDT
Article-I.D.: ur-tut.415
Posted: Mon Jun 16 15:46:56 1986
Date-Received: Wed, 18-Jun-86 03:26:06 EDT
References: <577@scirtp.UUCP> <124@bakerst.UUCP> <583@scirtp.UUCP> <585@scirtp.UUCP> <206@njitcccc.UUCP> <4138@sun.uucp> <506@mips.UUCP>
Reply-To: tuba@ur-tut.UUCP (Jon Krueger)
Distribution: net
Organization: Univ. of Rochester Computing Center
Lines: 67
Xref: watmath net.arch:3469 net.micro.att:1297

In article <506@mips.UUCP> mash@mips.UUCP (John Mashey) writes:
> . . .
>3) Let's try some back-of-the-envelope numbers:
>    a) At 60 cs/second (typical) and 700 usec/cs, the VAX would spend
>       60*700 = 42,000 usecs, or about 4.2% of the time doing context
>       switches.
>    b) Supposing that 10% of this time is actually in save/restore,
>       about .4% of the machine might be spent in save/restore
>       (SVPCTX/LDPCTX). Of course, they might be used for other things also.
>4) Now, let's try published data: Clark & Levy, "Measurement and Analysis of
>   Instruction Use in the VAX 11/780", 9th Ann. Symp. on Comp. Arch,
>   April 1982.
>    a) LDPCTX and SVPCTX aren't on the top 25 in usage of CPU time,
>       even in VMS Kernel mode. The top 25 instructions use 62% of the
>       total kernel time, and the smallest shown is REMQUE with 1.31%.
>       This was for multi-user workloads.
>    b) MTPR (Move to Processor Register) used 5.27% of the kernel time,
>       and 1.15% of the total CPU time for all processor modes. From this,
>       I infer that the kernel was using 21% of the CPU (1.15/5.27).
>       Hence, the most time-consuming of LDPCTX/SVPCTX could be consuming
>       no more than 1.31% of the kernel, or .27% of the total CPU.
>       Even both together could account for no more than .54% of the
>       total CPU.
>5) All of this is consistent in bounding the problem: for time-sharing
>systems like VAXen, the special context save/restore instructions contribute
>at most half a percent to performance. . . .

Thanks for the numbers and calculations. I can't argue with your
numbers, but I arrive at different conclusions.

I agree that the VAX architecture, as implemented on the 780, including
the presence and performance of those instructions, limits overhead due
to context switching to about 5 percent of processor time. So the
performance increase attainable by decreasing this overhead is only 5
percent. The numbers you present don't tell us how much of that 5
percent is spent actually executing LDPCTX/SVPCTX. So we can only
estimate the performance effect of increasing their speed. I accept
your estimate of at most half a percent of processor time spent, so we
can only save about half a percent.

What we can't say is how high context switching overhead would rise if
the instructions didn't exist. For instance, if the functionality
implemented in the microcode of LDPCTX/SVPCTX were performed by a system
routine, overhead might be 90% of processor time at 60 switches per
second. In this case, we could say that the instructions contribute
about 85% to system performance (90% without them, less the 5% with
them). Similarly, if hardware on the 780 autosaved and restored
registers as needed by processor modes and subroutine instructions,
overhead might be 0% of processor time, but cycles would take longer.

In other words, I think the numbers you present prove that only about
half a percent performance increase can be attained by tweaking the
special instructions. They don't prove that the special instructions
contribute only 10% to context switching or only half a percent to
system performance related to context switching. Suppose 50 percent of
system time were spent executing them. Would you conclude that they
contribute 50 percent to performance?
I would conclude that they subtract 50 percent from performance.

In other other words, you look at measurements of context switching on
780s, and since the special instructions represent so little processor
time, you conclude they don't contribute much to performance. I wonder
how much more processor time would be spent achieving the same
functionality in different ways if the instructions didn't exist or
didn't execute at their measured speeds.

I conclude that we don't know enough to assess the contribution of the
special instructions to a 780's ability to keep context switching
overhead down to about 5 percent. Therefore, we don't know how important
the special instructions are to timesharing, or how clever it is to put
them into your architecture.
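
For anyone who wants to rerun the arithmetic, here is a rough Python
sketch of both sides of the argument. The measured inputs are the
figures quoted from <506@mips.UUCP>; the 90% counterfactual is only the
made-up "for instance" figure from this article, not a measurement, and
all the variable and function names are my own invention.

```python
# Back-of-the-envelope model of the figures discussed in this article.
# Measured inputs come from the quoted posting; the 90% counterfactual
# is a hypothetical illustration, not data.

def switch_overhead(switches_per_sec, usec_per_switch):
    """Fraction of each second spent on context switches."""
    return switches_per_sec * usec_per_switch / 1_000_000

# Estimate 1: 60 switches/second at 700 usec each.
measured_overhead = switch_overhead(60, 700)
# Supposition from the quoted posting: 10% of that is save/restore.
save_restore_share = 0.10 * measured_overhead

# Estimate 2: bound from the published instruction-use data.
# MTPR is 5.27% of kernel time and 1.15% of total CPU time, so the
# kernel's share of the CPU is their ratio (the quoted posting rounds
# this down to 21%, which yields its .27% and .54% figures).
kernel_share = 1.15 / 5.27
# Neither instruction can exceed the smallest top-25 entry (1.31%).
per_instruction_bound = 0.0131 * kernel_share
both_bound = 2 * per_instruction_bound

# The counterfactual: overhead if the instructions did not exist.
# ASSUMPTION: 0.90 is the hypothetical value used in this article.
hypothetical_overhead = 0.90
contribution = hypothetical_overhead - measured_overhead

if __name__ == "__main__":
    print(f"measured switching overhead: {measured_overhead:.1%}")
    print(f"save/restore estimate:       {save_restore_share:.2%}")
    print(f"kernel share of CPU:         {kernel_share:.1%}")
    print(f"bound on both instructions:  {both_bound:.2%}")
    print(f"hypothetical contribution:   {contribution:.1%}")
```

The first four quantities bound what tweaking the instructions could
save. Only the last one speaks to what they contribute, and it depends
entirely on a number nobody has measured.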