Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!ucla-cs!sdcrdcf!hplabs!pyramid!prls!mips!mash
From: mash@mips.UUCP
Newsgroups: comp.sys.m68k
Subject: Re: Comparing 68xxx's; really TLB misses
Message-ID: <108@winchester.mips.UUCP>
Date: Mon, 2-Feb-87 00:01:27 EST
Article-I.D.: winchest.108
Posted: Mon Feb  2 00:01:27 1987
Date-Received: Tue, 3-Feb-87 03:19:48 EST
References: <809@imagen.UUCP> <561@elmgate.UUCP> <1090@msudoc.UUCP> <1701@hoptoad.uucp> <14561@amdcad.UUCP>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 44

In article <14561@amdcad.UUCP> phil@amdcad.UUCP (Phil Ngai) writes:
(regarding 68030)
>Any prediction on how fast a TLB miss is handled? I seem to recall the
>VAX 780, which does it in hardware, takes about 4 microseconds while
>MIPS, which does it in software, takes from 1-2 microseconds for a
>micro-TLB miss. I don't know how long a regular TLB miss takes. Many
>people are shocked at the idea but looking at the bottom line, "how
>long does it take", software TLB refill doesn't seem like such a bad
>idea. 
>Any one know how fast the other chips are at TLB refills? Intel, NSC,
>Fairchild, which I assume do it in hardware?

1) A MIPS micro-TLB refill is actually 1 cycle: it's a refill of the tiny
on-chip TLB from the 64-entry larger on-chip one.
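
To make the two kinds of refill concrete, here is a toy model in Python.
It is only a sketch: the 2-entry micro-TLB size, the LRU replacement, the
names (ToyTLB, translate) and the flat page_table dictionary are all
assumptions for illustration, not a description of the actual hardware
or of the real refill routine.

```python
from collections import OrderedDict

class ToyTLB:
    """Small fully associative translation cache (LRU is an assumption)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)   # mark as most recently used
            return self.entries[vpn]
        return None

    def insert(self, vpn, pfn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[vpn] = pfn

def translate(vpn, micro, main, page_table, stats):
    """Translate a virtual page number, counting which level served it."""
    pfn = micro.lookup(vpn)
    if pfn is not None:
        stats["micro_hit"] += 1
        return pfn
    pfn = main.lookup(vpn)
    if pfn is not None:
        stats["micro_refill"] += 1      # the cheap case: copy from the 64-entry TLB
    else:
        stats["software_refill"] += 1   # the exception-handler case: read the page table
        pfn = page_table[vpn]
        main.insert(vpn, pfn)
    micro.insert(vpn, pfn)
    return pfn

micro = ToyTLB(2)     # hypothetical micro-TLB size
main = ToyTLB(64)     # 64 entries, as stated in the post
page_table = {vpn: vpn + 1000 for vpn in range(256)}  # made-up mappings
stats = {"micro_hit": 0, "micro_refill": 0, "software_refill": 0}
for vpn in [0, 1, 2, 0, 0, 1]:
    translate(vpn, micro, main, page_table, stats)
print(stats)
```

Only the software_refill count corresponds to the 9-10 instruction
routine discussed in point 2; micro_refill is the 1-cycle case.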

2) A normal TLB refill is 9-10 instructions (convenient form is slightly
different between 4.3 and V.3), + 0-5 cycles for a data-cache miss,
+ 2-4 cycles of pipeline breakage/time to get into refill routine.
This totals 11-19 cycles, assuming NO I-cache misses in the refill routine.
I-cache misses cost [on the 5MIPS board/memory design] 5 cycles each, so the
worst case is about 60 cycles [7.5 microsecs].  On the average, the actual
I-cache cost is 1-2 cycles, yielding 13-21 cycles.  Anyway, the bottom line is a little
under 2 microseconds total penalty.  In any case, for user level programs,
this all costs about 1-2% of user execution time, even on fairly large
programs, i.e., it's almost down in the noise with regard to performance.
I.e., as long as it's fast enough, you can concentrate on making it have
the behavior desired by the O.S., and then go worry about other things,
like cache design.  For example, cache miss overhead is a much larger
performance issue: cache misses can easily eat up 10-50% of the cycles,
depending on the design and the program.
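
The arithmetic in point 2 can be checked with a few lines of Python.
This is a sketch under stated assumptions: one cycle per instruction,
a 125 ns cycle time inferred from "60 cycles = 7.5 microsecs" (the post
does not state the clock rate), and a worst case in which every
instruction of the routine takes one I-cache miss. The function names
are made up for the example.

```python
# Ranges quoted in point 2, in cycles (low, high).
INSTRUCTIONS = (9, 10)      # assumed to take one cycle each
DCACHE_MISS = (0, 5)
PIPELINE_ENTRY = (2, 4)
ICACHE_MISS_PENALTY = 5     # per miss, on the 5MIPS board

# Inferred, not stated in the post: 7.5 microseconds / 60 cycles = 125 ns.
CYCLE_NS = 7500 / 60

def refill_cycles(every_instruction_misses=False):
    """Return (best, worst) cycle counts for one software TLB refill."""
    totals = []
    for n, d, p in zip(INSTRUCTIONS, DCACHE_MISS, PIPELINE_ENTRY):
        icache = n * ICACHE_MISS_PENALTY if every_instruction_misses else 0
        totals.append(n + d + p + icache)
    return tuple(totals)

def cycles_to_us(cycles):
    """Convert cycles to microseconds at the inferred cycle time."""
    return cycles * CYCLE_NS / 1000

def cycles_between_misses(refill_cost_cycles, overhead_percent):
    """Total cycles per TLB miss implied by a given overhead percentage."""
    return refill_cost_cycles * 100 / overhead_percent

base = refill_cycles()                                # no I-cache misses
worst = refill_cycles(every_instruction_misses=True)  # assumed worst case
print(base, worst, cycles_to_us(60))
```

With these inputs base comes out as (11, 19) and worst as (56, 69),
which brackets the "about 60 cycles" quoted above. At 125 ns per cycle,
2 microseconds is 16 cycles, and a 16-cycle refill costing 1-2% of
execution time works out to one TLB miss every 800-1600 cycles
(cycles_between_misses(16, 2) and cycles_between_misses(16, 1)).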

3) "Many people are shocked at the idea" : I hope this is passing: after
all, the same technique is used on HP Spectrums [for sure], and
on Celerity boxes [I think].  It does depend on having fast exception
handling: if that is not possible, it is probably better to use microcode.

4) Note that Data-cache miss penalties for fetching page-table entries
account for 25-30% of the penalty above. This is relevant: in high-performance
systems, even if the microcode is instantaneous, you still have 1-2
memory references, which are there whether you do it in hardware or software.
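
As a rough illustration of point 4, the following Python sketch takes
one combination of the ranges from point 2 (10 instructions, 4 cycles to
enter the routine, and a 5-cycle D-cache miss on the page-table entry)
and computes what share of the refill is memory time. This is only one
way to arrive at a number in the 25-30% range; it is not necessarily how
that figure was measured, and the function name is made up.

```python
def dcache_share(dcache_cycles, other_cycles):
    """Fraction of a refill spent waiting on the page-table entry fetch."""
    return dcache_cycles / (dcache_cycles + other_cycles)

instruction_cycles = 10   # refill routine, one cycle per instruction assumed
pipeline_cycles = 4       # time to get into the refill routine
dcache_cycles = 5         # D-cache miss fetching the page-table entry

share = dcache_share(dcache_cycles, instruction_cycles + pipeline_cycles)

# A hypothetical hardware walker with zero sequencing overhead would
# still pay the memory reference, so this is its floor in cycles.
hardware_floor = dcache_cycles
software_total = dcache_cycles + instruction_cycles + pipeline_cycles
print(round(share, 3), hardware_floor, software_total)
```

In this case the memory reference is 5 of 19 cycles, so even an
instantaneous hardware refill could remove at most the other 14.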
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086