Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!watmath!clyde!rutgers!ucla-cs!sdcrdcf!hplabs!pyramid!prls!mips!mash
From: mash@mips.UUCP
Newsgroups: comp.sys.m68k
Subject: Re: Comparing 68xxx's; really TLB misses
Message-ID: <108@winchester.mips.UUCP>
Date: Mon, 2-Feb-87 00:01:27 EST
Article-I.D.: winchest.108
Posted: Mon Feb 2 00:01:27 1987
Date-Received: Tue, 3-Feb-87 03:19:48 EST
References: <809@imagen.UUCP> <561@elmgate.UUCP> <1090@msudoc.UUCP> <1701@hoptoad.uucp> <14561@amdcad.UUCP>
Reply-To: mash@winchester.UUCP (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 44

In article <14561@amdcad.UUCP> phil@amdcad.UUCP (Phil Ngai) writes:
(regarding 68030)
>Any prediction on how fast a TLB miss is handled? I seem to recall the
>VAX 780, which does it in hardware, takes about 4 microseconds while
>MIPS, which does it in software, takes from 1-2 microseconds for a
>micro-TLB miss. I don't know how long a regular TLB miss takes. Many
>people are shocked at the idea but looking at the bottom line, "how
>long does it take", software TLB refill doesn't seem like such a bad
>idea.

>Any one know how fast the other chips are at TLB refills? Intel, NSC,
>Fairchild, which I assume do it in hardware?

1) A MIPS micro-TLB refill is actually 1 cycle: it's a refill of the
tiny on-chip TLB from the larger 64-entry on-chip one.

2) A normal TLB refill is 9-10 instructions (the convenient form is
slightly different between 4.3 and V.3), + 0-5 cycles for a data-cache
miss, + 2-4 cycles of pipeline breakage/time to get into the refill
routine. This totals 11-19 cycles, assuming NO I-cache misses in the
refill routine. I-cache misses cost [on the 5MIPS board/memory design]
5 cycles, so the worst case is about 60 cycles [7.5 microsecs]. On the
average, the actual I-cache cost is 1-2 cycles, yielding 13-21 cycles.
Anyway, the bottom line is a little under 2 microseconds total penalty.
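
For anyone who wants to check the arithmetic in 2), here is a rough sketch in Python. The names are made up for illustration, and the 125 ns cycle time is an inference from the "60 cycles [7.5 microsecs]" figure, not a number stated directly.

```python
# Cycle ranges (low, high) quoted for a normal TLB refill.
REFILL_INSTRUCTIONS = (9, 10)   # refill routine, one cycle per instruction
DCACHE_MISS = (0, 5)            # data-cache miss on the page-table entry
PIPELINE_ENTRY = (2, 4)         # pipeline breakage / getting into the routine

# Assumption: cycle time inferred from "60 cycles [7.5 microsecs]".
CYCLE_TIME_US = 7.5 / 60


def add_ranges(*ranges):
    """Add (low, high) cycle ranges component-wise."""
    return (sum(r[0] for r in ranges), sum(r[1] for r in ranges))


def cycles_to_us(cycles):
    """Convert a cycle count to microseconds at the inferred cycle time."""
    return cycles * CYCLE_TIME_US


# Total with no I-cache misses in the refill routine.
base = add_ranges(REFILL_INSTRUCTIONS, DCACHE_MISS, PIPELINE_ENTRY)
print("cycles:", base)
print("microseconds:", cycles_to_us(base[0]), "to", cycles_to_us(base[1]))
```

Under that assumption the 11-19 cycle range comes to roughly 1.4-2.4 microseconds, before any I-cache misses are added in.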
In any case, for user-level programs, this all costs about 1-2% of user
execution time, even on fairly large programs, i.e., it's almost down
in the noise with regard to performance. As long as it's fast enough,
you can concentrate on making it have the behavior desired by the O.S.,
and then go worry about other things, like cache design. For example,
cache miss overhead is a much larger performance issue: cache misses
can easily eat up 10-50% of the cycles, depending on the design and the
program.

3) "Many people are shocked at the idea": I hope this is passing: after
all, the same technique is used on HP Spectrums [for sure], and on
Celerity boxes [I think]. It does depend on having fast exception
handling: if that is not possible, it is probably better to use
microcode.

4) Note that data-cache miss penalties for fetching page-table entries
account for 25-30% of the penalty above. This is relevant: in
high-performance systems, even if the microcode is instantaneous, you
still have 1-2 memory references, which are there whether you do it in
hardware or software.
--
-john mashey DISCLAIMER:
UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD: 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086