Path: utzoo!attcan!uunet!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!mips!winchester!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Paging page tables
Message-ID: <39823@mips.mips.COM>
Date: 30 Jun 90 20:41:28 GMT
References: <3300142@m.cs.uiuc.edu> <9758@pt.cs.cmu.edu>
Sender: news@mips.COM
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Inc.
Lines: 66

In article aglew@bach.crhc.uiuc.edu (Andy Glew) writes:
>>The chip designer does control the cost of a simple TLB refill.  For
>>example, the 88000 and 68020 (well, 68851) have table-walk hardware,
>>whereas the R3000 designers punted TLB refill to an interrupt handler
>>and some special instructions.

0) There is a separate exception vector for TLB miss.
1) TLB refill routines are typically 10-15 instructions, depending on
   which TLB refill routine you use (they're sometimes different for
   different OSs, to match different ideas of PTE arrangements).
2) The usual refill uses the "TLB write random" instruction, which
   writes a TLB entry into a pseudo-random location in the top 56
   entries of the 64-entry TLB.  This instruction does get used
   elsewhere; the rest of the instructions in the sequences are
   nothing special, just a sequence of user-level instructions plus
   moves to/from coprocessor 0 (the system coprocessor).
3) Both HP PA and AMD 29K use software-refilled TLBs, although the
   details are rather different.

>First off, the hardware complexity/performance tradeoffs involved in
>the MIPS software TLB miss handling decision were described in a paper
>somewhere, and seemed reasonable.  It could go either way.

Proc. 1986 IEEE CompCon, San Francisco, March 1986.  Paper by DeMoney,
Moore, and Mashey.

>    Of course if the TLB miss handling is comparable to a syscall in
>slowness you haven't gained much - but isn't MIPS' point that the SW
>TLB miss handler is *not* as slow as a syscall?

Yes, not as slow, which is why it has its own exception vector.

> NB.
> I do not propose that the TLB miss handler be purely user
>code.  Just that it interact with some (untrusted) user accessible
>data structures.

This is quite feasible, although I don't know anyone who's done it.
As usual, whether you do this or not depends on:
    a) The problem domain you wish to cover with the chip.
    b) The cost/complexity of TLB-miss handling as it integrates with
       the pipeline.  This is especially "interesting" as pipelines
       get more complex.
    c) The cost of the silicon space for the MMU and its control.
    d) The cycle count cost of doing software refill versus hardware.
    e) How much cycle count degradation actually results from TLB
       processing, versus cycle-time degradation (if any) from
       various approaches.

Anyone who does serious analysis of all this finds that TLB processing
is on the order of 1%, except for big array processing, or certain
other cases that have awful locality, in which case TLBs (either
hardware or software) suffer.

Note that it is very easy to get surprised in all this.  For example,
some chips cannot do PTE table-walking inside their caches, especially
in multiprocessing environments, and it is quite possible that,
regardless of hardware or software refill, the time may well be
dominated by the number of uncached accesses to main memory.

In the usual marketing wars, sometimes people claim "hardware TLB
refill is faster than software refill."  This may be true, or it may
not be, in SPECIFIC comparisons.  However, anyone making the claim
IN GENERAL is almost certainly
    a) A marketeer extolling the virtues of a product that has
       hardware refill, OR
    b) Someone unlikely to be able to answer the following questions:
        "It's faster?"
            Compared to which chips?
            How much faster is it?
            What percentage of total CPU time is that?
-- 
-john mashey    DISCLAIMER:
UUCP:   mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD:    408-524-7015, 524-8253 or (main number) 408-720-1700
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
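
A rough way to put numbers on the "what percentage of total CPU time"
question is to model the refill policy from point 2 (a pseudo-random
write into the top 56 of 64 entries) and count misses for a given page
reference pattern.  The sketch below is only a toy model, not the
R3000 handler: the names SoftwareTLB and refill_overhead are
hypothetical, Python's random module stands in for the hardware's
pseudo-random index, and the cycle counts in the demo are assumed
values (15 cycles per refill treats the 10-15 instruction handler as
roughly one cycle per instruction and ignores cache misses).

```python
import random

TLB_ENTRIES = 64     # TLB size given in the post
LOW_ENTRIES = 8      # entries the random write never touches (64 - 56)


class SoftwareTLB:
    """Toy model of a software-refilled TLB with random replacement."""

    def __init__(self, seed=0):
        self.slots = [None] * TLB_ENTRIES   # slot index -> virtual page number
        self.index = {}                     # virtual page number -> slot index
        self.rng = random.Random(seed)      # assumption: stand-in for the hardware index
        self.misses = 0

    def access(self, vpn):
        """Touch virtual page vpn; return True on a hit, False on a miss."""
        if vpn in self.index:
            return True
        self.misses += 1
        self.refill(vpn)
        return False

    def refill(self, vpn):
        """Last step of the miss handler: overwrite a random upper slot."""
        slot = self.rng.randrange(LOW_ENTRIES, TLB_ENTRIES)
        evicted = self.slots[slot]
        if evicted is not None:
            del self.index[evicted]         # the old mapping is gone
        self.slots[slot] = vpn
        self.index[vpn] = slot


def refill_overhead(misses, refill_cycles, other_cycles):
    """Fraction of all cycles that went to TLB refill."""
    spent = misses * refill_cycles
    return spent / (spent + other_cycles)


if __name__ == "__main__":
    # Assumed workloads: a tight loop over 16 pages versus a sweep over
    # 4096 pages (the "big array" case with poor TLB locality).
    for name, pages in (("16-page loop", 16), ("4096-page sweep", 4096)):
        tlb = SoftwareTLB(seed=1)
        accesses = 100_000
        for i in range(accesses):
            tlb.access(i % pages)
        # Assumption: 20 cycles of useful work per page touch.
        share = refill_overhead(tlb.misses, 15, accesses * 20)
        print(f"{name}: {tlb.misses} misses, refill share {share:.2%}")
```

The same bookkeeping applies to a hardware table walker; only the
per-refill cycle count changes, which is why the comparison has to be
made with measured numbers for specific chips and workloads.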