Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ll-xn!ames!amdcad!bcase
From: bcase@amdcad.AMD.COM (Brian Case)
Newsgroups: comp.arch
Subject: Re: Anyone for memory management on the AM29000?
Message-ID: <16336@amdcad.AMD.COM>
Date: Thu, 23-Apr-87 11:49:10 EST
Article-I.D.: amdcad.16336
Posted: Thu Apr 23 11:49:10 1987
Date-Received: Sat, 25-Apr-87 10:07:45 EST
References: <67@bernina.UUCP>
Reply-To: bcase@amdcad.UUCP (Brian Case)
Distribution: world
Organization: Advanced Micro Devices, Inc., Sunnyvale, Ca.
Lines: 66
Keywords: AM29000, memory management, TLB, flame

In article <67@bernina.UUCP> tve@ethz.UUCP (Th. von Eicken) writes:
>When reading the data sheet I noticed that the TLB entries
>do not have any "page used" flag nor any "page modified"
>flag. Does that mean that the AM29000 memory management is even
>more crippled than on a VAX (which doesn't have a "page used" flag)???
>
>On TLB misses, as far as I understand, a software trap is generated.
>Are there any figures on typical interrupt routine times for handling
>the misses? What is the performance penalty, compared to miss
>handling in hardware?

Yeah, questions about the "missing" page referenced and modified bits in
the TLB are always among the first to be asked when people are presented
with the Am29000. The deal is: these bits don't belong in TLB entries,
they belong either in the page tables themselves or in the physical page
map (note that for inverted page tables, these structures are (or can
be) the same thing). The VAX is brain-damaged because the TLB reload is
done by hardware (well, microcode) and it forgets to take note of some
of the information that OS guys would like to have. Since the Am29000
TLB reload is done by a software routine, you not only can decide what
the page tables look like, but you can also decide whether or not to
gather referenced and modified information.
Note that referenced information is available degenerately by the very
fact that the TLB entry is present at all (the fact that the TLB entry
was fetched from the page table means that the page has been
referenced). Page modified can be gathered in software too, if you are
willing to take the performance hit: put the TLB entry for the page into
the TLB but set the write-protection bit(s) (one for supervisor, one for
user); then, when a write to the page is attempted, a protection
violation trap will be taken; at this point, look in the page table to
check whether the page is really supposed to be read-only; if it is not,
change the TLB entry to allow writing and count a page modification in
the page table (or physical page map). But this is not the right way to
do it anyway. The right way is to have a small RAM-based table in the
memory controller keep track of page modification: there is very little
overhead and the information is maintained on a per-physical-page basis,
just as it should be. Also, it is probably the best way for
multiprocessor systems.

I have written a paper about TLB reload for the Am29000, complete with
page table structures and code examples for two-level and inverted page
tables. There is also a discussion of TLB miss processing overhead for
a few of our benchmark programs (nroff, our assembler, puzzle, etc.).
The overhead, in added cycles per instruction, is typically less than
0.01 with the max (for the examples given) at 0.27 for the "rm" command
(this atrociously high number is due to the fact that rm is a very short
program, so the cold-start penalty is a high percentage of the total
time). The TLB miss ratios go from 1.50% (yeech) for rm to < 0.01% for
puzzle. Four of the six programs have TLB miss ratios < 0.05%; the next
highest is nroff at 0.12%, and then rm at 1.50%.
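
For anyone who wants the software-only scheme for referenced and
modified bits spelled out, here is a minimal sketch in Python. It is a
toy model, not Am29000 code: the names (PageTableEntry, TlbEntry,
SoftTlb), the capacity, and the oldest-first replacement policy are all
assumptions made up for illustration, and page faults are not modeled.
It only shows the control flow: the miss handler records "referenced"
and loads the entry write-protected; the first write traps, and the
protection handler either rejects it or records "modified" and enables
writing.

```python
from dataclasses import dataclass


@dataclass
class PageTableEntry:
    frame: int
    writable: bool            # what the OS really permits for this page
    referenced: bool = False  # maintained by the miss handler
    modified: bool = False    # maintained by the protection handler


@dataclass
class TlbEntry:
    frame: int
    write_protected: bool     # what the (simulated) hardware checks


class SoftTlb:
    """Toy software-reloaded TLB; capacity and eviction are assumptions."""

    def __init__(self, page_table, capacity=64):
        self.page_table = page_table   # dict: virtual page number -> PTE
        self.capacity = capacity
        self.entries = {}              # dict: virtual page number -> TlbEntry
        self.miss_traps = 0
        self.protection_traps = 0

    def _miss_handler(self, vpn):
        # TLB miss trap: fetching the entry is itself proof of a reference.
        self.miss_traps += 1
        pte = self.page_table[vpn]
        pte.referenced = True
        if len(self.entries) >= self.capacity:
            # Evict the oldest entry (dicts keep insertion order).
            self.entries.pop(next(iter(self.entries)))
        # Always load write-protected so the first write will trap.
        self.entries[vpn] = TlbEntry(pte.frame, write_protected=True)

    def _protection_handler(self, vpn):
        # Protection violation trap: is the page really read-only?
        self.protection_traps += 1
        pte = self.page_table[vpn]
        if not pte.writable:
            raise PermissionError(f"write to read-only page {vpn}")
        pte.modified = True                       # record the modification
        self.entries[vpn].write_protected = False  # later writes run free

    def access(self, vpn, write=False):
        if vpn not in self.entries:
            self._miss_handler(vpn)
        entry = self.entries[vpn]
        if write and entry.write_protected:
            self._protection_handler(vpn)
        return entry.frame
```

In this model a writable page costs one extra trap on its first write
after each TLB reload, which is the performance hit mentioned above.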

Note that the only instructions in the Am29000 instruction set that can
cause a TLB miss are jumps, calls, loads, and stores (well, there can
also be a TLB miss "caused" by the other instructions when a page
boundary is crossed, but the frequency of this event is extremely low).
For the routines I wrote, the two-level TLB miss handler takes 42 cycles
while the inverted-page miss handler takes 63 cycles on average (both
include all overhead and assume single-cycle burst, two-cycle first
access memories).

I can send copies of the paper to those interested.... (It's text and
graphics, so I can't just post.)

    bcase