Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ll-xn!ames!amdcad!bcase
From: bcase@amdcad.AMD.COM (Brian Case)
Newsgroups: comp.arch
Subject: Re: Anyone for memory management on the AM29000?
Message-ID: <16336@amdcad.AMD.COM>
Date: Thu, 23-Apr-87 11:49:10 EST
Article-I.D.: amdcad.16336
Posted: Thu Apr 23 11:49:10 1987
Date-Received: Sat, 25-Apr-87 10:07:45 EST
References: <67@bernina.UUCP>
Reply-To: bcase@amdcad.UUCP (Brian Case)
Distribution: world
Organization: Advanced Micro Devices, Inc., Sunnyvale, Ca.
Lines: 66
Keywords: AM29000, memory management, TLB, flame

In article <67@bernina.UUCP> tve@ethz.UUCP (Th. von Eicken) writes:
>When reading the data sheet I noticed that the TLB entries
>do not have any "page used" flag nor any "page modified"
>flag. Does that mean that the AM29000 memory management is even
>more crippled than on a VAX (which doesn't have a "page used" flag)???
>
>On TLB misses, as far as I understand, a software trap is generated.
>Are there any figures on typical interrupt routine times for handling
>the misses? What is the performance penalty, compared to miss
>handling in hardware?

Yeah, questions about the "missing" page referenced and modified bits
in the TLB are always among the first to be asked when people are
presented with the Am29000.  The deal is:  these bits don't belong in
TLB entries, they belong either in the page tables themselves or in
the physical page map (note that for inverted page tables, these
structures are (or can be) the same thing).  The VAX is brain-damaged
because the TLB reload is done by hardware (well, microcode) and it
forgets to take note of some of the information that OS guys would
like to have.  Since the Am29000 TLB reload is done by a software
routine, you not only can decide what the page tables look like, but
you can also decide whether or not to gather referenced and modified
information.
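
To make the point concrete, here is a minimal C sketch of a software
TLB reload routine that walks a two-level page table and records a
"referenced" bit as a side effect.  It is not the code from the paper:
the page size, table sizes, flag values, and every name in it
(pte_t, page_dir_t, tlb_entry_t, tlb_reload) are assumptions made for
illustration, and the structures do not reflect the Am29000's actual
TLB entry format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names, layouts, and sizes are illustrative assumptions,
   not the Am29000's real TLB format or the code from the paper. */
#define PAGE_SHIFT     12u   /* assumed 4 KB pages */
#define L2_BITS        10u   /* assumed 1024 entries per second-level table */
#define L1_BITS        10u   /* assumed 1024 first-level entries */
#define PTE_VALID      0x1u
#define PTE_REFERENCED 0x2u  /* software-maintained "page used" bit */

typedef struct { uint32_t frame; uint32_t flags; } pte_t;
typedef struct { pte_t *l2[1u << L1_BITS]; } page_dir_t;
typedef struct { uint32_t vpn; uint32_t frame; int valid; } tlb_entry_t;

/* Called on a TLB miss.  Returns 0 and fills *out on success,
   or -1 if the page is not mapped (a real page fault). */
int tlb_reload(page_dir_t *dir, uint32_t vaddr, tlb_entry_t *out)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    uint32_t i1  = vpn >> L2_BITS;                /* first-level index  */
    uint32_t i2  = vpn & ((1u << L2_BITS) - 1u);  /* second-level index */
    pte_t *l2;

    assert(i1 < (1u << L1_BITS));
    l2 = dir->l2[i1];
    if (l2 == NULL || !(l2[i2].flags & PTE_VALID))
        return -1;

    /* The reload itself is the evidence that the page was referenced. */
    l2[i2].flags |= PTE_REFERENCED;

    out->vpn   = vpn;
    out->frame = l2[i2].frame;
    out->valid = 1;
    return 0;
}
```

Because the routine is ordinary software, dropping the PTE_REFERENCED
line (or recording the bit in a physical page map instead) is a
one-line policy change rather than a hardware decision.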

Note that referenced information is available degenerately by the
very fact that the TLB entry is present at all (the fact that the
TLB entry was fetched from the page table means that the page has
been referenced).  Page modified can be gathered in software too, if
you are willing to take the performance hit:  put the TLB entry for
the page into the TLB but set the write-protection bit(s) (one for
supervisor, one for user); then, when a write to the page is attempted,
a protection violation trap will be taken; at this point, look in the
page table to make sure that the page is supposed to be read-only; if
not, then change the TLB entry to allow writing and count a page
modification in the page table (or physical page map).
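
The write-protect trick described above can be sketched in C as two
small routines, one for the TLB fill and one for the protection trap.
This is only a model under stated assumptions: the flag values, the
single write_ok field (standing in for the separate supervisor and
user protection bits), and the names tlb_fill and write_fault are all
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layouts only; not the Am29000's real TLB or
   page-table format. */
#define PTE_WRITABLE 0x1u  /* the OS really permits writes to this page */
#define PTE_MODIFIED 0x2u  /* software-maintained "page modified" bit   */

typedef struct { uint32_t frame; uint32_t flags; } pte_t;
typedef struct { uint32_t frame; int write_ok; } tlb_entry_t;

/* TLB reload: load the entry write-protected, whatever the page
   table says, so that the first write traps. */
void tlb_fill(const pte_t *pte, tlb_entry_t *tlb)
{
    assert(pte && tlb);
    tlb->frame    = pte->frame;
    tlb->write_ok = 0;
}

/* Protection-violation trap on a write.  Returns 1 if the write may
   be retried, 0 if the page really is read-only. */
int write_fault(pte_t *pte, tlb_entry_t *tlb)
{
    assert(pte && tlb);
    if (!(pte->flags & PTE_WRITABLE))
        return 0;                 /* genuine violation */
    pte->flags   |= PTE_MODIFIED; /* note the modification */
    tlb->write_ok = 1;            /* later writes run at full speed */
    return 1;
}
```

The cost is one extra trap per written page per TLB residency, which
is the performance hit mentioned above.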

But this is not the right way to do it anyway.  The right way is to
have a small RAM-based table in the memory controller keep track of
page modification:  there is very little overhead and the information
is maintained on a per-physical-page basis, just as it should be.
Also, it is probably the best way for multiprocessor systems.
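
As a rough software model of such a table (the real thing would be
hardware in the memory controller), the C sketch below keeps one bit
per physical page.  The page size, the number of frames, and the names
modmap_t, modmap_note_write, and modmap_test_and_clear are assumptions
chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT 12u    /* assumed 4 KB pages */
#define NUM_FRAMES 1024u  /* assumed 4 MB of physical memory */

/* One "modified" bit per physical page frame. */
typedef struct { uint8_t dirty[NUM_FRAMES / 8u]; } modmap_t;

void modmap_init(modmap_t *m)
{
    memset(m->dirty, 0, sizeof m->dirty);
}

/* What the controller would do on every write cycle it sees. */
void modmap_note_write(modmap_t *m, uint32_t paddr)
{
    uint32_t frame = paddr >> PAGE_SHIFT;
    assert(frame < NUM_FRAMES);
    m->dirty[frame / 8u] |= (uint8_t)(1u << (frame % 8u));
}

/* What the OS would do when deciding whether a frame needs to be
   written back: read the bit and reset it. */
int modmap_test_and_clear(modmap_t *m, uint32_t frame)
{
    uint8_t mask;
    int was_dirty;

    assert(frame < NUM_FRAMES);
    mask      = (uint8_t)(1u << (frame % 8u));
    was_dirty = (m->dirty[frame / 8u] & mask) != 0;
    m->dirty[frame / 8u] &= (uint8_t)~mask;
    return was_dirty;
}
```

The table is indexed by physical address, so it sees every write to a
frame no matter which processor or which virtual mapping issued it.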

I have written a paper about TLB reload for the Am29000, complete
with page table structures and code examples for two-level and inverted-
page tables.  There is also a discussion of TLB miss processing overhead
for a few of our benchmark programs (nroff, our assembler, puzzle,
etc.).  The overhead, in added cycles per instruction, is typically
less than 0.01 with the max (for the examples given) at 0.27 for the
"rm" command (this atrociously high number is due to the fact that
rm is a very short program so the cold-start penalty is a high
percentage of the total time).  The TLB miss ratios go from 1.50%
(yeech) for rm to < 0.01% for puzzle.  Four of the six programs have
TLB miss ratios < 0.05%, the next highest is 0.12% (nroff), and then
there is rm at 1.50%.  Note that the only instructions in the Am29000 set
that can cause a TLB miss are jumps, calls, loads, and stores (well,
there can also be a TLB miss "caused" by the other instructions
when a page boundary is crossed, but the frequency of this event is
extremely low).  For the routines I wrote, the two-level TLB miss
handler takes 42 cycles while the inverted-page miss handler takes
63 cycles on average (both include all overhead and assume single-
cycle burst, two-cycle first access memories).
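
For anyone who wants to plug in their own numbers, the arithmetic
behind an "added cycles per instruction" figure is just misses per
instruction times cycles per miss.  The C helper below is a sketch of
that relation only; the name tlb_overhead_cpi is made up here, and
since the figures above do not say what the miss ratios are measured
against, the per-instruction miss rate is taken as an input rather
than derived from the quoted percentages.

```c
#include <assert.h>

/* Added cycles per instruction due to TLB miss handling.
   misses_per_instr: TLB misses divided by instructions executed
                     (an input the caller must measure or assume).
   handler_cycles:   cycles per miss, including all trap overhead. */
double tlb_overhead_cpi(double misses_per_instr, double handler_cycles)
{
    assert(misses_per_instr >= 0.0 && handler_cycles >= 0.0);
    return misses_per_instr * handler_cycles;
}
```

For example, a hypothetical program that misses once every 10,000
instructions with the 42-cycle two-level handler would pay
0.0001 * 42 = 0.0042 added cycles per instruction.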

I can send copies of the paper to those interested.... (It's text
and graphics, so I can't just post.)

    bcase