Path: utzoo!mnetor!uunet!lll-winken!lll-tis!ames!necntc!linus!alliant!jeff
From: jeff@Alliant.COM (Jeff Collins)
Newsgroups: comp.arch
Subject: Re: hardware support of reference and change bits
Message-ID: <1647@alliant.Alliant.COM>
Date: 22 Apr 88 21:18:23 GMT
References: <1458@hubcap.UUCP>
Reply-To: jeff@alliant.UUCP (Jeff Collins)
Organization: Alliant Computer Systems, Littleton, MA
Lines: 89
Keywords: page replacement reference bit change bit

In article <1458@hubcap.UUCP> mark@hubcap.UUCP (Mark Smotherman) writes:
>
	Removed a discussion about IBM reference and dirty bits.
>
>I don't have any hardware manuals available for the 386 or 32082 that give
>full descriptions, but I assume they work in the following way.
>
>1. Given the absence of the page table entry from the TLB:
>   * Upon a reference, the page table entry is brought into the TLB and
>     the reference bit is inspected. If the reference bit is zero, then
>     it is set to one. This update of the entry must be done in a store-
>     through manner. That is, not only should the TLB copy of the page
>     table entry be updated, but the copy of the entry in the cache should
>     also be updated. (Of course, a designer could eliminate the inspection
>     of the reference bit during a critical path by performing the store-
>     through each time.) An additional store-through to main memory and
>     cache invalidate signal would be needed in a multiprocessor. There
>     would not be a need for a TLB invalidate signal.
>   * Upon a change, the page table entry is processed as above, only with
>     both bits set to one. The condition causing memory traffic is if
>     either bit is zero (00, 01, or 10).

On a multiprocessor, whether the PTE is written back to main memory is
determined by the cache protocol. If the cache is write-through, then
yes, the PTE must be written to memory. If the cache is write-back, then
the PTE might not be written back to memory.
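
The rule quoted above (store the PTE only when a needed bit is still
zero) can be sketched roughly in C. This is a toy model, not a
description of the 386 or the 32082: the bit positions and the names
PTE_REF, PTE_MOD, mmu_touch and count_pte_stores are all made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PTE layout; bit positions are illustrative only. */
#define PTE_REF 0x1u   /* referenced */
#define PTE_MOD 0x2u   /* modified (changed) */

typedef uint32_t pte_t;

/* Model of the MMU's action on one access. Returns 1 if the PTE had to
   be stored (memory traffic), 0 if the needed bits were already set. */
static int mmu_touch(pte_t *pte, int is_write)
{
    assert(pte != NULL);
    pte_t want = PTE_REF | (is_write ? PTE_MOD : 0u);
    if ((*pte & want) == want)
        return 0;      /* bits already set: no update, no traffic */
    *pte |= want;      /* stands in for the store-through of the entry */
    return 1;
}

/* Count the PTE stores caused by a sequence of accesses to one page.
   is_write[i] is nonzero for a write, zero for a read. */
static unsigned count_pte_stores(pte_t *pte, const int *is_write, size_t n)
{
    unsigned stores = 0;
    for (size_t i = 0; i < n; i++)
        stores += (unsigned)mmu_touch(pte, is_write[i]);
    return stores;
}
```

Under these assumptions, a page that is read, read, written, written and
read again costs two PTE stores in total: one for the first reference and
one for the first change.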

When the hardware sets a reference and/or modified bit in the TLB, the
operating system does not know that the bit is being set; it happens
automatically. Since the software does not know that the bit was set,
it has no way to tell the other processors to perform an invalidate.
Instead, there are two ways to deal with this race condition.

One is to not share PTEs. This means that each process has private
copies of the hardware page tables. When the hardware updates the TLB,
no one else cares, because no one else could have the PTE cached in a
TLB (this assumes the TLB is flushed on context switch).

If the operating system allows shared PTEs (this would be done to allow
multiple processes to share memory), then the problem can be effectively
ignored. With reference bits it is not very important if they become
inconsistent; it only means that you lose a little accuracy in your
working set calculations. With modified bits it is very important
either to keep them consistent or to not care what they are. Not caring
can be arranged by always assuming that shared data is modified, or by
never releasing it - either solution works.
>
>2. Given that the entry is currently in the TLB:

	Eliminated this text as I had nothing to add.
>
>3. Immediately after an instruction changes a page table entry (e.g. reset
>the reference or change bits), the TLB must be purged. For multiprocessors
>the cache must also be purged (or the change must have been a store-through)
>and invalidate signals sent to the other processors to purge their TLBs and
>caches.

This is close. If the operating system clears a referenced or modified
bit on a shared PTE, then it must purge its own TLB and cause all of the
other processors that could have the PTE cached in their TLBs to purge
as well. Again, note that this is only a problem with shared PTEs. The
cache does not need to be purged. When the operating system writes the
PTE, it writes to the cache/memory system.
The cache will contain the correct version after the reset, while the
TLB contains an old version, which is why the TLB entry must be purged.
(By the way, most of the MMUs allow a single entry to be purged instead
of the whole TLB.)
>
>For those who know, is this truly how things work? Do you have any idea
>(or better yet, any measurements) of the amount of memory traffic involved
>in the setting of the reference and change bits? Can I/O processors (DMA
>or whatever) on these micros affect these bits?
>

Setting and clearing the referenced and modified bits is not a big deal
(i.e., it doesn't cause a lot of bus traffic). This is because a bit
only causes traffic the first time it is changed, and that is a very
small percentage of the overall number of processor reads and writes.

To re-emphasize the multiprocessor issues here: the only trouble is with
shared user-level PTEs. Note that shared pages do not necessarily imply
shared PTEs. It is possible to build virtual memory systems that have
shared pages and private PTEs - this is what Mach and Encore (Umax 4.2)
do. This saves the invalidates and the consistency problems.

I/O processors do not use these bits (they make physical accesses).
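
To make the shared-PTE case concrete, here is a rough C sketch of why
the TLB entry has to be purged after the operating system clears a bit.
It is a toy model under loud assumptions: one shared page, a one-entry
"TLB" per processor, and invented names (NCPU, shared_pte, cpu_access,
os_clear_bits). The loop over processors stands in for whatever
cross-processor purge request a real system would use.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU    4      /* hypothetical processor count */
#define PTE_REF 0x1u   /* illustrative bit positions */
#define PTE_MOD 0x2u

typedef uint32_t pte_t;

/* One-entry "TLB" per processor, covering a single shared page. */
struct tlb_entry {
    int   valid;
    pte_t copy;        /* cached copy of the PTE, including its bits */
};

static pte_t shared_pte;             /* the PTE in the cache/memory system */
static struct tlb_entry tlb[NCPU];

/* Hardware side: an access by one processor. Fills the TLB on a miss and
   updates the PTE only if the cached copy lacks a needed bit. */
static void cpu_access(int cpu, int is_write)
{
    assert(cpu >= 0 && cpu < NCPU);
    pte_t want = PTE_REF | (is_write ? PTE_MOD : 0u);
    if (!tlb[cpu].valid) {
        tlb[cpu].copy = shared_pte;
        tlb[cpu].valid = 1;
    }
    if ((tlb[cpu].copy & want) != want) {
        tlb[cpu].copy |= want;
        shared_pte |= want;
    }
}

/* OS side: clear bits in the PTE, then purge that single entry on every
   processor that could hold it. Pass purge = 0 to see the failure. */
static void os_clear_bits(pte_t bits, int purge)
{
    shared_pte &= ~bits;
    if (purge)
        for (int cpu = 0; cpu < NCPU; cpu++)
            tlb[cpu].valid = 0;
}
```

In this model, if the modified bit is cleared without the purge, a later
write through the stale TLB copy never sets the bit again in the PTE, so
the change is lost. With the purge, the next access refills the TLB from
the cleared PTE and the bit is set again as expected.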