Path: utzoo!attcan!lsuc!maccs!darel From: darel@maccs.McMaster.CA (Darel Mesher) Newsgroups: comp.sys.ibm.pc Subject: Re: Different types of cache on 386's Message-ID: <1633@maccs.McMaster.CA> Date: 23 Nov 88 19:00:14 GMT References: <673@wa3wbu.UUCP> Reply-To: darel@maccs.UUCP (Darel Mesher) Organization: McMaster U., Hamilton, Ont., Can. Lines: 107 There are _many_ different caching strategies (limited only by designer imagination) however 3 more popular cache organizations are ; Associative Direct Mapped Set Associative An Associative cache uses a content addressable memory (CAM) where a tag (address) is presented to the cache and simultaneously all valid cache tag entries are compared. In the event of a 'hit' the data is returned, a 'miss' generates an external memory fetch (usually a block of data is transfered [eg. 4 words] in burst mode) and the new data and tag displaces a current cache entry. Since the cache uses a CAM, the replacement mechanism can be much more efficient than the Direct Mapped approach. Typical replacement strategies include Least Recently Used (LRU), Random and First In First Out (FIFO). Needless to say, the complexity of a CAM is great, each cell in the cache has to have its own tag comparitor, and for on-chip caches this is a luxury that not often available. This complexity is a trade-off for one of the most efficient caching strategies. Direct Mapped caches are by far the simplest and most compact (in terms of chip real estate) however, with simplicity comes draw- backs. Direct Mapped caches directly map the tag (address) to the corresponding data. There is no replacement algorithm, the tag (address) determines which location in the cache must be used. Hence aliasing is a problem: ,-----. ,-----| = ? |-----> HIT Tag check and data | `-----' retrieval done | ^ simultaneously. | |N-m bits | ,--------. Example: N= 4 and m= 2 bits N-m bits| | TAG | ,-------' | (RAM) | Tag Data Valid N bit/ `--------' ---------------------- ---< ^ addr 00 | | | | addr \_________________| 01 | 10 | data | 1 | m bits | 10 | | | | (LSB) v addr 11 | | | | ,--------. ---------------------- | DATA | | (RAM) | `--------' Therefore addr 1001 would map | to the second entry in the | table and in this case it is data <----' valid (bit = 1) and 'HIT' signalling a cache 'hit'. The problem of aliasing can be demonstrated using the above example. If a loop in the executing code mades repeated references to different addresses with the same Tag field then that cache entry would generate a 'cache miss' each time thru the loop. For example if both address 1101 and 1001 are repeatedly referenced then each reference would generate a cache fault that would force an external memory fetch. In short, this would result in perpetual displacement of the xx01 cache entry while inside the loop. This scenario represents the limitations of Direct Mapped caches. Set Associative can be considered as providing a compromise between Direct Mapped (simple,reasonable performance) and Associative (complex, execellent performance). N-way set Associative caches provide N Direct Mapped caches which are inspected simultaneously. Using the Direct Mapped example from above with a set size of 2 the cache table would look like; Tag Data Valid -------------------------- 00 |.......|...........|......| |_______|___________|______| 01 |..10...|...data....|...1..| |__11___|___data____|___1__| 10 |.......|...........|......| |_______|___________|______| 11 |.......|...........|......| | | | | -------------------------- Therefore, there are now two entries for the 01 tag and as in the example above, references for the two address 1101 and 1001 would both result in cache hits. The hardware is still relatively simple and replacement strategies can be employed for entry displacement. It has been shown that optimum set sizes fall into the 2 to 4 range (since a programmer rarely manipulates more than 4 data structures simultaneously). The 2-way Set Associative approach shows a noticable performance enhancement over the direct mapped approach. It is not uncommon to find CPUs with different size and types of on-board caches. The National Semiconductor NS32532 has a 512 byte Direct Mapped Instruction cache, a 1024 byte 2-Way Set Associative Data cache and an on-board MMU with a 64 entry Associative (CAM) Translation Lookaside Buffer. This one CPU has all three cache types mentioned above! -- Darel Mesher ...!uunet!mnetor!maccs!darel McMaster University darel@maccs.mcmaster.ca