CHAPTER 5
Richer Considerations
Until now we have focused narrowly on the problem of cache replacement, both in terms of
metrics—in which we have focused on cache misses—and in terms of context—in which we have
focused on the cache as an isolated abstract entity. But, of course, cache misses do not translate
directly to performance loss, and cache replacement policies do not operate in isolation. This
chapter broadens our discussion in both dimensions, first exploring replacement policies that
consider the variable cost of cache misses and then exploring policies that consider the impact
of prefetchers, the impact of the cache architecture, and finally the impact of new memory
technologies.
5.1 COST-AWARE CACHE REPLACEMENT
Most replacement policies seek to minimize overall cache miss rate, assuming that all cache
misses are equally costly. In reality, different cache misses can have widely different impact on
overall performance. For example, misses that are isolated (low memory-level parallelism) tend
to be more expensive than misses that are clustered (high memory-level parallelism), because
with greater parallelism there is more ability to hide the latency of a single cache miss. As an-
other example, misses that are on the program's critical path are more important to program
performance than those that are not. A smart replacement policy that considers these costs can
preferentially cache high-cost misses (at the cost of a lower hit rate) to achieve better program
performance.
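To make the idea concrete, the following sketch shows one simple way a cost-aware policy could bias its eviction decisions. This is an illustrative toy, not a published policy: the cache tracks an estimated miss cost per block (how the cost estimate is obtained is left abstract) and, on eviction, picks the victim that would be cheapest to refetch, using LRU order only to break ties.

```python
# Illustrative cost-aware eviction sketch (hypothetical, not a policy
# from the literature): each cached block carries an estimated miss
# cost; the eviction victim is the block with the lowest cost, with
# ties broken in favor of the least recently used block.
from collections import OrderedDict

class CostAwareCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # block id -> estimated miss cost

    def access(self, block, miss_cost):
        """Simulate one access; return True on a hit, False on a miss."""
        if block in self.blocks:
            self.blocks.move_to_end(block)   # refresh recency on a hit
            return True
        if len(self.blocks) >= self.capacity:
            # Victim = cheapest-to-refetch block; since OrderedDict
            # iterates in recency order, min() breaks ties toward LRU.
            victim = min(self.blocks, key=lambda b: self.blocks[b])
            del self.blocks[victim]
        self.blocks[block] = miss_cost
        return False
```

Note that such a greedy policy may keep a high-cost block at the expense of overall hit rate, which is exactly the trade-off described above: fewer hits can still mean better performance if the avoided misses are the expensive ones.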
The optimal cost-aware replacement policy (CSOPT) [Jeong and Dubois, 2006] mini-
mizes the overall cost of all misses, rather than minimizing the number of misses. The basic
idea is to follow Belady's MIN policy except when MIN would evict a block whose miss cost is
greater than those of other blocks in the cache. On encountering such a block, CSOPT explores
multiple sequences of replacement decisions until it can decide on the replacement sequence that
minimizes total cost. A naive realization of CSOPT expands all possible replacement sequences
in the form of a search tree (see Figure 5.1); the depth of the search tree is equal to the number
of cache accesses, because a new level is added for every new cache access, and the width is equal
to the cache size s, because the search considers up to s replacement choices on every cache
eviction.
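The exhaustive search just described can be sketched as a short memoized recursion. This is a brute-force illustration in the spirit of CSOPT, not the authors' implementation: at each miss with a full cache, it branches on every possible victim (the width of the tree, bounded by the cache size) and recurses on the next access (the depth of the tree), returning the minimum achievable total miss cost.

```python
# Brute-force sketch of offline cost-aware optimal replacement:
# exhaustively explore every eviction sequence and return the minimum
# total miss cost. Exponential in the worst case, as the search-tree
# description above implies; memoization merges repeated states.
from functools import lru_cache

def min_total_cost(trace, cost, capacity):
    """trace: tuple of block ids; cost[b]: miss cost of block b."""
    @lru_cache(maxsize=None)
    def search(i, cache):
        if i == len(trace):
            return 0
        b = trace[i]
        if b in cache:                       # hit: no cost, no decision
            return search(i + 1, cache)
        if len(cache) < capacity:            # room left: just insert
            return cost[b] + search(i + 1, cache | {b})
        # Miss with a full cache: branch on every candidate victim.
        return cost[b] + min(search(i + 1, (cache - {v}) | {b})
                             for v in cache)
    return search(0, frozenset())
```

For example, on the trace A, B, C, B, A with a 2-block cache where A costs 10 and B and C each cost 1, MIN would evict A at the access to C (A's reuse is farthest away), paying A's expensive miss again, whereas the cost-optimal sequence evicts B instead and re-pays only B's cheap miss.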
Note that a search tree is required because greedily caching a high-cost block at any node
in the tree might not translate to the best replacement sequence in the long run. In particular,
greedily caching a high-cost block might preclude the caching of another block that has an even
higher cost.