high-cost, depending on whether the number of issued instructions is above or below a thresh-
old. For each block, LACS maintains a 2-bit cost value so that it can represent both high and
low cost, each with two levels of confidence. Thus, instead of tracking a numeric cost value, LACS
uses a binary cost classification.
For its replacement policy, LACS attempts to reserve high-cost blocks in the cache, but
only while their locality is still high (i.e., they have been accessed recently). In particular, on
a cache miss, LACS chooses a low-cost block to evict, so that high-cost blocks remain in the
cache. However, high-cost blocks are aged by decrementing their 2-bit cost value so that they
relinquish the reservation if they have been in the cache for too long. Similarly, low-cost blocks
are promoted by incrementing their 2-bit cost value so that they are retained in the cache if they
show high temporal locality. Thus, LACS combines the benefits of both locality- and cost-based
replacement.
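The mechanics above can be sketched in a few lines of code. This is an illustrative model of one cache set, not the paper's implementation; the class and method names (CostAwareSet, choose_victim, etc.) and the fallback behavior when no low-cost block exists are assumptions for the sketch.

```python
LOW, HIGH = 0, 3  # 2-bit saturating cost counter: 0-1 = low cost, 2-3 = high cost


class CostAwareSet:
    """Illustrative model of one set under a LACS-style policy."""

    def __init__(self, num_ways):
        self.tags = [None] * num_ways
        self.cost = [LOW] * num_ways  # 2-bit cost value per block

    def is_high_cost(self, way):
        return self.cost[way] >= 2

    def choose_victim(self):
        # Prefer evicting a low-cost block so high-cost blocks stay reserved.
        for way in range(len(self.tags)):
            if not self.is_high_cost(way):
                return way
        # No low-cost block: age every high-cost block by decrementing its
        # cost, so long-resident blocks eventually relinquish the reservation.
        for way in range(len(self.tags)):
            self.cost[way] = max(LOW, self.cost[way] - 1)
        return 0  # fallback victim (assumption; the paper may choose differently)

    def on_hit(self, way):
        # Promote blocks that show temporal locality by incrementing cost.
        self.cost[way] = min(HIGH, self.cost[way] + 1)

    def on_fill(self, way, tag, miss_was_high_cost):
        # Classify the incoming block from the issued-instruction count
        # observed during the miss, starting with weak confidence.
        self.tags[way] = tag
        self.cost[way] = 2 if miss_was_high_cost else 1
```

For example, after filling a set with two high-cost and two low-cost blocks, `choose_victim` returns a low-cost way, and a subsequent hit on that way promotes it into the high-cost range.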
5.2 CRITICALITY-DRIVEN CACHE OPTIMIZATIONS
Criticality is a more general cost function than MLP: Srinivasan et al. [2001] define a critical
load as any load that needs to complete early to prevent processor stalls, while a non-critical
load is one that can tolerate long latency. Critical loads are identified using a variety of tech-
niques [Fields et al., 2001, Srinivasan and Lebeck, 1998], and criticality-driven cache optimiza-
tions prioritize critical loads over non-critical loads. To highlight important advances in this
area, we now discuss two criticality-driven optimizations.
5.2.1 CRITICAL CACHE
Using detailed limit studies, Srinivasan and Lebeck [1998] determine that load criticality can
be determined by the characteristics of the chain of instructions dependent on the load. In
particular, a load is classified as critical if it satisfies any of the following criteria: (1) the load
feeds into a mispredicted branch, (2) the load feeds into another load that incurs an L1 cache
miss, or (3) the number of independent instructions issued in an N cycle window following
the load is below some threshold. Thus, this definition of criticality considers both the type of
dependent instructions (e.g., mispredicted branch, L1 misses) and the number of instructions
in its dependence chain.
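The three criteria can be summarized as a simple predicate. The field names and the threshold value below are illustrative assumptions for the sketch, not values from Srinivasan and Lebeck [1998]:

```python
from dataclasses import dataclass


@dataclass
class LoadInfo:
    """Per-load statistics assumed available to the classifier (hypothetical)."""
    feeds_mispredicted_branch: bool    # criterion 1
    feeds_l1_missing_load: bool        # criterion 2: a dependent load misses in L1
    independent_issues_in_window: int  # criterion 3: issues in the N-cycle window


ISSUE_THRESHOLD = 4  # illustrative threshold for criterion 3


def is_critical(load: LoadInfo) -> bool:
    # A load is critical if it satisfies ANY of the three criteria.
    return (load.feeds_mispredicted_branch
            or load.feeds_l1_missing_load
            or load.independent_issues_in_window < ISSUE_THRESHOLD)
```

Note the disjunction: a load feeding a mispredicted branch is critical even if plenty of independent instructions issue behind it.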
To illustrate the importance of optimizing for critical loads, Srinivasan and Lebeck [1998]
show that if all critical loads could be satisfied by the L1 cache, the result would be an average
40% improvement over a traditional memory hierarchy, whereas if an equal percentage of loads
is randomly chosen to hit in the L1, the average improvement would be only 12%. Therefore, it
may be possible to improve overall performance by decreasing the latency of these critical loads
at the expense of increased latency for non-critical loads.
To optimize critical loads, Srinivasan and Lebeck [1998] use a critical cache, which serves
as a victim cache for blocks that were touched by a critical load. For a fair comparison, the
baseline also includes a victim cache, called the locality cache, which caches both critical and