high-cost, depending on whether the number of issued instructions is above or below a thresh-
old. For each block, LACS maintains a 2-bit cost value so that it can represent both high and
low cost, each with two levels of confidence. Thus, instead of tracking a numeric cost value, LACS
uses a binary cost classification.
For its replacement policy, LACS attempts to reserve high-cost blocks in the cache, but
only while their locality is still high (i.e., they have been accessed recently). In particular, on
a cache miss, LACS chooses a low-cost block to evict, so that high-cost blocks remain in the
cache. However, high-cost blocks are aged by decrementing their 2-bit cost value so that they
relinquish the reservation if they have been in the cache for too long. Similarly, low-cost blocks
are promoted by incrementing their 2-bit cost value so that they are retained in the cache if they
show high temporal locality. Thus, LACS combines the benefits of both locality- and cost-based
replacement.
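The mechanics above can be sketched in a few lines of code. This is an illustrative model of one cache set, not the paper's implementation; the class and method names (CostAwareSet, choose_victim, etc.) and the fallback behavior when no low-cost block exists are assumptions for the sketch.

```python
LOW, HIGH = 0, 3  # 2-bit saturating cost counter: 0-1 = low cost, 2-3 = high cost


class CostAwareSet:
    """Illustrative model of one set under a LACS-style policy."""

    def __init__(self, num_ways):
        self.tags = [None] * num_ways
        self.cost = [LOW] * num_ways  # 2-bit cost value per block

    def is_high_cost(self, way):
        return self.cost[way] >= 2

    def choose_victim(self):
        # Prefer evicting a low-cost block so high-cost blocks stay reserved.
        for way in range(len(self.tags)):
            if not self.is_high_cost(way):
                return way
        # No low-cost block: age every high-cost block by decrementing its
        # cost, so long-resident blocks eventually relinquish the reservation.
        for way in range(len(self.tags)):
            self.cost[way] = max(LOW, self.cost[way] - 1)
        return 0  # fallback victim (assumption; the paper may choose differently)

    def on_hit(self, way):
        # Promote blocks that show temporal locality by incrementing cost.
        self.cost[way] = min(HIGH, self.cost[way] + 1)

    def on_fill(self, way, tag, miss_was_high_cost):
        # Classify the incoming block from the issued-instruction count
        # observed during the miss, starting with weak confidence.
        self.tags[way] = tag
        self.cost[way] = 2 if miss_was_high_cost else 1
```

For example, after filling a set with two high-cost and two low-cost blocks, `choose_victim` returns a low-cost way, and a subsequent hit on that way promotes it into the high-cost range.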
5.2 CRITICALITY-DRIVEN CACHE OPTIMIZATIONS
Criticality is a more general cost function than MLP: Srinivasan et al. [2001] define a critical
load as any load that needs to complete early to prevent processor stalls, while a non-critical
load is one that can tolerate long latency. Critical loads are identified using a variety of tech-
niques [Fields et al., 2001, Srinivasan and Lebeck, 1998], and criticality-driven cache optimiza-
tions prioritize critical loads over non-critical loads. To highlight important advances in this
area, we now discuss two criticality-driven optimizations.
5.2.1 CRITICAL CACHE
Using detailed limit studies, Srinivasan and Lebeck [1998] determine that load criticality can
be determined by the characteristics of the chain of instructions dependent on the load. In
particular, a load is classified as critical if it satisfies any of the following criteria: (1) the load
feeds into a mispredicted branch, (2) the load feeds into another load that incurs an L1 cache
miss, or (3) the number of independent instructions issued in an N cycle window following
the load is below some threshold. Thus, this definition of criticality considers both the type of
dependent instructions (e.g., mispredicted branch, L1 misses) and the number of instructions
in its dependence chain.
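The three criteria can be summarized as a simple predicate. The field names and the threshold value below are illustrative assumptions for the sketch, not values from Srinivasan and Lebeck [1998]:

```python
from dataclasses import dataclass


@dataclass
class LoadInfo:
    """Per-load statistics assumed available to the classifier (hypothetical)."""
    feeds_mispredicted_branch: bool    # criterion 1
    feeds_l1_missing_load: bool        # criterion 2: a dependent load misses in L1
    independent_issues_in_window: int  # criterion 3: issues in the N-cycle window


ISSUE_THRESHOLD = 4  # illustrative threshold for criterion 3


def is_critical(load: LoadInfo) -> bool:
    # A load is critical if it satisfies ANY of the three criteria.
    return (load.feeds_mispredicted_branch
            or load.feeds_l1_missing_load
            or load.independent_issues_in_window < ISSUE_THRESHOLD)
```

Note the disjunction: a load feeding a mispredicted branch is critical even if plenty of independent instructions issue behind it.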
To illustrate the importance of optimizing for critical loads, Srinivasan and Lebeck [1998]
show that if all critical loads could be satisfied by the L1 cache, the result would be an average
40% improvement over a traditional memory hierarchy, whereas if an equal percentage of loads
is randomly chosen to hit in the L1, the average improvement would be only 12%. Therefore, it
may be possible to improve overall performance by decreasing the latency of these critical loads
at the expense of increased latency for non-critical loads.
To optimize critical loads, Srinivasan and Lebeck [1998] use a critical cache, which serves
as a victim cache for blocks that were touched by a critical load. For a fair comparison, the
baseline also includes a victim cache, called the locality cache, which caches both critical and