5. RICHER CONSIDERATIONS
second approach, called TADIP-Feedback (TADIP-F), accounts for interaction among applications by learning the insertion policy for each application under the assumption that all other applications use the insertion policy that currently performs best for them.
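As a rough illustration of the underlying set-dueling mechanism, the sketch below models a per-application policy-selection counter. The class name, counter width, and method names are invented for this example; they are not from the TADIP paper.

```python
class AppDuel:
    """Per-application set-dueling monitor: a hypothetical sketch of the
    feedback idea in TADIP-F (names and widths are illustrative).

    Each application owns two groups of sampled "leader" sets: one that
    always inserts at the LRU position and one that always uses bimodal
    (BIP) insertion. A saturating counter PSEL tracks which group misses
    less, and the application's remaining "follower" sets adopt the
    winner. The feedback part: while one application's policy is being
    learned, the leader sets of every other application keep using that
    application's current winning policy.
    """

    def __init__(self, bits=10):
        self.max = (1 << bits) - 1
        self.psel = self.max // 2       # start near the midpoint

    def miss_in_lru_leader(self):       # evidence against LRU insertion
        self.psel = min(self.max, self.psel + 1)

    def miss_in_bip_leader(self):       # evidence against BIP insertion
        self.psel = max(0, self.psel - 1)

    def best(self):
        # High PSEL means the LRU leaders missed more, so BIP wins.
        return 'BIP' if self.psel > self.max // 2 else 'LRU'
```

In a full implementation there would be one such counter per application, and the hardware would consult `best()` on every insertion into a follower set.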
Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches (PIPP) Xie and Loh [2009] build on Utility-Based Cache Partitioning (UCP) [Qureshi and Patt, 2006], but instead of strictly enforcing UCP's partitions, they design insertion and promotion policies that enforce the partitions loosely. The main insight behind their PIPP policy is that strict partitions under-utilize cache resources because a core might not use its entire partition. For example, if the cache is way-partitioned and core_i does not access a given set, the ways allocated to core_i in that set go to waste. PIPP allows other applications to steal these unused ways.
In particular, PIPP inserts each line with a priority that is determined by its core's partition allocation. Lines from cores that have been allocated large partitions are inserted with high priority (proportional to the size of the partition), and lines from cores that have been allocated small partitions are inserted with low priority. On a cache hit, PIPP's promotion policy promotes the line by a single priority position with probability p_prom, and the priority is unchanged with probability 1 − p_prom. When a line must be evicted, the line with the lowest priority is chosen.
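The insertion and promotion rules above can be sketched as a toy model of a single cache set. This is a simplified illustration under assumed conventions (index 0 is the eviction end of the priority stack, and a core's insertion position equals its allocation), not the paper's exact hardware mechanism.

```python
import random

class PIPPSet:
    """Toy model of one set under PIPP-style insertion/promotion.

    alloc[i] is the number of ways a UCP-style partitioner assigned to
    core i. Insertions land at a priority position equal to the core's
    allocation (clamped to the current occupancy), hits promote a line
    by one position with probability p_prom, and the line at the lowest
    priority position (index 0) is the eviction victim.
    """

    def __init__(self, num_ways, alloc, p_prom=0.75, seed=0):
        self.ways = num_ways
        self.alloc = alloc            # partition size per core
        self.p_prom = p_prom
        self.stack = []               # index 0 = lowest priority
        self.rng = random.Random(seed)

    def access(self, core, addr):
        tag = (core, addr)
        if tag in self.stack:         # hit: promote one slot w.p. p_prom
            i = self.stack.index(tag)
            if i + 1 < len(self.stack) and self.rng.random() < self.p_prom:
                self.stack[i], self.stack[i + 1] = self.stack[i + 1], self.stack[i]
            return True
        if len(self.stack) == self.ways:   # miss in a full set: evict
            self.stack.pop(0)              # lowest-priority line goes
        # insert at a priority proportional to the core's allocation
        pos = min(self.alloc[core], len(self.stack))
        self.stack.insert(pos, tag)
        return False
```

Note how the model captures way stealing: a core with a small allocation still occupies extra ways if cores with larger allocations leave them unused, because nothing in the insertion rule reserves ways per core.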
RADAR Our discussion so far has focused on multiple programs sharing the last-level cache. Manivannan et al. [2016] instead look at the problem of last-level cache replacement for task-parallel programs running on a multi-core system. Their policy, called RADAR, combines static and dynamic program information to predict dead blocks for task-parallel programs. In particular, RADAR builds on task data-flow programming models, such as OpenMP, where programmer annotations explicitly specify (1) dependences between tasks and (2) the address regions that each task will access. The runtime system uses this information in conjunction with dynamic program behavior to predict regions that are likely to be dead. Blocks that belong to dead regions are demoted and preferentially evicted from the cache.
More concretely, RADAR has three variants that combine information from the pro-
gramming model and the architecture in different ways. First, the Look-ahead scheme uses the
task data-flow graph to peek into the window of tasks that are going to be executed soon, and it
uses this information to identify regions that are likely to be accessed in the future and regions
that are likely to be dead. Second, the Look-back scheme tracks per-region access history to
predict when the next region access is likely to occur. Finally, the combined scheme exploits
knowledge of future region accesses and past region accesses to make more accurate predictions.
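The combined scheme's decision can be summarized as a simple predicate: a region is predicted dead only when both sources of evidence agree. The function below is a hypothetical sketch of that logic; the argument names, the history representation (last access time plus a predicted re-access interval), and the `horizon` threshold are all invented for illustration.

```python
def region_predicted_dead(region, lookahead_regions, history,
                          current_time, horizon):
    """Combined Look-ahead + Look-back sketch (hypothetical interface).

    lookahead_regions: regions touched by tasks in the look-ahead
        window of the task data-flow graph (Look-ahead evidence).
    history: region -> (last_access_time, predicted_interval), used to
        estimate when the next access will occur (Look-back evidence).
    A region is predicted dead only if no upcoming task accesses it AND
    its predicted next access lies beyond `horizon` time units.
    """
    in_window = region in lookahead_regions
    last, interval = history.get(region, (None, None))
    if last is None:
        accessed_soon = False          # no history: rely on look-ahead
    else:
        accessed_soon = (last + interval) - current_time <= horizon
    return (not in_window) and (not accessed_soon)
```

Requiring agreement between the two schemes is what makes the combined predictor more accurate: the look-ahead window can miss accesses beyond its horizon, while the history can be stale, so each source vetoes the other's false positives.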
5.4 PREFETCH-AWARE CACHE REPLACEMENT
In addition to caches, modern processors use prefetchers to hide the long latency of accessing
DRAM, and it is essential that these mechanisms work well together. Prefetched data typically