Cache | Quantity | Size | Latency | Replacement policy | Other information | Clock domain
L1 instruction cache | 18 (1 per processor) | 16 KB | 3 processor clocks (pclk) | Pseudo least recently used (LRU) | 4-way set-associative; 64-byte line size | Pclk
L1 data cache | 18 (1 per processor) | 16 KB | 6 pclk (integer) | Pseudo LRU | 8-way set-associative; 64-byte line size | Pclk
L1 prefetch cache | 18 (1 per processor) | 32 × 128 bytes | 24 pclk | Depth stealing and round robin | 128-byte line | Pclk / 2
L2 cache | 16 | 2 MB/slice, 32 MB total; L2 cache on-chip | 82 pclk | LRU | 16-way set-associative; 16-way sliced; 4 banks per slice; 8 sub-banks per slice; 128-byte line | Pclk / 2
Double-data rate (DDR) memory | 2 | 16 GB total | ≥ 350 pclk | | 128-byte line | Pclk × (5 / 6)
Embedded dynamic random-access memory (eDRAM) | 1 | 256 KB | ≥ 80 pclk | Software control | 16 bytes wide; 8 eDRAM macro-internal banks | Pclk / 2

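The sizes, associativities, and line sizes in the cache table determine how many sets each cache has. The following is a minimal sketch of that arithmetic, not part of any SPI; the function names are hypothetical, and the L2 figure assumes that the 16-way associativity applies within each 2 MB slice.

```c
#include <assert.h>
#include <stddef.h>

/* Number of sets in a set-associative cache:
 * sets = capacity / (ways * line size). */
size_t cache_sets(size_t capacity_bytes, size_t ways, size_t line_bytes)
{
    assert(ways > 0 && line_bytes > 0);
    assert(capacity_bytes % (ways * line_bytes) == 0);
    return capacity_bytes / (ways * line_bytes);
}

/* Total capacity of a buffer that holds a fixed number of equal lines. */
size_t buffer_bytes(size_t lines, size_t line_bytes)
{
    return lines * line_bytes;
}
```

With the table values, `cache_sets(16 * 1024, 4, 64)` gives 64 sets for the L1 instruction cache, `cache_sets(16 * 1024, 8, 64)` gives 32 sets for the L1 data cache, and `buffer_bytes(32, 128)` gives 4096 bytes for the L1 prefetch cache.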
Name | Mount point | Size | Scope
Shared memory | /dev/shm/ | Size is determined by the BG_SHAREDMEMSIZE environment variable. MPICH and PAMI use some of this shared memory. | Node-wide, cleared when job exits.
Persistent memory | /dev/persist/ | Size is determined by the BG_PERSISTMEMSIZE environment variable. | Node-wide, cleared only when BG_PERSISTMEMSIZE is specified differently or if BG_PERSISTMEMRESET is set.
Local memory | /dev/local | Size is determined by the Kernel_SetLocalFSWindow() SPI call. | Process-wide, cleared when job exits.

File | Description
/proc/<pid>/exe | A symbolic link to the executable.
/proc/<pid>/cwd | A symbolic link to the current working directory.
/proc/<pid>/maps | A regular file that represents the memory map for the process. The memory map includes text, data, heap, stack, and dynamic library address ranges for the process.
/proc/<pid>/cmdline | A regular file that contains the command line passed into the process.
/proc/<pid>/environ | A regular file that contains the environment variables for the process at job start.

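Because `/proc/<pid>/exe` and `/proc/<pid>/cwd` are symbolic links, one way to read them is with the POSIX `readlink()` call. The sketch below is illustrative only: the helper names `proc_entry_path` and `proc_read_link` are hypothetical, and it assumes that the kernel resolves these links the way a standard POSIX system does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Build "/proc/<pid>/<entry>" in buf.
 * Returns 0 on success, -1 if the path does not fit. */
int proc_entry_path(char *buf, size_t size, long pid, const char *entry)
{
    int n = snprintf(buf, size, "/proc/%ld/%s", pid, entry);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Resolve a symbolic-link entry such as "exe" or "cwd".
 * Returns the length of the target, or -1 on error. */
ssize_t proc_read_link(long pid, const char *entry, char *target, size_t size)
{
    char path[64];
    if (size == 0 || proc_entry_path(path, sizeof path, pid, entry) != 0)
        return -1;
    ssize_t len = readlink(path, target, size - 1);
    if (len >= 0)
        target[len] = '\0';   /* readlink() does not NUL-terminate */
    return len;
}
```

A process could call `proc_read_link((long)getpid(), "exe", buf, sizeof buf)` to find its own executable. The regular files in the table (`maps`, `cmdline`, `environ`) would be read with ordinary file I/O instead.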
Name | Description
L1P_stream_optimistic | Any L1 cache miss memory reference (optimistically) establishes a stream.
L1P_stream_confirmed | The L1p waits for confirmation before establishing the stream.
L1P_stream_confirmed_or_dcbt | Any L1 cache miss memory reference using a dcbt (data cache block touch) instruction automatically creates an established stream. Otherwise, the L1p waits for confirmation before establishing the stream.

Name | Description
L1P_NestingSaveContext (default) | Any nested L1P_Configure() routines result in an implicit context save. The matched L1P_Unconfigure() routine restores the L1P context.
L1P_NestingIgnore | The L1P pattern routines are disabled for the thread after a nested L1P_Configure() routine is performed. The pattern routines are re-enabled when the matching L1P_Unconfigure() routine is performed.
L1P_NestingFlat | When the application makes a nested L1P_Configure() call, it is reference counted and ignored. Any L1P_SetPattern() calls are ignored if they occur in a nested context.
L1P_NestingError | Nested L1P_Configure() routines cause an error message and assert a failure. This causes the termination of the application.

Name | Description
L1P_PatternLimit_Disable | The limit on the number of allocated patterns is disabled (that is, no limit).
L1P_PatternLimit_Error | Exceeding the limit on the number of allocated patterns causes the pattern allocation to fail with L1P_NOMEMORY.
L1P_PatternLimit_Assert | Exceeding the limit on the number of allocated patterns causes an assertion failure, and the process terminates abnormally.
L1P_PatternLimit_Prune | Exceeding the limit on the number of allocated patterns causes pattern allocations to be treated as though the nesting mode is L1P_NestingIgnore.

Name | Type | Width | Description
Size | size_t | 8 bytes | Size (in bytes) of the memory region that is allocated by the L1P_Allocate() routine.
ReadPattern | void* | 8 bytes | Virtual memory address for the read pattern.
WritePattern | void* | 8 bytes | Virtual memory address for the write/generated pattern.

Name | Type | Width | Description
Finished | uint64_t | 1 bit | Boolean that indicates that the perfect prefetcher has completed the list. This bit is cleared when a list has started executing, and set when the list has completed.
Abandoned | uint64_t | 1 bit | Set if a failure to match causes list comparison to be abandoned.
Maximum | uint64_t | 1 bit | Set if the length of the update reaches the maximum.

Name
|
Description
|
||
Parameter
|
uint64_t n
|
Input
|
The maximum number of L1 misses that can be tracked by the list.
|
Return Codes
|
L1P_NOMEMORY
|
The application was unable to allocate enough memory.
|
|
L1P_ALREADYCONFIGURED
|
The L1p perfect prefetcher was already configured.
|
||
Latency
|
The implementation might require system calls.
On the CNK, this routine might use the glibc malloc() function internally. The malloc() function can then perform brk() or mmap() system calls to allocate storage.
|
||
Description:
Allocates enough storage so that the perfect prefetcher can track up to <n> L1 misses.
Storage is retained until one of the following actions occurs:
• L1P_Unconfigure() is performed.
• L1P_SetPattern() is performed.
If the L1P_Configure() command is nested:
• If the nesting mode has been set to L1P_NestingSaveContext, the L1P SPI pushes an L1P context structure onto a stack of L1P context structures. When an L1P_Unconfigure() function is called, this L1P context structure is restored. This is the default mode.
• If the nesting mode has been set to L1P_NestingIgnore, the L1P SPI reference counts the L1P_Configure() calls. When nested, the SPI does not write new pattern addresses into the L1p hardware. When the same number of L1P_Unconfigure() routines have been called, the L1P SPI returns to normal function.
• If the nesting mode has been set to L1P_NestingFlat, the L1P SPI reference counts and ignores nested L1P_Configure() calls. All L1P_SetPattern() calls are ignored if they occur in a nested context.
• If the nesting mode has been set to L1P_NestingError, the L1P SPI displays an error message and asserts. This terminates the active process with a core file. This mode is intended for debugging.
|
|||
Example
|
|||
Nested L1P_Configure:
|
Unnested L1P_Configure:
|
||
L1P_Configure(1000);
// …code…
L1P_Configure(1500);
// …code…
L1P_Unconfigure();
// …code…
L1P_Unconfigure();
|
L1P_Configure(1000);
// …code…
L1P_Unconfigure();
L1P_Configure(1500);
// …code…
L1P_Unconfigure();
|
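The effect of the default L1P_NestingSaveContext mode on the nested example can be pictured as a stack of contexts. The following toy model is only an illustration of that push and pop behavior. It is not the SPI implementation, every name in it (`model_l1p_t`, `model_configure`, and so on) is hypothetical, and the nesting bound of 8 is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_NESTING 8   /* arbitrary bound for this toy model */

/* Hypothetical stand-in for the state that the SPI keeps per thread. */
typedef struct {
    uint64_t capacity[MODEL_MAX_NESTING]; /* n passed to each configure call */
    int depth;                            /* number of active configurations */
} model_l1p_t;

/* Models a configure call: push a new context.
 * Returns 0 on success, -1 if the model's stack is full. */
int model_configure(model_l1p_t *m, uint64_t n)
{
    if (m->depth == MODEL_MAX_NESTING)
        return -1;
    m->capacity[m->depth++] = n;
    return 0;
}

/* Models an unconfigure call: pop, which restores the previous context.
 * Returns 0 on success, -1 if nothing is configured. */
int model_unconfigure(model_l1p_t *m)
{
    if (m->depth == 0)
        return -1;
    m->depth--;
    return 0;
}

/* Capacity of the active context, or 0 when nothing is configured. */
uint64_t model_active_capacity(const model_l1p_t *m)
{
    return m->depth ? m->capacity[m->depth - 1] : 0;
}
```

Replaying the nested example in this model, the active capacity is 1500 after the second configure call and returns to 1000 after the first unconfigure call. In the unnested example the stack never grows beyond one entry.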
Name
|
Description
|
|
Parameters
|
None
|
|
Return codes
|
L1P_NOTCONFIGURED
|
The L1p has not been configured.
|
Latency
|
Implementation might require system calls.
On CNK, this routine might use the glibc free() routine internally. The free() call can then perform brk() or munmap() system calls to free storage.
|
|
Description:
Deallocates storage used by the L1p SPI.
If one is available, the L1P SPI pops an L1P context structure from the stack of L1P context structures. The context is then used to restore the previous L1P pattern status and pointers.
|
Name
|
Description
|
||
Parameters
|
int record
|
Input
|
Boolean that indicates whether L1P_PatternStart generates a new pattern. If set to TRUE, a new pattern is generated. Generation of a new pattern might occur simultaneously with the execution of an old pattern.
|
Return Codes:
|
L1P_PATTERNACTIVE
|
L1P_PatternStart() called while a pattern was active.
|
|
L1P_NOTCONFIGURED
|
The L1p has not been configured.
|
||
Latency
|
Inlineable function call that accesses user-space memory mapped registers.
|
||
Description:
The perfect prefetcher starts monitoring L1 misses and performing prefetch requests based on those misses. The 'record' parameter instructs L1P_PatternStart() to record the pattern of L1 misses for the next iteration.
L1P_PatternStart() should be called at the beginning of every entry into the section of code that has been recorded.
|
Name
|
Description
|
|
Parameters
|
None
|
|
Return codes
|
L1P_NOTCONFIGURED
|
The L1p prefetcher has not been configured.
|
Latency
|
Inlineable function call that accesses user-space memory mapped registers.
|
|
Description:
Suspends the active perfect prefetcher. The Linear Stream Prefetcher and the other three perfect prefetchers on the core continue to execute.
This routine can be used in conjunction with the L1P_PatternResume() function to avoid recording memory fetches that are not part of the repeating pattern, such as instructions performing a periodic printf. It can also be used to avoid sections of code that perform memory accesses that are inconsistent between iterations.
|
Name
|
Description
|
|
Parameters
|
None
|
|
Return codes
|
L1P_NOTCONFIGURED
|
The L1p has not been configured.
|
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
|
Description:
Resumes the perfect prefetcher from the last pattern offset location.
This routine can be used in conjunction with L1P_PatternPause() to avoid recording memory fetches that are not likely to repeat, such as instructions performing a periodic printf. It can also be used to avoid sections of code that perform memory accesses that are inconsistent between iterations.
|
Name
|
Description
|
|
Parameters
|
None
|
|
Return codes
|
L1P_NOTCONFIGURED
|
The L1p has not been configured.
|
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
|
Description:
Stops the perfect prefetcher and resets the list offsets to zero.
|
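The start, pause, resume, and stop routines described above are typically combined around a section of code whose L1 miss pattern repeats from one iteration to the next. The sketch below shows one possible arrangement under several assumptions: the header path and the `__bgq__` macro are assumed rather than taken from this text, the routines are assumed to return an int code, the prefetcher is assumed to have been configured already, and return codes are ignored for brevity. Off Blue Gene/Q, the stand-in stubs only count calls so that the sketch compiles anywhere. The `sweep` function itself is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#ifdef __bgq__
#include <spi/include/l1p/pprefetch.h>   /* assumed header location */
#else
/* Stand-in stubs (not the real SPI): they only count how often each
 * routine is called. Index 0 = start, 1 = pause, 2 = resume, 3 = stop. */
int stub_calls[4];
int L1P_PatternStart(int record) { (void)record; stub_calls[0]++; return 0; }
int L1P_PatternPause(void)       { stub_calls[1]++; return 0; }
int L1P_PatternResume(void)      { stub_calls[2]++; return 0; }
int L1P_PatternStop(void)        { stub_calls[3]++; return 0; }
#endif

/* One sweep over indirectly addressed data. The index array is the same
 * on every call, so the sequence of L1 misses is expected to repeat. */
double sweep(const double *data, const int *index, int n, int iteration)
{
    double sum = 0.0;
    int i;

    L1P_PatternStart(1);          /* follow the old pattern, record a new one */
    for (i = 0; i < n / 2; i++)
        sum += data[index[i]];

    L1P_PatternPause();           /* keep the progress report out of the pattern */
    if (iteration % 100 == 0)
        printf("iteration %d: partial sum = %f\n", iteration, sum);
    L1P_PatternResume();          /* continue from the last pattern offset */

    for (i = n / 2; i < n; i++)
        sum += data[index[i]];
    L1P_PatternStop();            /* stop and reset the list offsets */
    return sum;
}
```

Calling `sweep()` once per outer iteration keeps the recorded section identical each time, which is the situation the perfect prefetcher is designed for.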
Name
|
Description
|
||
Parameters
|
L1P_Status_t status
|
Output
|
Perfect prefetcher status bits
|
Return codes
|
None defined
|
||
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Returns the current status for the L1 perfect prefetcher.
|
Name
|
Description
|
||
Parameters
|
uint64_t* fetch_depth
|
Output
|
Current depth of L1 misses in the prefetching pattern.
|
uint64_t* generate_depth
|
Output
|
Current depth of L1 misses in the generated pattern.
|
|
Return codes
|
None defined
|
||
Latency
|
Inlineable function call that accesses read-only user-space memory-mapped registers
|
||
Description:
Returns the current pattern depths for the L1 perfect prefetcher. The pattern depth is the current index into the pattern that the L1p is executing.
The fetch_depth parameter is used to determine how far in the current pattern/sequence the L1p has progressed.
The generate_depth parameter can be used to optimize the pattern length parameter passed to L1P_PatternConfigure(), which reduces the memory footprint of the L1p pattern.
|
Name
|
Description
|
||
Parameters
|
L1P_PatternNest_t mode
|
Output
|
Current nesting mode
|
Return Codes
|
None defined
|
||
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Returns the current nesting mode for the L1 perfect prefetcher.
The supported nesting modes are L1P_NestingSaveContext, L1P_NestingIgnore, L1P_NestingFlat, L1P_NestingError. A description of each of these modes is in “Defines and enumerations” on page 38.
|
Name
|
Description
|
||
Parameters
|
L1P_PatternNest_t mode
|
Input
|
New nesting mode
|
Return codes
|
None defined
|
||
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Sets the nesting mode for the L1 perfect prefetcher.
The default mode is L1P_NestingSaveContext. Other nesting modes are L1P_NestingIgnore, L1P_NestingFlat, L1P_NestingError. A description of each of these modes is in “Defines and enumerations” on page 38.
|
Name
|
Description
|
||
Parameters
|
uint64_t numL1misses
|
Input
|
The number of consecutive, non-matching L1 misses that will result in a pattern being abandoned.
The valid range is 1 to 63.
Default = 63
|
Return codes
|
None defined
|
|
|
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Sets the number of consecutive L1 misses that are allowed to mismatch the current location in the pattern. After this number has been exceeded, prefetching ceases and the pattern is marked as "Abandoned" in the L1P_Status_t structure returned by the L1P_PatternStatus() function.
|
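The role of the abandon threshold can be illustrated with a small software model. This is only a sketch of the behavior as described here, not of the hardware: the names `abandon_model_t`, `abandon_init`, and `abandon_observe` are hypothetical, and the model assumes that a matching miss resets the run of mismatches and that the pattern is abandoned once the run exceeds (rather than equals) the threshold.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the abandon logic. */
typedef struct {
    uint64_t threshold;  /* valid range 1..63, default 63 */
    uint64_t run;        /* current run of consecutive non-matching misses */
    int abandoned;       /* mirrors the "Abandoned" status bit */
} abandon_model_t;

void abandon_init(abandon_model_t *m, uint64_t threshold)
{
    assert(threshold >= 1 && threshold <= 63);
    m->threshold = threshold;
    m->run = 0;
    m->abandoned = 0;
}

/* Feed one L1 miss into the model. matches_pattern is nonzero when the
 * miss matches the current location in the pattern. */
void abandon_observe(abandon_model_t *m, int matches_pattern)
{
    if (m->abandoned)
        return;                 /* prefetching has already ceased */
    if (matches_pattern) {
        m->run = 0;             /* only consecutive mismatches count */
        return;
    }
    if (++m->run > m->threshold)
        m->abandoned = 1;
}
```

With a threshold of 2, two mismatches followed by a match leave the pattern active, while three mismatches in a row mark it as abandoned. A lower threshold therefore gives up sooner on code whose access pattern has changed.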
Name
|
Description
|
||
Parameters
|
uint64_t* numL1misses
|
Output
|
The number of consecutive, non-matching L1 misses that will result in a pattern being abandoned.
|
Return codes
|
None defined
|
|
|
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Returns the number of consecutive L1 misses that are allowed to mismatch the current location in the pattern. After this number has been exceeded, prefetching ceases and the pattern is marked as "Abandoned" in the L1P_Status_t structure returned by the L1P_PatternStatus() function.
|
Name
|
Description
|
||
Parameters
|
int enable
|
Input
|
L1p pattern prefetcher enable flag
|
Return codes
|
None defined
|
||
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Sets a software enable/disable for the L1p perfect prefetcher. This can be used to ascertain whether the use of the prefetcher is improving performance.
|
Name
|
Description
|
||
Parameters
|
int enable
|
Output
|
L1p pattern prefetcher enable flag
|
Return codes
|
None defined
|
||
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Returns the software enable/disable setting for the L1p perfect prefetcher.
|
Name
|
Description
|
||
Parameters
|
uint64_t n
|
Input
|
The maximum number of L1 misses that can be tracked by the list.
|
L1P_Pattern_t** ptr
|
Output
|
Location to store the pointer to the newly allocated memory access pattern.
|
|
Return codes
|
L1P_NOMEMORY
|
Application was unable to allocate enough memory
|
|
Latency
|
Implementation might require system calls.
On the CNK, this routine can use the glibc malloc() function internally. The malloc() function call can then perform brk() or mmap() system calls to allocate storage.
|
||
Description:
Allocates storage to hold an L1p pattern of L1 miss addresses. This allows for the application to allocate storage for uninitialized patterns. This pattern storage can be passed to L1P_SetPattern(). Storage must be deallocated with L1P_DeallocatePattern().
|
Name
|
Description
|
||
Parameters
|
L1P_Pattern_t* pattern
|
Input
|
Pointer to a valid pattern
|
Return codes
|
L1P_NOTAPATTERN
|
The specified pointer is not a valid pointer.
|
|
Latency
|
Implementation might require system calls.
On CNK, because memory protection is a requirement, this routine results in a system call to validate the pattern and set up the physical addresses needed by the hardware.
|
||
Description:
Sets the perfect prefetcher's hardware registers with a given pattern. This allows for retaining several patterns of memory accesses and finer control of the L1p. It is not required for the default usage model.
The L1p SPI will not deallocate the structure.
|
Name
|
Description
|
||
Parameters
|
L1P_Pattern_t** pattern
|
Output
|
Location to store the pointer to the pattern structure.
|
Return codes
|
L1P_NOTCONFIGURED
|
L1p has not been configured.
|
|
Latency
|
Implementation might require system calls.
|
||
Description:
• Returns a pointer to the current L1p pattern. The pattern pointer can later be passed back into L1P_SetPattern().
• After L1P_GetPattern() is called, the application owns the pattern and must call L1P_DeallocatePattern() to reclaim that storage. This allows pattern storage that is allocated through L1P_PatternConfigure() to be detached and retained for later use.
This allows for retaining several patterns of memory accesses and finer control of the L1p. It is not required for the default usage model.
|
Name
|
Description
|
||
Parameters
|
L1P_Pattern_t* ptr
|
Input
|
Pointer to an existing memory access pattern.
|
Return codes
|
L1P_NOTAPATTERN
|
The specified pointer is not a valid pointer.
|
|
Latency
|
Implementation might require system calls.
On CNK, this routine can use the glibc free() routine internally. The free() call can then perform brk() or munmap() system calls to deallocate storage.
|
||
Description:
Deallocates storage previously assigned to the list of addresses.
This allows for the application to deallocate storage for patterns that have been detached from normal L1p SPI control.
Do not use L1P_DeallocatePattern() on non-detached patterns.
|
Item
|
Description
|
||
Parameters
|
L1P_PatternLimitPolicy_t policy
|
Input
|
Specifies the behavior when the number of allocated patterns has been exceeded.
|
int numallocatedpatterns
|
Input
|
Number of allocated patterns that are allowed in the application
|
|
Return codes
|
None defined
|
||
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Sets behavior when the number of allocated patterns that the application can have active exceeds an artificial limit. This can be used to determine if there is a memory leak in the pattern allocations.
The default policy is L1P_PatternLimit_Disable.
|
Item
|
Description
|
||
Parameters
|
L1P_PatternLimitPolicy_t policy
|
Output
|
Behavior when the number of allocated patterns has been exceeded.
|
|
int numactivelists
|
Output
|
Number of active/allocated patterns that are allowed in the application
|
Return codes
|
None defined
|
||
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Returns the current behavior when the number of allocated patterns that the application can have active exceeds that limit. The current limit is also returned.
|
Item
|
Description
|
||
Parameters
|
int adaptiveState
|
Output
|
Boolean that indicates whether adaptive mode is enabled or disabled.
TRUE = enabled.
FALSE = disabled.
|
Return codes
|
None defined
|
||
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Returns enable/disable status of the linear stream prefetcher's adaptation mode.
|
Item
|
Description
|
||
Parameters
|
int enable
|
Input
|
Boolean that enables/disables adaptive mode.
TRUE = enabled.
FALSE = disabled.
|
Return codes
|
None defined
|
||
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Enables or disables the linear stream prefetcher's depth adaptation mode.
|
Item
|
Description
|
||
Parameters
|
L1P_StreamPolicy_t policy
|
Output
|
Current L1P stream policy
|
Return codes
|
None defined
|
||
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Returns the linear stream prefetch policy in the specified pointer. The policy controls when a stream is established.
|
Item
|
Description
|
||
Parameters
|
L1P_StreamPolicy_t policy
|
Input
|
New Policy
|
Return codes
|
L1P_PARMRANGE
|
An invalid stream policy was specified.
|
|
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Changes the linear stream prefetch policy. The policy controls when a stream is established.
|
Item
|
Description
|
||
Parameters
|
depth
|
Output
|
Integer from 1 to 8 that gives the number of 128-byte lines to fetch ahead for all future established streams.
|
Return codes
|
L1P_PARMRANGE
|
The specified address would have resulted in a segmentation violation.
|
|
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Returns the default stream depth when a new stream has been created. This default depth can be modified on a per stream basis using the adaptive mode (if enabled).
|
Item
|
Description
|
||
Parameters
|
uint32_t depth
|
Input
|
Number of 128-byte lines to fetch ahead for all future established streams.
The valid range is 1 to 8.
|
Return codes
|
L1P_PARMRANGE
|
Specified stream depth is not within the valid range.
|
|
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
When a new stream is established, the stream is set to the initial target prefetch depth specified by L1P_SetStreamDepth(). A stream's prefetch depth can subsequently vary if the adaptive prefetch mode is enabled.
|
Item
|
Description
|
|
Parameters
|
depth
|
Integer 1 to 32 for total footprint of 128-byte lines that the stream engine will endeavor to use.
|
Return codes
|
L1P_PARMRANGE
|
The specified address will cause a segmentation violation.
|
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
|
Description:
Gets the number of 128-byte cache lines that can be used by the linear stream prefetcher. Unallocated lines will be used by the perfect prefetcher. This can help prevent thrashing between the prefetch algorithms.
|
Item
|
Description
|
||
Parameters
|
uint32_t depth
|
Input
|
Total footprint of 128-byte lines that the stream engine will endeavor to use.
The valid range is 1 to 32.
|
Return codes
|
L1P_PARMRANGE
|
The specified total stream depth is not within the valid range.
|
|
Latency
|
Inlineable function call that accesses user-space memory mapped registers
|
||
Description:
Sets the number of 128-byte cache lines that can be used by the linear stream prefetcher. The unallocated lines will continue to be used by the perfect prefetcher to help prevent thrashing between the prefetch algorithms.
|
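The two depth settings can be related to bytes and to the size of the L1 prefetch cache with simple arithmetic. The helpers below are a sketch only and do not call the SPI. Their names are hypothetical, and the second helper assumes that the 32 lines of the L1 prefetch cache (32 × 128 bytes) are split between the two prefetchers, so that the lines not given to the stream prefetcher are what remains for the perfect prefetcher.

```c
#include <assert.h>
#include <stdint.h>

#define PREFETCH_LINE_BYTES  128u  /* line size of the L1 prefetch cache */
#define PREFETCH_TOTAL_LINES 32u   /* L1 prefetch cache: 32 x 128 bytes */
#define STREAM_DEPTH_MAX     8u    /* valid stream depth range is 1..8 */

/* Bytes fetched ahead of a newly established stream,
 * or 0 if depth is outside the valid range 1..8. */
uint32_t stream_bytes_ahead(uint32_t depth)
{
    if (depth < 1 || depth > STREAM_DEPTH_MAX)
        return 0;
    return depth * PREFETCH_LINE_BYTES;
}

/* Lines left over for the perfect prefetcher for a given total stream
 * depth, or -1 if total_depth is outside the valid range 1..32. */
int perfect_prefetcher_lines(uint32_t total_depth)
{
    if (total_depth < 1 || total_depth > PREFETCH_TOTAL_LINES)
        return -1;
    return (int)(PREFETCH_TOTAL_LINES - total_depth);
}
```

For example, a stream depth of 8 corresponds to 1024 bytes of lookahead per stream, and a total stream depth of 24 leaves 8 lines under this assumption. Out-of-range values are rejected here in the same spirit as the L1P_PARMRANGE return code.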
Error code | Description
0 | No error
L1P_NOMEMORY | There was not enough memory available to set up the L1P for the given pattern size.
L1P_PARMRANGE | The parameters that are passed to the L1P exceeded the valid range supported by the L1p hardware.
L1P_PATTERNACTIVE | Attempted to use a function when a pattern was already active. The application must issue an explicit L1P_PatternStop() before calling the function.
L1P_NOTAPATTERN | The application specified a pointer that either does not represent a generated pattern or the pointer is not valid.
L1P_ALREADYCONFIGURED | The L1p has already been configured without being previously unconfigured.
L1P_NOTCONFIGURED | The L1p has not been configured.