Cache pollution describes situations where an executing computer program loads data into the CPU cache unnecessarily, causing other, useful data to be evicted from the cache into lower levels of the memory hierarchy and degrading performance. For example, in a multi-core processor, one core may replace blocks fetched by other cores in a shared cache, or prefetched blocks may replace demand-fetched blocks in the cache.
Consider the following illustration:
T = T + 1;
for i in 0 .. sizeof(CACHE)
    C[i] = C[i] + 1;
T = T + C[sizeof(CACHE) - 1];
(The assumptions here are that the cache is pseudo-LRU, all data are cacheable, the set associativity of the cache is N (where N > 1), and at most one processor register is available to hold program values.)
Right before the loop starts, T will be fetched from memory into the cache and its value updated. However, as the loop executes, the cache block containing T has to be evicted, because the loop references enough data elements to fill the whole cache. Thus, the next time the program requests T to be updated, the cache misses, and the cache controller has to request the data bus to bring the corresponding block from main memory again.
In this case the cache is said to be "polluted". Changing the pattern of data accesses, by positioning the first update of T between the loop and the second update, can eliminate the inefficiency:
for i in 0 .. sizeof(CACHE)
    C[i] = C[i] + 1;
T = T + 1;
T = T + C[sizeof(CACHE) - 1];
Other than the code restructuring mentioned above, the solution to cache pollution is ensuring that only high-reuse data are stored in the cache. This can be achieved by using special cache control instructions, operating system support, or hardware support.
Examples of specialized hardware instructions include "lvxl", provided by PowerPC AltiVec. This instruction loads a 128-bit-wide value into a register and marks the corresponding cache block as "least recently used", i.e. the prime candidate for eviction upon a need to evict a block from its cache set. To use this instruction appropriately in the context of the above example, the data elements referenced by the loop would be loaded with it. Implemented in this manner, cache pollution would not take place, since the execution of the loop would not cause premature eviction of T from the cache: because each loop load marks its own block as least recently used, the loop's data become the first candidates for eviction instead of displacing other blocks. Only the oldest data (not relevant for the example given) would be evicted from the cache, and the block holding T is not among them, since its update occurs right before the loop's start.
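In the pseudocode notation of the examples above, using an lvxl-style load could look as follows. Here `load_lru` is an illustrative name for a load that marks its cache block as least recently used, not an actual instruction mnemonic:

```
T = T + 1;
for i in 0 .. sizeof(CACHE)
    C[i] = load_lru(&C[i]) + 1;   // block holding C[i] marked LRU, so it is evicted first
T = T + C[sizeof(CACHE) - 1];     // block holding T is still cached
```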
Similarly, using operating system (OS) support, the pages in main memory that correspond to the C data array can be marked as "caching inhibited", in other words non-cacheable. Likewise, at the hardware level, cache bypassing schemes can be used which identify low-reuse data from the program's access pattern and bypass the cache for them. Also, a shared cache can be partitioned to avoid destructive interference between running applications. The tradeoff among these solutions is that hardware schemes act transparently but lack the program-level view of control flow and memory access patterns that OS- and software-based schemes can exploit to achieve the best possible results.
Cache pollution control has been increasing in importance because the penalties imposed by the so-called "memory wall" keep on growing. Chip manufacturers continue to devise new tricks to overcome the ever-increasing relative memory-to-CPU latency, for example by growing cache sizes and by giving software finer control over which data stay in the cache. Cache pollution control is one of the numerous techniques available to the (mainly embedded) programmer. However, other methods, most of them proprietary and highly hardware- and application-specific, are used as well.
- S. Mittal, "A Survey of Cache Bypassing Techniques", JLPEA, 6(2), 2016