Cache hierarchy

Cache hierarchy, or multi-level caches, refers to a memory model designed to hold the data that is more likely to be requested by processors. The purpose of such memory models is to speed up memory-related instructions and, with them, the overall performance of the system.

This model arose because CPU cores running at faster clocks needed to hide the latency of main memory accesses. Today, multi-level caches are the best solution for providing fast access to data residing in main memory. Memory access time, which acts as a bottleneck for CPU core performance, can be relaxed by using a hierarchical cache structure that reduces latency and so allows a faster CPU clock.  [1]


In the history of computer and electronic chip development, there was a period when CPUs were getting faster and faster while memory access speeds improved little. This gap between CPUs and memory created a need for better memory access times. Faster CPUs could execute more instructions in a given time than before, but the time needed to access data in memory kept programs from benefiting from that capability. This issue was the motivation for memory models with faster access, and it led to the concept of cache memory. The concept was first proposed in 1965 by Maurice Wilkes, a British computer scientist at the University of Cambridge, although at the time he called such memories "slave memory".  [2]

Roughly between 1970 and 1990, many papers and articles by researchers such as Anant Agarwal, Alan Jay Smith, Mark D. Hill and Thomas R. Puzak proposed enhancements to and analyses of cache memory designs. The first cache memory models were implemented in this period, but as researchers investigated and proposed better designs, the need for faster memory models was still felt. Although these caches improved data access latency, their capacity was small compared with main memory, so they could not hold as much data.

Therefore, from approximately 1990 onward, ideas formed around adding another, larger cache as a second level that backs up the first-level cache. Researchers such as Jean-Loup Baer, Wen-Hann Wang and Andrew W. Wilson conducted research on this model. Once several simulations and implementations had demonstrated the advantages of two-level caches, multi-level caches became established as a new and generally better model than the earlier single-level form. Since 2000, multi-level caches have been widely adopted; for example, three-level caches can be found in Intel Core i7 products.  [3]

Multi-level cache

Accessing main memory for every instruction can make processing very slow, because the clock speed then depends on the time needed to find and fetch the data from main memory. To hide this memory latency from the processor, data caching is used. Whenever data is required by the processor, it is fetched from memory and stored in a smaller structure called a cache. Any further reference to that data searches the cache first before going to main memory. This structure resides closer to the processor than main memory does. The advantage of using a cache can be shown by calculating the average access time (AAT) for the memory hierarchy.

Average access time (AAT)

A cache, being small, may miss frequently, in which case main memory eventually has to be accessed to fetch the data. Hence, the AAT depends on the miss rate of every structure that is searched for the data.  [4]

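$$\text{AAT} = \text{hit time} + (\text{miss rate} \times \text{miss penalty})$$

The AAT for main memory alone is simply its hit time, $\text{hit time}_{\text{main memory}}$. The AAT for a cache backed by main memory is given by

$$\text{AAT}_{\text{cache}} = \text{hit time}_{\text{cache}} + (\text{miss rate}_{\text{cache}} \times \text{miss penalty}_{\text{cache}})$$

where the miss penalty is the time taken to go to main memory after missing in the cache.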

The hit time for a cache is much lower than that of main memory, so the resulting AAT improves significantly once a cache is added to the memory hierarchy.


Using a cache to improve memory latency does not always deliver the expected improvement, because of the way caches are organized and searched. For example, a direct-mapped cache usually has more misses than a fully associative cache of the same size. The difference also depends on the benchmark used to test the processor and on its pattern of instructions. However, always using a fully associative cache may consume more power, since the whole cache has to be searched on every access. Because of this, the trade-off between power consumption and cache size becomes critical in cache design.  [4]
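The difference between the two organizations can be illustrated with a small simulation. The following is a minimal sketch, not a model of any real processor: the function names, the block-number trace, and the choice of LRU replacement for the fully associative cache are all assumptions made for illustration.

```python
from collections import OrderedDict


def direct_mapped_misses(trace, num_lines):
    """Count misses in a direct-mapped cache: each block maps to exactly one line."""
    lines = [None] * num_lines
    misses = 0
    for block in trace:
        index = block % num_lines          # the only line this block may occupy
        if lines[index] != block:
            misses += 1
            lines[index] = block           # replace whatever was stored there
    return misses


def fully_associative_misses(trace, num_lines):
    """Count misses in a fully associative cache (LRU replacement is assumed)."""
    lru = OrderedDict()                    # least recently used block first
    misses = 0
    for block in trace:
        if block in lru:
            lru.move_to_end(block)         # mark as most recently used
        else:
            misses += 1
            if len(lru) >= num_lines:
                lru.popitem(last=False)    # evict the least recently used block
            lru[block] = True
    return misses


# Hypothetical trace: blocks 0 and 4 map to the same line of a 4-line direct-mapped cache.
trace = [0, 4, 0, 4, 0, 4]
print(direct_mapped_misses(trace, 4))      # 6: every access evicts the other block
print(fully_associative_misses(trace, 4))  # 2: only the two first-time accesses miss
```

This trace is chosen to show conflict misses; other access patterns can narrow or even reverse the gap, which is why the comparison depends on the benchmark.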


In the case of a miss in the cache, the structure serves no purpose and main memory has to be accessed to fetch the required data. This is where the idea of using multiple levels of cache comes in. If a request misses in the cache closest to the processor, the next closest level is searched, and so on through the remaining levels until, finally, main memory is searched. The general trend is to keep the L1 cache small and close to the processor, with the lower-level caches increasing in size so that they store more data than L1 and hence have a lower miss rate. This, in turn, results in a better AAT. Architects can choose the number of cache levels according to their requirements, after weighing the trade-offs between cost, AAT, and size.
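The search order through the levels can be sketched as follows. This is a simplified illustration under stated assumptions: each cache level is a plain dictionary with unlimited capacity, there is no eviction, and a block found at a lower level is copied into every level above it. The function name `lookup` and its parameters are hypothetical.

```python
def lookup(address, levels, main_memory):
    """Search the cache levels in order (L1 first), then fall back to main memory.

    Returns the value and the name of the level that served the request.
    """
    hit_index = len(levels)                # assume a miss in every level
    served_by = "main memory"
    for index, cache in enumerate(levels):
        if address in cache:
            hit_index, served_by = index, f"L{index + 1}"
            break

    if hit_index < len(levels):
        value = levels[hit_index][address]
    else:
        value = main_memory[address]

    # Assumption: fill every level above the hit so the next reference hits closer to the core.
    for cache in levels[:hit_index]:
        cache[address] = value
    return value, served_by


l1, l2 = {}, {0x10: "a"}
memory = {0x10: "a", 0x20: "b"}
print(lookup(0x10, [l1, l2], memory))      # served by L2, then copied into L1
print(lookup(0x10, [l1, l2], memory))      # now served by L1
print(lookup(0x20, [l1, l2], memory))      # misses everywhere, served by main memory
```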

Performance gains

With technology scaling, which has made memory systems small enough to fit on a single chip, most modern processors have up to 3 or 4 levels of cache. The reduction in the AAT can be seen in the following example, which computes the AAT for configurations of up to three cache levels.

Example: Main memory = 50ns, L1 = 1ns (10% miss rate), L2 = 5ns (1% miss rate), L3 = 10ns (0.2% miss rate)

  1. AAT (No cache) = 50ns
  2. AAT (L1 cache + Main Memory) = 1ns + (0.1 × 50ns) = 6ns
  3. AAT (L1 cache + L2 cache + Main Memory) = 1ns + (0.1 × (5ns + (0.01 × 50ns))) = 1.55ns
  4. AAT (L1 cache + L2 cache + L3 cache + Main Memory) = 1ns + (0.1 × (5ns + (0.01 × (10ns + (0.002 × 50ns))))) = 1.5101ns
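The nested calculation in this example can be checked with a few lines of code. The sketch below applies the AAT formula from the innermost level outward, treating the AAT of the next level as the miss penalty of the current one; the function name and the list-of-tuples representation are assumptions made for illustration.

```python
def average_access_time(levels, memory_time_ns):
    """Compute the AAT in ns for cache levels given as (hit_time_ns, miss_rate), L1 first."""
    aat = memory_time_ns                   # with no cache, every access goes to main memory
    for hit_time_ns, miss_rate in reversed(levels):
        # The miss penalty of this level is the AAT of everything below it.
        aat = hit_time_ns + miss_rate * aat
    return aat


l1, l2, l3 = (1.0, 0.10), (5.0, 0.01), (10.0, 0.002)
for config in ([], [l1], [l1, l2], [l1, l2, l3]):
    aat = average_access_time(config, 50.0)
    print(f"{len(config)} cache level(s): {aat:.4f} ns")
```

Running it with the figures from the example prints one AAT per configuration, so the hand calculation can be compared line by line.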


Disadvantages

  • Increase in the cost of the memories and hence of the overall system.
  • Cached data is stored only as long as the power supply is provided.
  • Increase in the area consumed by the memory system on the chip.  [5]
  • In the case of large programs with poor temporal locality, even multi-level caches cannot improve performance, and main memory eventually has to be accessed to fetch the data.  [6]


Banked versus unified

In a banked cache, the cache is divided into an instruction cache and a data cache. In contrast, a unified cache holds both instructions and data in the same cache. During execution, the upper-level cache is accessed in each cycle to fetch instructions for the processor, and it is also accessed to get data. Performing both accesses at the same time requires multiple ports in the cache and increases the access time. Having multiple ports requires additional hardware and wiring. Therefore, the L1 cache is organized as a banked cache, which results in fewer ports, less hardware and a lower access time.  [4]

The lower-level caches L2 and L3 are accessed only when there is a miss in the L1 cache. Therefore, the unified organization is used for the lower-level caches, since a single port suffices.

Inclusion policies

Whether a block present in the upper cache level can also be present in the lower cache level is governed by the inclusion policies below:  [7]

  • Inclusive
  • Exclusive
  • Non-Inclusive Non-Exclusive (NINE)

In the inclusive policy, all the blocks present in the upper-level cache must also be present in the lower-level cache. Each upper-level cache component is a subset of the lower-level cache component. Since blocks are duplicated, some memory is wasted. However, checking for a block is easier: if the lower-level cache does not have a block, the upper-level cache is guaranteed not to have it either.  [7]

In the exclusive policy, all the cache components are completely exclusive, which implies that any block in the upper-level cache will not be present in any of the lower-level cache components. This allows the cache memory to be used completely, since no block is duplicated. However, there is a high memory access latency.  [8]

The above policies require a set of rules to be followed in order to implement them. If none of these is enforced, the resulting inclusion policy is called Non-Inclusive Non-Exclusive (NINE). This means that a block in the upper-level cache may or may not be present in the lower-level cache.  [6]
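The three policies can be illustrated by looking at which blocks two cache levels hold at one moment. The sketch below is only a snapshot check under assumed inputs (sets of block numbers); it cannot prove which policy the hardware enforces, since a NINE hierarchy may happen to look inclusive or exclusive at any given instant. The function name is hypothetical.

```python
def consistent_policy(upper_blocks, lower_blocks):
    """Name the strictest inclusion policy that a snapshot of two cache levels satisfies."""
    upper, lower = set(upper_blocks), set(lower_blocks)
    if upper and upper <= lower:
        return "inclusive"                 # every upper-level block is also in the lower level
    if upper.isdisjoint(lower):
        return "exclusive"                 # no block is held by both levels
    return "NINE"                          # some overlap, but not a full subset


print(consistent_policy({1, 2}, {1, 2, 3}))   # inclusive
print(consistent_policy({1, 2}, {3, 4}))      # exclusive
print(consistent_policy({1, 2}, {2, 3}))      # NINE
```

An empty upper level trivially satisfies both stricter policies; the sketch reports it as exclusive, which is an arbitrary choice.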

Write policies

There are two policies that define how a modified cache block is updated in main memory:  [7]

  • Write Through
  • Write Back

In the Write Through policy, whenever the value of a cache block changes, it is also modified in the lower level of the memory hierarchy. This policy ensures that the data is stored safely, since it is written throughout the hierarchy.

However, in the Write Back policy, a changed cache block is updated in the lower level of the hierarchy only when the block is evicted. Writing back every block that is evicted is not efficient. Therefore, a dirty bit is attached to each cache block. The dirty bit is set whenever the block is modified, and on eviction only the blocks whose dirty bit is set are written to the lower level. In this policy, there is a risk of data loss, because the only valid copy of a modified block is the one stored in the cache.

In the case of a write where the byte is not present in a cache block, the policies below determine whether the byte has to be brought into the cache or not:  [7]

  • Write Allocate
  • Write No-Allocate

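The Write Allocate policy states that, on a write miss, the block is fetched from main memory and placed in the cache before it is written. In the Write No-Allocate policy, if the block misses in the cache, the write goes to the lower level of the memory hierarchy without the block being fetched into the cache.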

The common combinations of the policies are Write Back Write Allocate and Write Through Write No-Allocate.
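The two common combinations can be compared with a toy model. The following is a minimal sketch under several assumptions that are not from the text: each block holds a single value (so the fetch on a write-allocate miss is immediately overwritten), replacement is first-in first-out, and the lower level is a plain dictionary. The class name `ToyCache` and its parameters are hypothetical.

```python
from collections import OrderedDict


class ToyCache:
    """Tiny cache model that counts how often the lower level is written."""

    def __init__(self, capacity, memory, write_back=True, write_allocate=True):
        self.capacity = capacity
        self.memory = memory               # dict standing in for the lower level
        self.write_back = write_back
        self.write_allocate = write_allocate
        self.blocks = OrderedDict()        # address -> [value, dirty], oldest first
        self.memory_writes = 0

    def _write_memory(self, address, value):
        self.memory[address] = value
        self.memory_writes += 1

    def _insert(self, address, value, dirty):
        if address not in self.blocks and len(self.blocks) >= self.capacity:
            old_address, (old_value, old_dirty) = self.blocks.popitem(last=False)
            if old_dirty:                  # only dirty blocks are written back on eviction
                self._write_memory(old_address, old_value)
        self.blocks[address] = [value, dirty]

    def write(self, address, value):
        hit = address in self.blocks
        cached = hit or self.write_allocate
        if cached:
            # With write back, the block is only marked dirty; memory is updated on eviction.
            self._insert(address, value, dirty=self.write_back)
        if not self.write_back or not cached:
            self._write_memory(address, value)


wb = ToyCache(capacity=1, memory={})       # write back + write allocate
wt = ToyCache(capacity=1, memory={}, write_back=False, write_allocate=False)
for cache in (wb, wt):
    cache.write(0, "a")
    cache.write(0, "b")
    cache.write(1, "c")
print(wb.memory_writes)                    # 1: only the eviction of dirty block 0
print(wt.memory_writes)                    # 3: every write goes to the lower level
```

The write-back cache absorbs repeated writes to the same block, at the cost of holding the only up-to-date copy until eviction.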

Shared versus private

A private cache belongs to one particular core and cannot be accessed by the other cores. Since each core has its own private cache, the same block may be duplicated across caches, which reduces capacity utilization. However, this organization leads to a lower cache hit latency.  [7]  [9]  [10]

A shared cache is shared among multiple cores and can therefore be directly accessed by any of them. Since it is shared, each block in the cache is unique and there are no duplicate blocks. However, the cache hit latency is larger, as multiple cores try to access the same cache.

In multi-core processors, the choice between a shared and a private cache organization affects the performance of the processor. In practice, the upper-level cache L1 (or sometimes L2)  [11]  [12] is implemented as private, and the lower-level caches are implemented as shared.

Recent implementation models

Intel Broadwell Microarchitecture (2014)

  • L1 Cache (Instruction and Data) – 64kB per core
  • L2 Cache – 256kB per core
  • L3 Cache – 2MB to 6MB shared
  • L4 Cache – 128MB of eDRAM (Iris Pro Models only)  [11]

Intel Kaby Lake Microarchitecture (2016)

  • L1 Cache (Instruction and Data) – 64kB per core
  • L2 Cache – 256kB per core
  • L3 Cache – 2MB to 8MB shared  [12]

IBM Power 7

  • L1 Cache (Instruction and Data) – Each 64-banked, each bank has 2rd + 1wr ports, 32kB, 8-way associative, 128B block, Write through
  • L2 Cache – 256kB, 8-way, 128B block, Write back, Inclusive of L1, 2ns access latency
  • L3 Cache – 8 regions of 4MB (total 32MB), local region 6ns, remote 30ns, each region 8-way associative, DRAM data array, SRAM tag array  [14]

See also

  • Power7
  • Intel Broadwell Microarchitecture
  • Intel Kaby Lake Microarchitecture
  • CPU Cache
  • Memory hierarchy
  • CAS latency
  • Cache (computing)


References

  1. "Cache: Why Level It" (PDF).
  2. "Sir Maurice Vincent Wilkes | British computer scientist". Encyclopædia Britannica. Retrieved 2016-12-11.
  3. John L. Hennessy (Stanford University) and David A. Patterson (University of California, Berkeley). "Memory Hierarchy Design – Part 6. The Intel Core i7, fallacies, and pitfalls". EDN. Retrieved 2016-12-11.
  4. Hennessy and Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann. ISBN 9780123704900.
  5. "Memory Hierarchy".
  6. Solihin, Yan (2016). Fundamentals of Parallel Multicore Architecture. Chapman and Hall. Chapter 5: Introduction to the Memory Hierarchy Organization. ISBN 9781482211184.
  7. Solihin, Yan (2009). Fundamentals of Parallel Computer Architecture. Solihin Publishing. Chapter 6: Introduction to the Memory Hierarchy Organization. ISBN 9780984163007.
  8. "Performance Evaluation of Exclusive Cache Hierarchies" (PDF).
  9. "Software Techniques for Shared-Cache Multi-Core Systems".
  10. "An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors" (PDF).
  11. "Intel Broadwell Microarchitecture".
  12. "Intel Kaby Lake Microarchitecture".
  13. "The Architecture of the Nehalem Processor and Nehalem-EP SMP Platforms" (PDF).
  14. "IBM Power7".
