Directory-based cache coherence

In computer engineering, directory-based cache coherence is a type of cache coherence mechanism in which directories are used to manage caches in place of snoopy methods, because directories scale better. Snoopy bus-based methods scale poorly due to their use of broadcasting. Directory-based methods can be used to target both the performance and the scalability of systems. [1]

Full bit vector format

In the full bit vector format, for each possible cache line in memory, one bit per processor is used to track whether that individual processor has the line stored in its cache. [2] The full bit vector format is the simplest structure to implement, but the least scalable. [1] The SGI Origin 2000 uses a combination of full bit vector and coarse bit vector formats, depending on the number of processors. [3]

Each directory entry must store 1 bit per processor per cache line, along with bits for tracking the state of the directory entry. This leads to a total size of (number of processors) × (number of cache lines) bits, and a storage overhead ratio of (number of processors) / (cache block size × 8), where the cache block size is in bytes.

The overhead scales linearly with the number of processors. While this may be acceptable for a small number of processors, it becomes excessive in large systems. For example, with a block size of 32 bytes and 1024 processors, the storage overhead ratio becomes 1024 / (32 × 8) = 400%. [2]
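The overhead formula above can be checked with a few lines of Python. This is a minimal sketch: the function name `directory_overhead_ratio` is made up for illustration, and the per-entry state bits mentioned earlier are ignored, as they are in the formula.

```python
def directory_overhead_ratio(num_tracked_units, block_size_bytes):
    """Directory bits per cache line divided by data bits per cache line.

    num_tracked_units is the number of presence bits per line (one per
    processor in the full bit vector format). State bits are ignored.
    """
    return num_tracked_units / (block_size_bytes * 8)


# Overhead for a 32-byte block as the processor count grows.
for processors in (16, 64, 256, 1024):
    ratio = directory_overhead_ratio(processors, 32)
    print(f"{processors:5d} processors: {ratio:.2%}")
```

With 1024 processors and 32-byte blocks the function returns 4.0, which is the 400% figure from the example.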

Coarse bit vector format

The coarse bit vector format has a similar structure to the full bit vector format, but rather than tracking one bit per processor for every cache line, the directory groups several processors into nodes and stores whether a cache line is held in a node rather than in an individual processor. This reduces storage at the expense of bus traffic: the directory needs only (number of nodes) × (total lines) bits. [3] Thus the overhead ratio has the same form, with the number of processors replaced by the number of processor groups. When a request is made for a cache line that one processor in a group holds, the directory broadcasts the signal to every processor in that node rather than only to the caches that contain the line, which causes unnecessary traffic to processors that do not have the data cached. [2]

In this case the directory entry uses 1 bit per group of processors for each cache line. For the same example as in the full bit vector format, if 1 bit tracks a group of 8 processors, there are 1024 / 8 = 128 groups and the storage overhead is 128 / (32 × 8) = 50%. This is a significant improvement over the full bit vector format.
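The trade-off between storage and traffic can be sketched in Python. The class `CoarseBitVectorEntry` and its method names are hypothetical, chosen for illustration only; it models the presence bits of a single cache line and nothing else (no state bits, no protocol messages).

```python
class CoarseBitVectorEntry:
    """Presence bits for one cache line, one bit per group of processors."""

    def __init__(self, num_processors, group_size):
        if group_size < 1 or num_processors % group_size != 0:
            raise ValueError("num_processors must be a multiple of group_size")
        self.num_processors = num_processors
        self.group_size = group_size
        self.bits = 0  # bit g is set when some processor in group g may hold the line

    def add_sharer(self, processor):
        # Only the group is recorded; the identity of the processor is lost.
        self.bits |= 1 << (processor // self.group_size)

    def invalidation_targets(self):
        """Every processor in every marked group, whether or not it holds the line."""
        targets = []
        for group in range(self.num_processors // self.group_size):
            if self.bits & (1 << group):
                start = group * self.group_size
                targets.extend(range(start, start + self.group_size))
        return targets


entry = CoarseBitVectorEntry(num_processors=16, group_size=4)
entry.add_sharer(5)
print(entry.invalidation_targets())  # processors 4 to 7 are all signalled
```

Setting `group_size=1` makes the same class behave like a full bit vector entry, where only the actual sharers are signalled.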

Sparse directory format

A cache stores only a small subset of the blocks in main memory at any particular time. Hence most of the entries in the directory will belong to uncached blocks. In the sparse directory format this waste is reduced by storing only the cached blocks in the directory. [2] Consider a processor with a cache size of 64KB, a block size of 32 bytes, and a main memory size of 4MB. The maximum number of entries the directory can have in the sparse directory format is 64KB / 32 bytes = 2048. If the directory has an entry for every block in memory, the number of entries will be 4MB / 32 bytes = 131072. Thus the sparse directory format needs at most 1/64 as many entries in this example.
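The entry counts in this example, and the idea of keeping entries only for cached blocks, can be sketched as follows. `SparseDirectory` is a hypothetical name, and a dictionary stands in for whatever lookup structure real hardware would use; the sketch also does not model what a real design does when the directory is full.

```python
CACHE_SIZE = 64 * 1024          # 64KB cache
BLOCK_SIZE = 32                 # 32-byte blocks
MEMORY_SIZE = 4 * 1024 * 1024   # 4MB main memory

sparse_entries = CACHE_SIZE // BLOCK_SIZE   # entries for cached blocks only
full_entries = MEMORY_SIZE // BLOCK_SIZE    # one entry per memory block
print(sparse_entries, full_entries, full_entries // sparse_entries)


class SparseDirectory:
    """Directory that holds entries only for blocks that are currently cached."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # block address -> set of sharer ids

    def add_sharer(self, block, processor):
        if block not in self.entries and len(self.entries) >= self.capacity:
            # Assumption: overflow handling is out of scope for this sketch.
            raise RuntimeError("directory full")
        self.entries.setdefault(block, set()).add(processor)

    def remove_sharer(self, block, processor):
        sharers = self.entries.get(block)
        if sharers is None:
            return
        sharers.discard(processor)
        if not sharers:
            del self.entries[block]  # uncached blocks take no directory space
```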

Number-balanced binary tree format

In this format the directory is decentralized and distributed among the caches that share a memory block. The caches that share a memory block are arranged in the form of a binary tree. The cache that first accesses a memory block is the root node. Each memory block holds the root node information (HEAD) and a sharing counter field (SC). The SC field holds the number of caches that share the block. Each cache entry has pointers to the next sharing caches, known as L-CHD and R-CHD. A condition for this directory is that the binary tree must be number-balanced, i.e. the number of nodes in the left subtree must be equal to or one greater than the number of nodes in the right subtree. All subtrees must also be number-balanced. [4]
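The number-balance condition can be illustrated with a small Python sketch. The field names `head`, `sc`, `l_chd` and `r_chd` mirror HEAD, SC, L-CHD and R-CHD from the text; everything else is an assumption made for illustration, in particular the per-node `size` counter and the insertion rule, which are one simple way to keep the tree number-balanced and are not taken from the cited scheme.

```python
class SharerNode:
    """One sharing cache in the tree."""

    def __init__(self, cache_id):
        self.cache_id = cache_id
        self.l_chd = None  # left child pointer
        self.r_chd = None  # right child pointer
        self.size = 1      # nodes in this subtree (bookkeeping for the sketch only)


def subtree_size(node):
    return node.size if node is not None else 0


def insert(node, cache_id):
    """Add a sharer below node, keeping left size equal to or one more than right."""
    if node is None:
        return SharerNode(cache_id)
    if subtree_size(node.l_chd) == subtree_size(node.r_chd):
        node.l_chd = insert(node.l_chd, cache_id)
    else:
        node.r_chd = insert(node.r_chd, cache_id)
    node.size += 1
    return node


def is_number_balanced(node):
    if node is None:
        return True
    diff = subtree_size(node.l_chd) - subtree_size(node.r_chd)
    return (diff in (0, 1)
            and is_number_balanced(node.l_chd)
            and is_number_balanced(node.r_chd))


class MemoryBlockEntry:
    """Per-block state kept at memory: root pointer and sharing counter."""

    def __init__(self):
        self.head = None  # HEAD: the first cache that accessed the block
        self.sc = 0       # SC: number of sharing caches

    def add_sharer(self, cache_id):
        self.head = insert(self.head, cache_id)
        self.sc += 1


block = MemoryBlockEntry()
for cache_id in range(7):
    block.add_sharer(cache_id)
print(block.head.cache_id, block.sc, is_number_balanced(block.head))
```

Because a new sharer always goes to the left subtree when the two sides are equal and to the right subtree otherwise, the root never changes and the condition holds after every insertion.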

Chained directory format

In this format, the memory holds a directory pointer to the latest cache that accessed the block, and each cache holds a pointer to the previous cache that accessed the block. So when a processor sends a write request to a block in memory, the processor sends invalidations down the chain of pointers. In this directory, when a cache block is replaced, the list must be traversed in order to update the directory, which increases latency. To prevent this, doubly linked lists are now widely used, in which each cached copy has pointers to both the previous and the next cache that accessed the block. [5]
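A doubly linked sharer chain can be sketched in Python as below. The sketch assumes that memory keeps a pointer to the most recent sharer and that each cached copy links to its neighbours in the chain; the class and method names are hypothetical, and messages between caches are reduced to plain pointer updates.

```python
class CachedCopy:
    """One cached copy of a block, linked to its neighbours in the chain."""

    def __init__(self, cache_id):
        self.cache_id = cache_id
        self.prev = None  # more recent sharer (towards the head)
        self.next = None  # older sharer


class ChainedDirectoryEntry:
    """Per-block state kept at memory: a pointer to the most recent sharer."""

    def __init__(self):
        self.head = None

    def add_sharer(self, cache_id):
        copy = CachedCopy(cache_id)
        copy.next = self.head
        if self.head is not None:
            self.head.prev = copy
        self.head = copy
        return copy

    def replace(self, copy):
        """Unlink a copy that is in the chain without traversing the list."""
        if copy.prev is None:
            self.head = copy.next
        else:
            copy.prev.next = copy.next
        if copy.next is not None:
            copy.next.prev = copy.prev
        copy.prev = copy.next = None

    def invalidate_all(self):
        """Walk the chain on a write and return the caches invalidated, in order."""
        invalidated = []
        node = self.head
        while node is not None:
            invalidated.append(node.cache_id)
            node = node.next
        self.head = None
        return invalidated


entry = ChainedDirectoryEntry()
a = entry.add_sharer(1)
b = entry.add_sharer(2)
c = entry.add_sharer(3)
entry.replace(b)               # constant-time removal thanks to prev/next
print(entry.invalidate_all())  # remaining sharers, most recent first
```

With a singly linked chain, `replace` would have to walk from the head to find the predecessor of the copy being removed, which is the latency cost the text describes.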

Limited pointer format

The limited pointer format uses a set number of pointers to track the processors that are caching the data. When a new processor caches a block, a free pointer is chosen from a pool to point to that processor. There are a few options for handling cases where the number of sharers exceeds the number of free pointers. One method is to invalidate one of the sharers and use its pointer for the new requestor, though this can be costly when a block has a large number of readers, such as a lock. Another method is to have a separate pool of free pointers available to all the blocks. This method is usually effective because the number of blocks shared by a large number of processors is normally not very large. [2]
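The first overflow policy, invalidating an existing sharer to free a pointer, can be sketched as below. `LimitedPointerEntry` is a hypothetical name, and evicting the oldest sharer is an assumption made for this sketch; the text does not say which sharer is chosen.

```python
class LimitedPointerEntry:
    """Sharer tracking for one block with a fixed number of pointers."""

    def __init__(self, num_pointers):
        if num_pointers < 1:
            raise ValueError("need at least one pointer")
        self.num_pointers = num_pointers
        self.pointers = []  # processor ids, oldest first

    def add_sharer(self, processor):
        """Record a sharer and return the processor invalidated to make room, or None."""
        if processor in self.pointers:
            return None
        victim = None
        if len(self.pointers) == self.num_pointers:
            # Assumption: the oldest sharer is the one invalidated.
            victim = self.pointers.pop(0)
        self.pointers.append(processor)
        return victim


entry = LimitedPointerEntry(num_pointers=2)
for processor in (0, 1, 2, 3):
    victim = entry.add_sharer(processor)
    print(f"processor {processor} added, invalidated: {victim}")
```

A block read by many processors keeps forcing invalidations under this policy, which is why the text calls it costly for widely read data such as a lock.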


  1. Reinhardt, Steven; Basu, Arkaprava; Beckmann, Bradford; Hill, Mark (2013-07-11). "CMP Directory Coherence: One Granularity Does Not Fit All" (PDF).
  2. Solihin, Yan (2015-10-09). Fundamentals of Parallel Multicore Architecture. Raleigh, NC: Solihin Publishing and Consulting, LLC. pp. 331-335. ISBN 978-1-4822-1118-4.
  3. Laudon, James; Lenoski, Daniel (1997-06-01). "The SGI Origin: A ccNUMA Highly Scalable Server". Proceedings of the 24th Annual International Symposium on Computer Architecture.
  4. Seo, Dae-Wha; Cho, Jung Wan (1993-01-01). "Directory-based cache coherence scheme using number-balanced binary tree". Microprocessing and Microprogramming. 37 (1): 37-40. doi: 10.1016/0165-6074(93)90011-9.
  5. Chaiken, D.; Fields, C.; Kurihara, K.; Agarwal, A. (1990-06-01). "Directory-based cache coherence in large-scale multiprocessors". Computer. 23 (6): 49-58. doi: 10.1109/2.55500. ISSN 0018-9162.
