Directory-based coherence

Directory-based coherence is a mechanism for handling the cache coherence problem in distributed shared memory (DSM), also known as non-uniform memory access (NUMA). Another popular approach is to use a special type of computer bus shared between all the nodes (a “shared bus”, also known as the system bus). [1] Directory-based coherence uses a special directory in place of the shared bus of the bus-based coherence protocols. Both designs use the corresponding medium (i.e. bus or directory) as a tool to facilitate communication between the different nodes and to guarantee that the coherence protocol works properly across all the communicating nodes. In directory-based cache coherence, this is done by using the directory to keep track of the status of all cached blocks. The status of each block includes the cache coherence “state” that the block is in and which nodes are sharing the block at that time. This information can be used to eliminate the need to broadcast every signal to all nodes, and instead send it only to the nodes that are interested in that particular block.

The following are a few advantages and disadvantages of the directory-based cache coherence protocol:

  • Scalability: This is one of the strongest motivations for moving to directory-based designs. Scalability, in short, is how well a system handles a growing amount of work. By this criterion, bus-based systems do not do well, because all nodes share a single bus. For a relatively small number of nodes, bus systems perform well. As the number of nodes grows, however, problems arise, especially since only one node is allowed to use the bus at a time, which significantly degrades the performance of the overall system. Directory-based systems, on the other hand, have no such bottleneck constraining their scalability.
  • Simplicity: This is one of the points where the bus-based system is superior. The bus structure itself can serve as an organizer for all the traffic that goes through the system and ensure the atomicity of all the signals passed through it. Thus, there is no need to put extra effort into ensuring atomicity and ordering between signals, as is the case in directory-based systems, where this leads to additional overhead when dealing with issues like consistency.

From the above discussion, it is clear that bus-based systems are more attractive for relatively small systems. However, directory-based systems become crucial when the system scales up and the number of nodes grows. So there is a trade-off between simplicity and scalability when comparing bus-based and directory-based cache coherence designs. [1]


Directory-based cache coherence systems have a long history. The idea of DASH (Directory Architecture for SHared-memory) was first proposed by C. K. Tang [2] in the mid-1970s. However, applying it to cache coherence was proposed a few years later, specifically in 1978, when researchers at Stanford University proposed the first version of this coherence system, called Stanford DASH, in a paper [3] that described the system along with the difficulties and improvements that come with such designs. Besides this approach, several attempts have been made to provide scalable systems. For instance, the BBN Butterfly [4], introduced in 1985, and the IBM RP3 [5], introduced in 1987, are examples of scalable multiprocessor systems. However, both of these systems have a drawback: the BBN Butterfly does not have caches, and the IBM RP3 does not provide hardware cache coherence. This limits the performance of both designs, especially when employing high-performance processors. [6]

These limitations in the competing systems made it easier for DASH-based systems to be chosen when designing cache coherence systems and other systems that need scalability in cache-based nodes. In 1985, James Archibald [7] and Jean-Loup Baer from the University of Washington published a paper [8] proposing a more economical, expandable, and modular variation of the “global directory” approach in terms of the hardware used in the design.

In 1992, Daniel Lenoski from Stanford University published a paper [9] proposing advances in cache coherence protocols for directory-based systems. In a 1996 paper [10], he introduced the design of the SGI Origin 2000, a family of server computers employing directory-based cache coherence. The subsequent Origin 3000 [11] was introduced in July 2000.


Unlike snoopy coherence protocols, in a directory-based coherence approach the information about which caches have a copy of a block is maintained in a structure called a directory. In a directory-based scheme, a cache does not broadcast its requests to all other caches; instead, it queries the directory to find out which caches hold copies of the block and sends messages only to those caches, so the traffic saving compared to a snoopy protocol is large. In well-optimized applications, most data sharing is for data that is read-only, and there is little sharing of data that is frequently read and written. A directory approach can result in a substantial traffic saving compared to a broadcast/snoopy approach in such applications.
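As a rough illustration of this traffic saving, the sketch below counts how many nodes must handle a message when one node invalidates the other copies of a block. It assumes a simplified cost model that is not part of the original text: a snoopy broadcast is seen by every other node, while a directory scheme needs one request to the directory plus one invalidation per actual sharer. The function names are hypothetical.

```python
def snoopy_messages(num_nodes: int) -> int:
    """Broadcast: every node other than the requestor must snoop the request."""
    return num_nodes - 1


def directory_messages(num_sharers: int) -> int:
    """One request to the directory, then one invalidation per actual sharer."""
    return 1 + num_sharers


if __name__ == "__main__":
    # Assumed workload: each block is shared by only 2 nodes, whatever the system size.
    for nodes in (4, 64, 1024):
        print(nodes, snoopy_messages(nodes), directory_messages(2))
```

Under this model the directory cost depends only on the number of sharers, while the broadcast cost grows with the size of the system.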

[12] Directory-based coherence scheme overview diagram showing various actors and messages.

As shown in the data flow diagram, the actors involved in a distributed shared memory system implementing a directory-based coherence protocol are:

  • Requestor Node: This node is the processor that is requesting a read or write of a memory block.
  • Directory Node: This node maintains the information about the state of each cache block in the system; the requestor directs its requests to this node.
  • Owner Node: An owner node holds the most recent copy of the cache block. Note that the memory at the directory may not always be up to date with the latest data.
  • Sharer Node: One or more nodes that are sharing a copy of the cache block.

Requestor and owner nodes maintain their state transitions in a way similar to snoopy coherence protocols such as the MESI protocol. However, unlike a bus-based implementation, where nodes communicate over a common bus, a directory-based implementation uses a message-passing model to exchange the information required for maintaining cache coherence.

The directory node acts as a serializing point: all communications are directed through this node to maintain correctness.
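To illustrate what “serializing point” means here, the following minimal sketch funnels all requests through a single queue at the directory, so that competing requests for the same block are observed in one agreed order. The class, its method names, and the request tuples are hypothetical, and a real design would typically serialize requests per block rather than through one global queue.

```python
from collections import deque


class SerializingDirectory:
    """Toy directory that handles coherence requests strictly one at a time."""

    def __init__(self):
        self.inbox = deque()   # request messages in arrival order
        self.order = {}        # block -> list of (node, request) as processed

    def send(self, node, request, block):
        """A node sends a request message to the directory."""
        self.inbox.append((node, request, block))

    def process_all(self):
        """Drain the inbox; each block ends up with a single request order."""
        while self.inbox:
            node, request, block = self.inbox.popleft()
            self.order.setdefault(block, []).append((node, request))
        return self.order


if __name__ == "__main__":
    d = SerializingDirectory()
    d.send(0, "BusRdX", 0x40)
    d.send(1, "BusRdX", 0x40)   # competing write request for the same block
    print(d.process_all())
```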

Directory Node

A directory node keeps track of the overall state of a cache block in the entire cache system for all processors. It can be in one of three states:

  • Uncached (U): No processor has the data cached; memory is up to date.
  • Shared (S): One or more processors have the data cached; memory is up to date. In this state the directory and the sharers have a clean copy of the cached block.
  • Exclusive / Modified (EM): One processor (the owner) has the data cached; memory is out of date. Note that the directory cannot distinguish a block cached in the exclusive state from one cached in the modified state, because a processor can move from exclusive to modified without any bus transaction.
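One way to picture the per-block information a directory keeps is the small record below. It is only a sketch: the class and field names are hypothetical, and the consistency rules in `is_consistent` are a direct reading of the three state definitions above rather than part of any particular implementation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class DirState(Enum):
    UNCACHED = "U"
    SHARED = "S"
    EXCLUSIVE_MODIFIED = "EM"


@dataclass
class DirectoryEntry:
    """Directory bookkeeping for a single memory block (illustrative only)."""
    state: DirState = DirState.UNCACHED
    sharers: Set[int] = field(default_factory=set)  # meaningful in SHARED
    owner: Optional[int] = None                     # meaningful in EXCLUSIVE_MODIFIED

    def memory_up_to_date(self) -> bool:
        # Only in EM may the owner hold data newer than memory.
        return self.state is not DirState.EXCLUSIVE_MODIFIED

    def is_consistent(self) -> bool:
        if self.state is DirState.UNCACHED:
            return not self.sharers and self.owner is None
        if self.state is DirState.SHARED:
            return len(self.sharers) >= 1 and self.owner is None
        return self.owner is not None and not self.sharers
```

For example, `DirectoryEntry(DirState.EXCLUSIVE_MODIFIED, owner=3)` is consistent and reports that memory may be stale, while a `SHARED` entry with no sharers is flagged as inconsistent.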

The state transitions of the directory finite state machine (see image 1) are summarized in the table below:

Initial State | Bus Request | Response / Action | New State
--- | --- | --- | ---
U | BusRd or BusRdX | Fetch the block from the directory's local memory. Send the memory block to the requestor using the message (ReplyD). As there are no sharers, the requestor becomes the first sharer and the directory transitions into the EM state. | EM
EM | BusRd | Send an intervention (Int) to the owner. | S
EM | BusRdX | Send an invalidation (Inv) to the current owner. | EM
S | BusRd | Reply to the requestor with the memory block (ReplyD). | S
S | BusRdX | Reply to the requestor with the memory block (ReplyD). Invalidate (Inv) all sharers. | EM
S | BusUpgr | Invalidate (Inv) all sharers. Reply to the requestor that it can upgrade (Reply). | EM
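The transitions listed above can be sketched as a small table-driven state machine. This is a simplified model under stated assumptions, not a definitive implementation: the class and method names are hypothetical, the resulting state for each transition follows the usual formulation of this basic protocol together with the state definitions given earlier, and the flush and acknowledgement messages of a real protocol are omitted.

```python
# (directory state, request) -> (messages the directory sends, new state)
TRANSITIONS = {
    ("U", "BusRd"): (["ReplyD"], "EM"),
    ("U", "BusRdX"): (["ReplyD"], "EM"),
    ("EM", "BusRd"): (["Int"], "S"),
    ("EM", "BusRdX"): (["Inv"], "EM"),
    ("S", "BusRd"): (["ReplyD"], "S"),
    ("S", "BusRdX"): (["ReplyD", "Inv"], "EM"),
    ("S", "BusUpgr"): (["Inv", "Reply"], "EM"),
}


class Directory:
    """Tracks a single block and handles requests one at a time."""

    def __init__(self):
        self.state = "U"
        self.holders = set()  # nodes with a cached copy (owner or sharers)

    def handle(self, request, requestor):
        """Return the list of (message, destination node) pairs sent."""
        messages, new_state = TRANSITIONS[(self.state, request)]
        sent = []
        for msg in messages:
            if msg in ("Int", "Inv"):
                # Sent only to the current holders, never broadcast.
                targets = sorted(self.holders - {requestor})
                sent.extend((msg, node) for node in targets)
            else:
                sent.append((msg, requestor))
        if new_state == "EM":
            self.holders = {requestor}          # requestor becomes sole owner
        else:
            self.holders = self.holders | {requestor}
        self.state = new_state
        return sent


if __name__ == "__main__":
    d = Directory()
    print(d.handle("BusRd", 0), d.state)
    print(d.handle("BusRd", 1), d.state)
    print(d.handle("BusRdX", 2), d.state)
```

Keeping the transitions in a dictionary makes it easy to compare the code row by row with the table.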

In addition to the cache state, a directory must track which processors hold the data when a block is in the shared state. This is required for sending invalidation and intervention requests to the individual processor caches that hold the cache block. A few of the popular implementation approaches are:

  • Full bit-vector: A bit field with one bit for each processor is maintained at the directory node. The storage overhead scales with the number of processors.
  • Limited pointer: In this approach, the directory keeps only a limited number of pointers to sharers for each block, which reduces the storage overhead.
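The two approaches can be contrasted with the following sketch. It assumes the common formulation in which a limited-pointer entry stores a fixed number of sharer identifiers per block. All names are hypothetical, and the overflow policy shown (dropping the oldest recorded sharer, which would then have to be invalidated) is just one possible choice.

```python
class FullBitVector:
    """One presence bit per processor for a single block."""

    def __init__(self, num_procs):
        self.num_procs = num_procs
        self.bits = 0

    def add(self, proc):
        self.bits |= 1 << proc

    def remove(self, proc):
        self.bits &= ~(1 << proc)

    def sharers(self):
        return [p for p in range(self.num_procs) if (self.bits >> p) & 1]


class LimitedPointers:
    """At most max_ptrs sharer pointers for a single block."""

    def __init__(self, max_ptrs):
        self.max_ptrs = max_ptrs
        self.ptrs = []

    def add(self, proc):
        """Record a sharer; return the evicted processor, if any."""
        if proc in self.ptrs:
            return None
        evicted = None
        if len(self.ptrs) == self.max_ptrs:
            evicted = self.ptrs.pop(0)  # assumed policy: drop the oldest sharer
        self.ptrs.append(proc)
        return evicted

    def sharers(self):
        return list(self.ptrs)


def full_bits(num_procs):
    """Directory bits per block for a full bit-vector."""
    return num_procs


def limited_bits(num_procs, max_ptrs):
    """Directory bits per block: each pointer needs ceil(log2(num_procs)) bits."""
    return max_ptrs * max(1, (num_procs - 1).bit_length())
```

With 1024 processors, for instance, `full_bits(1024)` is 1024 bits per block, whereas `limited_bits(1024, 4)` is 40 bits per block, at the cost of extra invalidations when more than four nodes share a block.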

Note that the protocol described above is a basic implementation. Race conditions can occur because the state held at the directory can be out of sync with the caches and because messages between processors can overlap. More complex implementations are available, such as the Scalable Coherent Interface, which has multiple states.

The DASH [3] cache coherence protocol is another protocol that uses a directory-based coherence scheme. It takes a clustered approach, where the processors within a cluster are kept coherent using a bus-based snooping scheme, while the clusters are connected using a directory approach. Even though different protocols track cache blocks in different ways, the concept of the directory remains the same.

See also

  • Coherence protocol
  • MSI protocol
  • Bit array
  • Distributed shared memory
  • Snoopy cache


References

  1. Solihin, Yan (2009). Fundamentals of Parallel Computer Architecture. pp. 319-360.
  2. Tang, C. K. “Cache system design in the tightly coupled multiprocessor system”. AFIPS ’76 Proceedings of the June 7-10, 1976, National Computer Conference and Exhibition.
  3. “The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor” (PDF). Computer Systems Laboratory.
  4. Schmidt, G. E. “The Butterfly Parallel Processor”. In Proc. of ICS.
  5. “The IBM research parallel processor prototype RP3: Introduction and Architecture”. 1985 International Conference of Parallel Processing.
  6. “Design of Scalable Shared-Memory Multiprocessors: The DASH Approach”. Computer Systems Laboratory, Stanford University.
  7. “James Archibald”. Retrieved 2016-11-15.
  8. “An economy solution to the cache coherence problem”. ISCA ’84 Proceedings of the 11th Annual International Symposium on Computer Architecture.
  9. Lenoski, Daniel; Laudon, James; Gharachorloo, Kourosh; Weber, Wolf-Dietrich; Gupta, Anoop; Hennessy, John; Horowitz, Mark; Lam, Monica S. (1992-03-01). “The Stanford Dash Multiprocessor”. Computer. 25 (3): 63-79. doi:10.1109/2.121510. ISSN 0018-9162.
  10. Laudon, James; Lenoski, Daniel (1997-01-01). “SGI Origin: A ccNUMA Highly Scalable Server”. Proceedings of the 24th Annual International Symposium on Computer Architecture. ISCA ’97. New York, NY, USA: ACM: 241-251. doi:10.1145/264107.264206. ISBN 0897919017.
  11. Silicon Graphics International Corp. “Support Home Page”. Retrieved 2016-11-16.
  12. Solihin, Yan (2009). Fundamentals of Parallel Multicore Architecture. pp. 319-361.
