In computing, a cache /kæʃ/ KASH is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation, or a duplicate of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.
To be cost-effective and to enable efficient use of data, caches must be relatively small. Nevertheless, caches have proven themselves in many areas of computing, because access patterns in typical computer applications exhibit the locality of reference. Such access patterns exhibit temporal locality, where data is requested that has been recently requested already, and spatial locality, where data is requested that is stored physically close to data that has already been requested.
There is an inherent trade-off between size and speed (given that a larger resource implies greater physical distances), but also a trade-off between expensive, premium technologies (such as SRAM) and cheaper, easily mass-produced commodities (such as DRAM or hard disks).
The buffering provided by a cache benefits both bandwidth and latency:
A larger resource incurs a significant latency for access – e.g. it can take hundreds of clock cycles for a modern 4 GHz processor to reach DRAM. This is mitigated by reading in large chunks, in the hope that subsequent reads will be from nearby locations. Prediction or explicit prefetching might also guess where future reads will come from and make requests ahead of time; if done correctly the latency is bypassed altogether.
The use of a cache also allows for higher throughput from the underlying resource, by assembling multiple fine-grain transfers into larger, more efficient requests. In the case of DRAM circuits, this might be served by having a wider data bus. For example, consider a program accessing bytes in a 32-bit address space, but being served by a 128-bit off-chip data bus; individual uncached byte accesses would allow only 1/16th of the total bandwidth to be used, and 80% of the data movement would be memory addresses rather than data itself. Reading larger chunks reduces the fraction of bandwidth required for transmitting address information.
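The two fractions quoted above can be checked with a little arithmetic. A minimal sketch, assuming as in the example a 32-bit address per access and a 128-bit data bus:

```python
bus_width = 128    # bits of data the bus can transfer per cycle
address_bits = 32  # bits spent naming each accessed location
byte = 8           # payload bits moved per uncached byte access

# A one-byte transfer uses only 8 of the 128 available data bits:
payload_fraction = byte / bus_width
print(payload_fraction)  # 0.0625, i.e. 1/16 of the total bandwidth

# Of the combined traffic (address + payload), addresses dominate:
address_share = address_bits / (address_bits + byte)
print(address_share)     # 0.8, i.e. 80% of the movement is addresses
```

Reading, say, a full 128-bit chunk per address instead would flip that ratio: 32 address bits against 128 payload bits, so only 20% of the traffic would be addressing overhead.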
Hardware implements cache as a block of memory for temporary storage of data likely to be used again. Central processing units (CPUs) and hard disk drives (HDDs) frequently use a cache, as do web browsers and web servers.
A cache is made up of a pool of entries. Each entry has associated data, which is a copy of the same data in some backing store. Each entry also has a tag, which specifies the identity of the data in the backing store of which the entry is a copy.
When the cache client (a CPU, web browser, operating system) needs to access data presumed to exist in the backing store, it first checks the cache. If an entry can be found with a tag matching that of the desired data, the data in the entry is used instead. This situation is known as a cache hit. For example, a web browser program might check its local cache on disk to see if it has a local copy of the contents of a web page at a particular URL. In this example, the URL is the tag, and the content of the web page is the data. The percentage of accesses that result in cache hits is known as the hit rate or hit ratio of the cache.
The alternative situation, when the cache is checked and found not to contain any entry with the desired tag, is known as a cache miss. This requires a more expensive access of data from the backing store. Once retrieved, the previously uncached data from the backing store is usually copied into the cache, ready for the next access.
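The hit/miss flow described above can be sketched in a few lines. This is a generic illustration, not any particular system's implementation; a plain dictionary stands in for the slower backing store:

```python
class ReadThroughCache:
    """On lookup: return the cached entry on a hit; on a miss,
    fetch from the backing store and keep a copy for next time."""
    def __init__(self, backing_store):
        self.backing = backing_store
        self.entries = {}  # tag -> data
        self.hits = 0
        self.misses = 0

    def get(self, tag):
        if tag in self.entries:
            self.hits += 1                         # cache hit
        else:
            self.misses += 1                       # cache miss: expensive fetch
            self.entries[tag] = self.backing[tag]  # copy into the cache
        return self.entries[tag]

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

In the browser analogy, the tag would be a URL and the data the page contents: the first `get("/index.html")` is a miss that consults the backing store, and every repeat of the same tag is a hit served from the cache.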
During a cache miss, some other previously existing cache entry is usually removed in order to make room for the newly retrieved data. The heuristic used to select the entry to replace is known as the replacement policy. One popular replacement policy, “least recently used” (LRU), replaces the entry that was accessed less recently than any other (see cache algorithm). More efficient caching algorithms compute the use-hit frequency against the size of the stored contents, as well as the latencies and throughputs for both the cache and the backing store. This works well for larger amounts of data, longer latencies, and slower throughputs, such as those experienced with hard drives and networks, but is not efficient for use within a CPU cache.
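An LRU replacement policy can be sketched compactly. The following is an illustration only (no particular hardware or library is being modeled), using `collections.OrderedDict` to track recency of use:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny least-recently-used cache: when full, the entry
    accessed longest ago is evicted to make room."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # tag -> data, least recent first

    def get(self, tag):
        if tag in self.entries:            # cache hit
            self.entries.move_to_end(tag)  # mark as most recently used
            return self.entries[tag]
        return None                        # cache miss

    def put(self, tag, data):
        if tag in self.entries:
            self.entries.move_to_end(tag)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[tag] = data
```

With a capacity of 2: after `put("a", 1)`, `put("b", 2)`, `get("a")`, a third insertion `put("c", 3)` evicts `"b"`, since `"a"` was touched more recently.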
When a system writes data to cache, it must at some point write that data to the backing store as well. The timing of this write is controlled by what is known as the write policy.
There are two basic writing approaches:
- Write-through : write is done synchronously both to the cache and to the backing store.
- Write-back (also called write-behind ): initially, writing is done only to the cache. The write to the backing store is postponed until the modified content is about to be replaced by another cache block.
A write-back cache is more complex to implement, since it needs to track which of its locations have been written over, and mark them as dirty for later writing to the backing store. The data in these locations are written back to the backing store only when they are evicted from the cache, an effect referred to as a lazy write. For this reason, a read miss in a write-back cache (which requires a block to be replaced by another) will often require two memory accesses to service: one to write the replaced data from the cache back to the store, and one to retrieve the needed data.
Other policies may also trigger data write-back. The client may make many changes to data in the cache, and then explicitly notify the cache to write back the data.
Since no data is returned to the requester on write operations, a decision needs to be made on write misses: whether or not to load the data into the cache. This is defined by two approaches:
- Write allocate (also called fetch on write ): data at the missed-write location is loaded to cache, followed by a write-hit operation. In this approach, write misses are similar to read misses.
- No-write allocate (also called write-no-allocate or write-around ): data at the missed-write location is not loaded to cache, and is written directly to the backing store. In this approach, only the reads are being cached.
Both write-through and write-back policies can use either of these write-miss policies, but usually they are paired in this way:
- A write-back cache uses write allocate, hoping for subsequent writes (or even reads) to the same location, which is now cached.
- A write-through cache uses no-write allocate. Here, subsequent writes have no advantage, since they still need to be written directly to the backing store.
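To show how these policies combine, here is a sketch of the first pairing: a write-back cache with write allocation. It is a simplified illustration (single arbitrary-victim eviction, a dict standing in for the backing store), not a model of real hardware:

```python
class WriteBackCache:
    """Write-back, write-allocate cache over a dict-like backing store.
    Dirty entries are flushed only on eviction or an explicit flush."""
    def __init__(self, backing, capacity):
        self.backing = backing
        self.capacity = capacity
        self.data = {}      # tag -> value
        self.dirty = set()  # tags modified since last write-back

    def read(self, tag):
        if tag not in self.data:   # read miss: fetch from backing store
            self._make_room()
            self.data[tag] = self.backing[tag]
        return self.data[tag]

    def write(self, tag, value):
        if tag not in self.data:   # write miss: write allocate
            self._make_room()
        self.data[tag] = value
        self.dirty.add(tag)        # defer the store write (lazy write)

    def _make_room(self):
        if len(self.data) >= self.capacity:
            victim = next(iter(self.data))  # arbitrary replacement choice
            if victim in self.dirty:        # dirty victim: write back first
                self.backing[victim] = self.data[victim]
                self.dirty.discard(victim)
            del self.data[victim]

    def flush(self):
        """Explicit write-back, as when a client requests it."""
        for tag in self.dirty:
            self.backing[tag] = self.data[tag]
        self.dirty.clear()
```

Note that after `write("x", 42)` the backing store still holds the old value; it is only updated when `"x"` is evicted to make room, or when `flush()` is called. This is exactly the staleness window that the coherence protocols discussed below must manage when several such caches share one backing store.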
Entities other than the cache may change the data in the backing store, in which case the copy in the cache may become out-of-date or stale. Alternatively, when the client updates the data in the cache, copies of that data in other caches will become stale. Communication protocols between the cache managers which keep the data consistent are known as coherence protocols.
Examples of hardware caches
Small memories on or close to the CPU can operate faster than the much larger main memory. Most CPUs since the 1980s have used one or more caches, sometimes in cascaded levels; modern high-end embedded, desktop, and server microprocessors may have as many as six types of cache (between levels and functions). Examples of caches with a specific function are the D-cache and the I-cache and the translation lookaside buffer for the MMU.
Earlier graphics processing units (GPUs) often had limited read-only texture caches, and introduced Morton order swizzled textures to improve 2D cache coherency. Cache misses would drastically affect performance, e.g. if mipmapping was not used. Caching was important to leverage 32-bit (and wider) transfers for texture data that was often as little as 4 bits per pixel, indexed in complex patterns by arbitrary UV coordinates and perspective transformations in inverse texture mapping.
As GPUs advanced (especially with GPGPU compute shaders), they have developed progressively larger and increasingly general caches, including instruction caches for shaders, exhibiting functionality increasingly common with CPU caches. For example, GT200 architecture GPUs did not feature an L2 cache, while the Fermi GPU has 768 KB of last-level cache, the Kepler GPU has 1536 KB of last-level cache, and the Maxwell GPU has 2048 KB of last-level cache. These caches have grown to handle synchronisation primitives between threads and atomic operations, and interface with a CPU-style MMU.
Digital signal processors have similarly generalised over the years. Earlier designs used scratchpad memory fed by DMA, but modern DSPs such as Qualcomm Hexagon often include a very similar set of caches to a CPU (e.g. Modified Harvard architecture with shared L2, split L1 I-cache and D-cache).
Translation lookaside buffer
A memory management unit (MMU) that fetches page table entries from main memory has a specialized cache, used for recording the results of virtual address to physical address translations. This specialized cache is called a translation lookaside buffer (TLB). 
While CPU caches are generally managed entirely by hardware, a variety of software manages other caches. The page cache in main memory, which is an example of a disk cache, is managed by the operating system kernel.
While the disk buffer, which is an integrated part of the hard disk drive, is sometimes misleadingly referred to as “disk cache”, its main functions are write sequencing and read prefetching. Repeated cache hits are relatively rare, due to the buffer's small size in comparison to the drive's capacity. However, high-end disk controllers often have their own on-board cache of the hard disk drive's data blocks.
Finally, a fast local disk drive can also cache information held on even slower data storage devices, such as remote servers (web cache) or local tape drives or optical jukeboxes; such a scheme is the main concept of hierarchical storage management. Also, fast flash-based solid-state drives (SSDs) can be used as caches for slower rotational-media hard disk drives, working together as hybrid drives or solid-state hybrid drives (SSHDs).
Web browsers and web proxy servers employ web caches to store previous responses from web servers, such as web pages and images. Web caches reduce the amount of information that needs to be transmitted across the network, as information previously stored in the cache can often be re-used. This reduces bandwidth and processing requirements of the web server, and helps to improve responsiveness for users of the web.
Web browsers employ a built-in web cache, but some Internet service providers (ISPs) or organizations also use a caching proxy server, which is a web cache that is shared among all users of that network.
Another form of caching is P2P caching, where the files most sought for by peer-to-peer applications are stored in an ISP cache to accelerate P2P transfers. Similarly, decentralised equivalents exist, which allow communities to perform the same task for P2P traffic, for example, Corelli.
A cache can store data that is computed on demand rather than retrieved from a backing store. Memoization is an optimization technique that stores the results of resource-consuming function calls within a lookup table, allowing subsequent calls to reuse the stored results and avoid repeated computation.
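A minimal illustration of memoization, using Python's standard-library `functools.lru_cache`, which stores results of earlier calls keyed by their arguments:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Naive recursion would recompute the same subproblems
    exponentially often; the cache makes each fib(k) compute once."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(40))  # 102334155, returned near-instantly
```

Here the lookup table is the cache, the argument `n` plays the role of the tag, and the function's return value is the data; every repeated call with the same argument is a cache hit.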
The BIND DNS daemon caches a mapping of domain names to IP addresses , as does a resolver library.
Write-through operation is common when operating over unreliable networks (like an Ethernet LAN), because of the enormous complexity of the coherency protocol between multiple write-back caches when communication is unreliable. For instance, web page caches and client-side network file system caches (like those in NFS or SMB ) are typically read-only or write-through specifically to keep the network protocol simple and reliable.
Search engines also frequently make web pages they have indexed available from their cache. For example, Google provides a “Cached” link next to each search result. This can be useful when web pages from a web server are temporarily or permanently inaccessible.
Another type of caching is storing computed results that will likely be needed again, or memoization. For example, ccache is a program that caches the output of compilation, in order to speed up later compilation runs.
Database caching can be used to improve the performance of databases, for example in the processing of indexes, data dictionaries, and frequently used subsets of data.
A distributed cache  uses networked hosts to provide scalability, reliability and performance to the application.  The hosts can be co-located or spread over different geographical regions.
Buffer vs. cache
The semantics of a “buffer” and a “cache” are not totally different; even so, there are fundamental differences in intent between the process of caching and the process of buffering.
Fundamentally, caching realizes a performance increase for transfers of data that is being repeatedly transferred. While a caching system may realize a performance increase upon the initial (typically write) transfer of a data item, this performance increase is due to buffering occurring within the caching system.
With read caches, a data item must have been fetched from its residing location at least once in order for subsequent reads of that item to realize a performance increase, by virtue of being fetched from the cache's faster intermediate storage rather than the data's residing location. With write caches, a performance increase may be realized upon the first write of a data item, by virtue of the item being stored immediately in the cache's intermediate storage, with the transfer to its residing storage deferred to a later stage or else occurring as a background process. Contrary to strict buffering, a caching process must adhere to a (potentially distributed) cache coherency protocol in order to maintain consistency between the cache's intermediate storage and the location where the data resides. Buffering, on the other hand,
- reduces the number of transfers for otherwise novel data amongst communicating processes, amortizing the overhead of several small transfers over fewer, larger transfers,
- provides an intermediary for communicating processes which are incapable of direct transfers amongst each other, or
- ensures a minimum data size or representation required by at least one of the communicating processes involved in a transfer.
With typical caching implementations, a data item that is read or written for the first time is effectively being buffered; in the case of a write, this mostly realizes a performance increase for the application from which the write originated. Additionally, the portion of a caching protocol where individual writes are deferred to a batch of writes is a form of buffering. The portion of a caching protocol where individual reads are deferred to a batch of reads is also a form of buffering, although this form may negatively impact the performance of at least the initial reads (even though it may positively impact the performance of the sum of the individual reads). In practice, caching almost always involves some form of buffering, while strict buffering does not involve caching.
A buffer is a temporary memory location that is traditionally used because CPU instructions cannot directly address data stored in peripheral devices. Thus, addressable memory is used as an intermediate stage. Additionally, such a buffer may be feasible when a large block of data is assembled or disassembled (as required by a storage device), or when data may be delivered in a different order than that in which it is produced. Also, a whole buffer of data is usually transferred sequentially (for example to hard disk), so buffering itself can increase transfer performance or reduce the variation or jitter of the transfer's latency. These benefits are present even if the buffered data is written to the buffer once and read from the buffer once.
A cache also increases transfer performance. A part of the increase similarly comes from the possibility that multiple small transfers will combine into one large block. But the main performance gain occurs because there is a good chance that the same data will be read from the cache multiple times, or that written data will soon be read. A cache's sole purpose is to reduce accesses to the underlying slower storage. A cache is also usually an abstraction layer that is designed to be invisible from the perspective of neighboring layers.
- Cache memory
- Cache prefetching
- Cache algorithms
- Cache coherence
- Cache coloring
- Cache hierarchy
- Cache-oblivious algorithm
- Cache stampede
- Cache language model
- Database cache
- Dirty bit
- Disk buffer
- HTML5 manifest cache
- Five-minute rule
- Materialized view
- Pipeline burst cache
- Temporary file