In computer architecture, the memory hierarchy separates computer storage into a hierarchy based on response time. Since response time, complexity, and capacity are related, the levels can also be distinguished by their performance and controlling technologies. The memory hierarchy affects performance in computer architectural design, algorithm predictions, and lower-level programming constructs involving locality of reference.
Designing for high performance requires considering the restrictions of the memory hierarchy, i.e. the size and capabilities of each component. Each of the various components can be viewed as part of a hierarchy of memories (m_1, m_2, …, m_n) in which each member m_i is typically smaller and faster than the next highest member m_(i+1) of the hierarchy. To limit waiting by higher levels, a lower level will respond by filling a buffer and then signaling to activate the transfer.
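The "smaller and faster" ordering can be sketched in a few lines of Python. The `Level` type, the level names, and every capacity and latency figure below are illustrative assumptions, not measurements of any particular machine; the check also treats the ordering as strict, whereas the text only says it typically holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str            # hypothetical label for the level
    capacity_bytes: int  # illustrative capacity
    latency_ns: float    # illustrative access latency


def is_valid_hierarchy(levels):
    """Return True if every level is smaller and faster than the next one."""
    return all(
        a.capacity_bytes < b.capacity_bytes and a.latency_ns < b.latency_ns
        for a, b in zip(levels, levels[1:])
    )


# Ordered from m_1 (fastest, smallest) to m_n (slowest, largest).
hierarchy = [
    Level("registers", 1_024, 0.3),
    Level("cache", 8 * 2**20, 10.0),
    Level("main memory", 16 * 2**30, 100.0),
    Level("disk", 2**40, 100_000.0),
]

print(is_valid_hierarchy(hierarchy))
```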
There are four major storage levels. 
- Internal – Processor registers and cache.
- Main – the system RAM and controller cards.
- On-line mass storage – Secondary storage.
- Off-line bulk storage – Tertiary and Off-line storage.
This is a general memory hierarchy structuring. Many other structures are useful. For example, a paging algorithm may be considered as a level for virtual memory when designing a computer architecture, and one can include a level of nearline storage between online and offline storage.
Properties of the technologies in the memory hierarchy
- Adding complexity slows down the memory hierarchy.
- CMOx memory technology stretches the space in the memory hierarchy 
- One of the main ways to increase system performance is minimizing how far down the memory hierarchy one has to go to manipulate data.
- Latency and bandwidth are two metrics associated with caches. Neither of them is uniform; each is specific to a particular component of the memory hierarchy.
- Predicting where in the memory hierarchy the data resides is difficult.
- … the location in the memory hierarchy dictates the time required for the prefetch to occur. 
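The effect of how far down the hierarchy an access has to travel can be illustrated with a simple expected-access-time model. This is a minimal sketch under stated assumptions: levels are probed one after another, every probed level costs its full latency, and the latencies and hit rates are made-up values chosen only for illustration.

```python
def expected_access_time(levels):
    """Expected cost of one access.

    `levels` is a list of (latency, hit_rate) pairs ordered from the
    fastest level to the slowest. The last level should have a hit
    rate of 1.0 so that every access is eventually satisfied.
    """
    expected = 0.0
    reach_probability = 1.0  # chance that an access gets as far as this level
    for latency, hit_rate in levels:
        expected += reach_probability * latency
        reach_probability *= 1.0 - hit_rate
    return expected


# Hypothetical three-level system: latencies in cycles, hit rates as fractions.
baseline = [(1.0, 0.90), (10.0, 0.90), (100.0, 1.0)]
better_first_level = [(1.0, 0.95), (10.0, 0.90), (100.0, 1.0)]

print(expected_access_time(baseline))            # roughly 3 cycles
print(expected_access_time(better_first_level))  # roughly 2 cycles
```

In this model, raising the hit rate of the fastest level from 90% to 95% cuts the expected cost by about a third, because fewer accesses reach the slow levels at all.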
The number of levels in the memory hierarchy and the performance at each level have increased over time. For example, the memory hierarchy of an Intel Haswell Mobile processor circa 2013 is:
- Processor registers – the fastest possible access (usually 1 CPU cycle). A few thousand bytes in size
- Level 0 (L0) micro-operations cache – 6 KiB in size
- Level 1 (L1) instruction cache – 128 KiB in size
- Level 1 (L1) data cache – 128 KiB in size. Best access speed is around 700 GiB/second
- Level 2 (L2) instruction and data cache (shared) – 1 MiB in size. Best access speed is around 200 GiB/second
- Level 3 (L3) shared cache – 6 MiB in size. Best access speed is around 100 GB/second
- Level 4 (L4) shared cache – 128 MiB in size. Best access speed is around 40 GB/second
- Main memory (primary storage) – gigabytes in size. Best access speed is around 10 GB/second. In the case of a NUMA machine, access times may not be uniform
- Disk storage (secondary storage) – terabytes in size. As of 2017, the best access speed, from a solid-state drive, is around 2000 MB/second
- Nearline storage (tertiary storage) – up to exabytes in size. As of 2013, the best access speed is around 160 MB/second
- Offline storage
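To get a feel for the spread between these levels, the quoted best-case speeds can be turned into transfer times. The sketch below is a rough comparison only: it uses the approximate figures from the list above, ignores latency, and ignores the fact that the smaller levels could not actually hold the amount of data being moved. The table and function names are hypothetical.

```python
GIB = 2**30  # binary gigabyte, as used for the L1 and L2 figures
GB = 10**9   # decimal gigabyte
MB = 10**6   # decimal megabyte

# Approximate best-case speeds quoted in the list above, in bytes per second.
BANDWIDTH_BYTES_PER_S = {
    "L1 data cache": 700 * GIB,
    "L2 cache": 200 * GIB,
    "L3 cache": 100 * GB,
    "L4 cache": 40 * GB,
    "main memory": 10 * GB,
    "solid-state drive": 2000 * MB,
    "nearline storage": 160 * MB,
}


def transfer_seconds(num_bytes, level):
    """Time to stream `num_bytes` at the best-case speed of `level`."""
    return num_bytes / BANDWIDTH_BYTES_PER_S[level]


for level in BANDWIDTH_BYTES_PER_S:
    print(f"{level:>18}: {transfer_seconds(1 * GB, level):.4f} s per GB")
```

With these figures, streaming 1 GB takes about 0.1 s from main memory, 0.5 s from the solid-state drive, and 6.25 s from nearline storage.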
The lower levels of the hierarchy – from disks downwards – are also known as tiered storage. The formal distinction between online, nearline, and offline storage is:
- Online storage is immediately available for I/O.
- Nearline storage is not immediately available, but can be made online quickly without human intervention.
- Offline storage is not immediately available, and requires some human intervention to bring online.
For example, always-on spinning disks are online, while spinning disks that spin down, such as a massive array of idle disks (MAID), are nearline. Removable media such as tape cartridges that can be automatically loaded, as in a tape library, are nearline, while cartridges that must be manually loaded are offline.
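The three-way distinction reduces to two yes/no questions, which the following sketch encodes. The function name and its two boolean parameters are hypothetical and introduced only for illustration.

```python
def classify_storage(immediately_available, needs_human_intervention):
    """Classify a storage device as "online", "nearline", or "offline"."""
    if immediately_available:
        return "online"
    if needs_human_intervention:
        return "offline"
    # Not immediately available, but can be brought online automatically.
    return "nearline"


# The examples from the text:
print(classify_storage(True, False))   # always-on spinning disk
print(classify_storage(False, False))  # spun-down disk in a MAID
print(classify_storage(False, False))  # tape cartridge in a tape library
print(classify_storage(False, True))   # tape cartridge loaded by hand
```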
Most modern CPUs are so fast that, for most program workloads, the bottleneck is the locality of reference of memory accesses and the efficiency of caching and memory transfer between different levels of the hierarchy. As a result, the CPU spends much of its time idling, waiting for memory I/O to complete. This is sometimes called the space cost, as a larger memory object is more likely to overflow a small, fast level and require use of a larger, slower level. The resulting load on memory use is known as pressure (respectively register pressure, cache pressure, and (main) memory pressure). Terms for data being missing from a higher level and needing to be fetched from a lower level are, respectively: register spilling (due to register pressure: register to cache), cache miss (cache to main memory), and (hard) page fault (main memory to disk).
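A small simulation shows how a working set that overflows a fast level turns into misses. This is a minimal sketch that assumes a fully associative cache with least-recently-used (LRU) replacement, which is a simplification of real hardware; the access patterns and the capacity of 4 entries are invented for illustration.

```python
from collections import OrderedDict


def count_misses(accesses, capacity):
    """Simulate an LRU cache with `capacity` entries and count the misses."""
    cache = OrderedDict()
    misses = 0
    for address in accesses:
        if address in cache:
            cache.move_to_end(address)  # mark as most recently used
        else:
            misses += 1
            cache[address] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return misses


fits = list(range(4)) * 10       # working set of 4 items, scanned 10 times
overflows = list(range(5)) * 10  # working set of 5 items, scanned 10 times

print(count_misses(fits, capacity=4))
print(count_misses(overflows, capacity=4))
```

With a capacity of 4, the first pattern misses only on its 4 initial accesses, while the second misses on all 50: growing the working set by a single item past the capacity of the level is enough to defeat it under this access pattern.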
Modern programming languages mainly assume two levels of memory, main memory and disk storage, but in assembly language and inline assemblers in languages such as C, registers can be directly accessed. Taking optimal advantage of the memory hierarchy requires the cooperation of programmers, hardware, and compilers (as well as underlying support from the operating system):
- Programmers are responsible for moving data between disk and memory through I/O.
- Hardware is responsible for moving data between memory and cache.
- Optimizing compilers are responsible for generating code that, when executed, will cause the hardware to use caches and registers efficiently.
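The programmer's share of this work, moving data between disk and memory explicitly, might look like the following sketch. It reads a file in fixed-size chunks so that only one chunk is resident in the program's buffer at a time. The function name and the 64 KiB default chunk size are arbitrary choices made for this example.

```python
def checksum_file(path, chunk_size=64 * 1024):
    """Sum all bytes of a file while buffering at most `chunk_size` bytes."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # explicit disk-to-memory transfer
            if not chunk:
                break  # end of file
            total += sum(chunk)
    return total
```

Once a chunk is in main memory, the movement of its bytes through the caches and registers is left to the hardware and to the code the compiler or interpreter generates.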
Many programmers assume one level of memory. This works fine until the application hits a performance wall. Then the memory hierarchy will be assessed during code refactoring.
- Memory characteristics
- Cache Hierarchy
- Use of spatial and temporal locality: hierarchical memory
- The difference between buffer and cache
- Cache hierarchy in a modern processor
- Memory wall
- Computer memory
- Hierarchical storage management
- Cloud storage
- Memory access pattern