Memory ordering

Memory ordering describes the order of accesses to computer memory by a CPU. The term can refer either to the memory ordering generated by the compiler at compile time, or to the memory ordering generated by the CPU at runtime.

In modern microprocessors, memory ordering characterizes the CPU's ability to reorder memory operations – it is a type of out-of-order execution. Memory reordering can be used to fully utilize the bus bandwidth of different types of memory such as caches and memory banks.

On most modern uniprocessors, memory operations are not executed in the order specified by the program code. In single-threaded programs all operations appear to execute in the order specified, with all out-of-order execution hidden from the programmer. In multi-threaded environments, however (or when interfacing with other hardware via memory buses), this reordering can lead to problems. To avoid such problems, memory barriers can be used.
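To make the problem concrete, here is a minimal C11 sketch of the classic message-passing pattern (the names payload, ready and run_demo are illustrative, not from the original): without the release/acquire ordering on the flag, a weakly ordered CPU could let the consumer observe the flag as set while still reading a stale payload.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;
static atomic_bool ready;

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                /* plain store           */
    atomic_store_explicit(&ready, true,
                          memory_order_release); /* barrier, then flag    */
    return NULL;
}

static void *consumer(void *arg) {
    /* acquire pairs with the release above, so payload is visible */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    *(int *)arg = payload;
    return NULL;
}

/* spawns both threads and returns the value the consumer observed */
int run_demo(void) {
    int seen = 0;
    pthread_t p, c;
    payload = 0;
    atomic_store(&ready, false);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

With the release/acquire pair in place, run_demo() always returns 42; with both accesses relaxed, the consumer could legally observe 0 on a weakly ordered machine.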

Compile-time memory ordering

The compiler has some freedom to reorder operations at compile time. However, this can lead to problems if the order of memory accesses matters.

Compile-time memory barrier implementation

See also: Memory barrier

These barriers prevent a compiler from reordering instructions during compile time – they do not prevent reordering by the CPU during runtime.

  • The GNU inline assembler statement
asm volatile ("" ::: "memory");

or even

__asm__ __volatile__ ("" ::: "memory");

forbids the GCC compiler from reordering read and write commands around it. [1]
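As a sketch of how such a compiler barrier is typically used (the names publish, data and flag are illustrative, not from the original):

```c
volatile int data;
volatile int flag;

/* GCC-style compiler barrier: empty asm with a "memory" clobber */
#define barrier() __asm__ __volatile__ ("" ::: "memory")

void publish(int value) {
    data = value;   /* must be emitted before the flag store        */
    barrier();      /* compiler may not move stores across this     */
    flag = 1;       /* note: a CPU could still reorder at runtime – */
                    /* a hardware barrier would be needed for that  */
}
```

This only constrains the compiler; on a weakly ordered CPU a hardware barrier (see below) is still required for cross-thread ordering.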

  • The C11 / C++11 function
atomic_signal_fence (memory_order_acq_rel);

forbids the compiler from reordering read and write commands around it. [2]
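atomic_signal_fence orders accesses between a thread and a signal handler running in that same thread, so only a compiler fence is needed. A minimal sketch (handler, demo and the variable names are hypothetical):

```c
#include <signal.h>
#include <stdatomic.h>

static int result;
static volatile sig_atomic_t done;

static void handler(int sig) {
    (void)sig;
    atomic_signal_fence(memory_order_acquire); /* pairs with release below */
    done = result;                             /* sees the value set before raise() */
}

int demo(void) {
    signal(SIGUSR1, handler);
    result = 99;
    atomic_signal_fence(memory_order_release); /* compiler-only fence */
    raise(SIGUSR1);                            /* handler runs synchronously */
    return done;
}
```

Because the handler runs in the same thread, no hardware barrier instruction is emitted; the fence merely stops the compiler from sinking the store to result below raise().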

  • The Intel C++ Compiler uses a “full compiler fence”
__memory_barrier ()

intrinsic. [3] [4]

  • The Microsoft Visual C++ compiler: [5]
_ReadWriteBarrier ()

Runtime memory ordering

In symmetric multiprocessing (SMP) microprocessor systems

There are several memory-consistency models for SMP systems:

  • Sequential consistency (all reads and all writes are in-order)
  • Relaxed consistency (some types of reordering are allowed)
    • Loads can be reordered after loads (allows better cache-coherency behaviour and better scaling)
    • Loads can be reordered after stores
    • Stores can be reordered after stores
    • Stores can be reordered after loads
  • Weak consistency (reads and writes are arbitrarily reordered, limited only by explicit memory barriers )
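As an illustration of relaxed consistency, a shared event counter needs only atomicity, not ordering, so memory_order_relaxed suffices even though individual increments may be freely reordered (the name count_with_two_threads is illustrative):

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int counter;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++)
        /* relaxed: atomic, but imposes no ordering on other accesses */
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    return NULL;
}

/* runs two workers concurrently; no increment is ever lost */
int count_with_two_threads(void) {
    pthread_t a, b;
    atomic_store(&counter, 0);
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&counter);
}
```

The final count is always exactly 200000; relaxed ordering only becomes unsafe when the counter is used to publish other data.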

On some CPUs

  • Atomic operations can be reordered with loads and stores.
  • There can be an incoherent instruction cache pipeline, which prevents self-modifying code from being executed without special instruction cache flush/reload instructions.
  • Dependent loads can be reordered (this is unique to Alpha). If the processor fetches a pointer to some data after this reordering, it might not fetch the data the pointer refers to, but instead use stale data that is already in its cache and not yet invalidated. Allowing this relaxation makes the cache hardware simpler and faster, but requires memory barriers for readers and writers. [6]
Memory ordering in some architectures [7] [8]

| Type | Alpha | ARMv7 | PA-RISC | POWER | SPARC RMO | SPARC PSO | SPARC TSO | x86 | x86 oostore | AMD64 | IA-64 | z/Architecture |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Loads reordered after loads | Y | Y | Y | Y | Y | | | | Y | | Y | |
| Loads reordered after stores | Y | Y | Y | Y | Y | | | | Y | | Y | |
| Stores reordered after stores | Y | Y | Y | Y | Y | Y | | | Y | | Y | |
| Stores reordered after loads | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| Atomic reordered with loads | Y | Y | | Y | Y | | | | | | Y | |
| Atomic reordered with stores | Y | Y | | Y | Y | Y | | | | | Y | |
| Dependent loads reordered | Y | | | | | | | | | | | |
| Incoherent instruction cache pipeline | Y | Y | | Y | Y | Y | Y | Y | Y | | Y | |

Some older x86 and AMD systems have weaker memory ordering. [9]

SPARC memory ordering modes:

  • SPARC TSO = total store order (default)
  • SPARC RMO = relaxed-memory order (not supported on recent CPUs)
  • SPARC PSO = partial store order (not supported on recent CPUs)

Hardware memory barrier implementation

See also: Memory barrier

Many architectures with SMP support have special hardware instructions for flushing reads and writes during runtime.

  • x86 , x86-64
lfence (asm), void _mm_lfence (void)
sfence (asm), void _mm_sfence (void) [10]
mfence (asm), void _mm_mfence (void) [11]
  • PowerPC
sync (asm)
  • MIPS
sync (asm)
  • Itanium
mf (asm)
dcs (asm)
  • ARMv7 [12]
dmb (asm)
dsb (asm)
isb (asm)

Compiler support for hardware memory barriers

Some compilers support builtins that emit hardware memory barrier instructions:

  • GCC , [13] version 4.4.0 and later, [14] has __sync_synchronize.
  • C11 and C++11 added the atomic_thread_fence() function.
  • The Microsoft Visual C++ compiler [15] has MemoryBarrier().
  • The Sun Studio Compiler Suite [16] has __machine_r_barrier, __machine_w_barrier and __machine_rw_barrier.
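As a sketch of how the portable atomic_thread_fence() is typically paired with relaxed atomics (the names send and receive are illustrative): on weakly ordered CPUs the fences compile down to hardware barrier instructions such as those listed above, while on strongly ordered x86 the release fence costs nothing.

```c
#include <stdatomic.h>

static int message;
static atomic_int flag;

void send(int v) {
    message = v;                               /* plain store              */
    atomic_thread_fence(memory_order_release); /* hw barrier where needed  */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}

int receive(void) {
    while (!atomic_load_explicit(&flag, memory_order_relaxed))
        ;                                      /* wait for the flag        */
    atomic_thread_fence(memory_order_acquire); /* order the load below     */
    return message;
}
```

The fence pair gives the same guarantee as release/acquire operations on the flag itself, but as standalone instructions that can cover several surrounding accesses at once.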

See also

  • Memory model (programming)
  • Memory barrier


  1. ^ GCC compiler-gcc.h Archived 2011-07-24 at the Wayback Machine.
  2. ^ [1]
  3. ^ compiler-intel.h Archived 2011-07-24 at the Wayback Machine.
  4. ^ Intel(R) C++ Compiler Intrinsics Reference

    Creates a barrier across which the compiler will not schedule any data access instruction. The compiler may allocate local data in registers across a memory barrier, but not global data.

  5. ^ Visual C++ Language Reference: _ReadWriteBarrier
  6. ^ Reordering on an Alpha processor by Kourosh Gharachorloo
  7. ^ Memory Ordering in Modern Microprocessors by Paul McKenney
  8. ^ Memory Barriers: a Hardware View for Software Hackers, Figure 5 on Page 16
  9. ^ Table 1. Summary of Memory Ordering, from “Memory Ordering in Modern Microprocessors, Part I”
  10. ^ SFENCE – Store Fence
  11. ^ MFENCE – Memory Fence
  12. ^ Data Memory Barrier, Data Synchronization Barrier, and Instruction Synchronization Barrier.
  13. ^ Atomic Builtins
  14. ^
  15. ^ MemoryBarrier macro
  16. ^ Handling Memory Ordering in Multithreaded Applications with Oracle Solaris Studio 12 Update 2: Part 2, Memory Barriers and Memory Fences [2]
