Dataflow architecture is a computer architecture that directly contrasts with the traditional von Neumann architecture, or control flow architecture. Dataflow architectures do not have a program counter, at least conceptually: whether and when an instruction executes is determined solely by the availability of its input arguments, so the order of instruction execution is unpredictable, that is, behavior is nondeterministic.
Although no commercially successful general-purpose computer hardware has used a dataflow architecture, it has been successfully implemented in specialized hardware for digital signal processing, network routing, graphics processing, telemetry, and more recently data warehousing. [citation needed] It is also highly relevant to many software architectures today, including database engine designs and parallel computing frameworks. [citation needed]
Synchronous dataflow architectures are tuned to match the workload presented by real-time data applications such as wire-speed packet forwarding. Dataflow architectures that are deterministic in nature enable programmers to manage complex tasks such as load balancing, synchronization, and access to common resources.
Meanwhile, there is a clash of terminology, since the term dataflow is also used for a subarea of parallel programming: dataflow programming.
Hardware architectures for dataflow were a major topic in computer architecture research in the 1970s and early 1980s. Jack Dennis of MIT pioneered the field of static dataflow architectures, while the Manchester Dataflow Machine and the MIT Tagged Token architecture were major projects in dynamic dataflow.
The research, however, never overcame the problems related to:
- Efficiently broadcasting data in a massively parallel system.
- Efficiently dispatching instruction tokens in a massively parallel system.
- Building CAMs large enough to hold all of the dependencies of a real program.
Instructions and their data dependencies proved too fine-grained to be distributed effectively across a large network. That is, the time for the instructions and the tagged results to travel through a large connection network was longer than the time to actually do the computations.
Nonetheless, out-of-order execution (OOE) has become the dominant computing paradigm since the 1990s. It is a form of restricted dataflow. This paradigm introduced the idea of an execution window. The execution window follows the sequential order of the von Neumann architecture; within the window, however, instructions are allowed to complete in data dependency order. This is accomplished in CPUs that dynamically tag the data dependencies of the code in the execution window. The logical complexity of dynamically keeping track of the data dependencies restricts OOE CPUs to a small number of execution units (2-6) and limits the execution window to 32 to 200 instructions, much smaller than envisioned for full dataflow machines.
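The effect of an execution window can be illustrated with a small simulation. The following is a minimal sketch, not a model of any real CPU: the function name `schedule`, the one-cycle latency of every instruction, and the assumption that only true (read-after-write) dependencies matter are all illustrative choices.

```python
def schedule(program, window_size, units):
    """Return, cycle by cycle, the indices of the instructions that issue.

    program: list of (dest, sources) pairs in program order.
    Assumptions (hypothetical): every instruction takes one cycle, and only
    true dependencies on earlier writers of a source register are tracked.
    """
    # Record, for each instruction, which earlier instructions produce its inputs.
    deps = []
    last_writer = {}
    for idx, (dest, sources) in enumerate(program):
        deps.append({last_writer[s] for s in sources if s in last_writer})
        last_writer[dest] = idx

    done = set()
    cycles = []
    head = 0                      # oldest instruction that has not completed
    n = len(program)
    while head < n:
        # The window covers a contiguous span of program order starting at head.
        window = range(head, min(head + window_size, n))
        # Inside the window, anything whose inputs are complete may issue.
        ready = [i for i in window if i not in done and deps[i] <= done][:units]
        cycles.append(ready)
        done.update(ready)
        while head < n and head in done:
            head += 1
    return cycles


program = [
    ("r1", []),        # 0: independent
    ("r2", ["r1"]),    # 1: needs 0
    ("r3", ["r2"]),    # 2: needs 1
    ("r4", []),        # 3: independent
    ("r5", ["r4"]),    # 4: needs 3
]
wide = schedule(program, window_size=4, units=2)
narrow = schedule(program, window_size=1, units=2)
```

With a window of four instructions and two units, the two independent chains overlap and the program finishes in three cycles; with a window of one, execution degenerates to strict program order.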
Dataflow architecture topics
Static and dynamic dataflow machines
Designs that use conventional memory addresses as data dependency tags are called static dataflow machines. These machines did not allow multiple instances of the same routine to be executed simultaneously because the simple tags could not differentiate between them.
Designs that use content-addressable memory (CAM) are called dynamic dataflow machines. They use tags in memory to facilitate parallelism.
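The difference between the two tagging schemes can be shown with a toy matching store. This is a sketch under stated assumptions: the dictionary stands in for the tag memory, and the `(call_id, address)` tag layout is a hypothetical simplification rather than the format used by any particular machine.

```python
def deliver(store, tag, value):
    """Place a token in a matching store.

    Returns False if a token with the same tag is already waiting there,
    which means the two tokens cannot be told apart.
    """
    if tag in store:
        return False
    store[tag] = value
    return True


# Static tags: the tag is just the destination instruction address.
# Two simultaneous activations of the same routine both target address 7.
static_store = {}
static_ok = [deliver(static_store, 7, value) for value in (10, 20)]

# Dynamic tags: an activation identifier is added to the address,
# so tokens belonging to different activations stay distinct.
dynamic_store = {}
dynamic_ok = [
    deliver(dynamic_store, (call_id, 7), value)
    for call_id, value in ((0, 10), (1, 20))
]
```

In the static case the second token collides with the first; in the dynamic case both are accepted and can later be matched independently.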
Normally, in a control flow architecture, compilers analyze program source code for data dependencies between instructions in order to better organize the instruction sequences in the binary output files. The instructions are organized sequentially, but the dependency information itself is not recorded in the binaries. Binaries compiled for a dataflow machine contain this dependency information.
A dataflow compiler records these dependencies by creating unique tags for each dependency instead of using variable names. Giving each dependency a unique tag allows the non-dependent code segments in the binary to be executed out of order and in parallel. The compiler also detects loops, break statements, and other control syntax and expresses them in terms of data flow.
Programs are loaded into the CAM of a dynamic dataflow computer. When all of the tagged operands of an instruction become available (that is, output from previous instructions or user input), the instruction is marked as ready for execution by an execution unit.
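The tagging step can be sketched as a renaming pass over a list of simple statements. This is an illustrative sketch only: the statement format `(dest, op, src1, src2)`, the tag names `t0`, `t1`, ..., and the function `tag_dependencies` are assumptions made for the example, not the output format of any real dataflow compiler.

```python
from itertools import count


def tag_dependencies(statements):
    """Turn (dest, op, src...) statements into tagged dataflow instructions.

    Every value produced receives a fresh tag, so reusing a variable name
    does not create a false dependency between unrelated instructions.
    """
    fresh = count()
    current_tag = {}      # variable name -> tag of its most recent value
    instructions = []
    for dest, op, *sources in statements:
        input_tags = []
        for name in sources:
            if name not in current_tag:
                # A program input: give it a tag the first time it is read.
                current_tag[name] = f"t{next(fresh)}"
            input_tags.append(current_tag[name])
        output_tag = f"t{next(fresh)}"
        current_tag[dest] = output_tag
        instructions.append({"op": op, "inputs": input_tags, "output": output_tag})
    return instructions


program = [
    ("x", "add", "a", "b"),
    ("y", "mul", "c", "d"),
    ("x", "sub", "x", "y"),   # reuses the name x, but gets a new tag
]
tagged = tag_dependencies(program)
```

The first two instructions share no tags, so nothing prevents them from running in parallel, while the third names exactly the two tags it must wait for.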
This is known as activating or firing the instruction. Once an instruction is completed by an execution unit, its output data is sent (with its tag) to the CAM. Any instructions that depend on this particular datum (identified by its tag value) are then marked as ready for execution. In this way, subsequent instructions are executed in proper order, avoiding race conditions. This order may differ from the sequential order envisioned by the human programmer, the programmed order.
An instruction, along with its required data operands, is transmitted to an execution unit as a packet, also called an instruction token. Similarly, output data is transmitted back to the CAM as a data token. The packetization of instructions and results allows ready instructions to be executed in parallel on a large scale.
Dataflow networks deliver the instruction tokens to the execution units and return the data tokens to the CAM. In contrast to the conventional von Neumann architecture, data tokens are not permanently stored in memory; rather, they are transient messages that exist only while in transit to the instruction storage.
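The firing rule described above can be sketched as a small token-driven interpreter. This is a simplified software analogy, not a hardware model: the dictionary `waiting` stands in for the CAM, the queue stands in for the network, and the names `run_dataflow`, `OPS`, and the instruction format are assumptions made for the example. The `results` dictionary is kept only as a log for inspection.

```python
import operator
from collections import deque

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_dataflow(instructions, initial_tokens):
    """Fire each instruction once all of its input tags have arrived."""
    # Operand slots per instruction, filled as matching tokens arrive.
    waiting = {idx: {} for idx in range(len(instructions))}
    # tag -> list of (instruction index, operand position) that consume it
    consumers = {}
    for idx, ins in enumerate(instructions):
        for pos, tag in enumerate(ins["inputs"]):
            consumers.setdefault(tag, []).append((idx, pos))

    tokens = deque(initial_tokens.items())   # data tokens in transit
    results = {}                             # log of every token seen
    fired = []                               # firing order, by instruction index
    while tokens:
        tag, value = tokens.popleft()
        results[tag] = value
        for idx, pos in consumers.get(tag, []):
            slot = waiting[idx]
            slot[pos] = value
            ins = instructions[idx]
            if len(slot) == len(ins["inputs"]):
                # All operands present: fire and emit a new data token.
                args = [slot[p] for p in range(len(ins["inputs"]))]
                tokens.append((ins["output"], OPS[ins["op"]](*args)))
                fired.append(idx)
    return results, fired


graph = [
    {"op": "add", "inputs": ["a", "b"], "output": "x"},
    {"op": "mul", "inputs": ["c", "d"], "output": "y"},
    {"op": "sub", "inputs": ["x", "y"], "output": "z"},
]
# The inputs for the multiplication happen to arrive first.
results, fired = run_dataflow(graph, {"c": 3, "d": 4, "a": 1, "b": 2})
```

Because the operands of the multiplication arrive first, it fires before the addition even though it comes later in the listing; the subtraction fires last because it must wait for both results.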
- Parallel computing
- BMDFM: Binary Modular Dataflow Machine
- Systolic array
- Transport triggered architecture