In computer architecture, a transport triggered architecture (TTA) is a kind of CPU design in which programs directly control the internal transport buses of a processor. Computation happens as a side effect of data transports: writing data into a triggering port of a functional unit triggers the functional unit to start a computation. This is similar to what happens in a systolic array. Due to its modular structure, TTA is an ideal processor template for application-specific instruction-set processors (ASIPs) with a customized datapath but without the inflexibility and design cost of fixed-function hardware accelerators.
Typically a transport triggered processor has multiple transport buses and multiple functional units connected to the buses, which provides opportunities for instruction level parallelism. The parallelism is statically defined by the programmer. In this respect (and obviously due to the large instruction word width), the TTA architecture resembles the very long instruction word (VLIW) architecture. A TTA instruction word is composed of multiple slots, one slot per bus, and each slot determines the data transport that takes place on the corresponding bus. The fine-grained control allows some optimizations that are not possible in a conventional processor. For example, software can transfer data directly between functional units without using registers.
Transport triggering exposes some microarchitectural details that are normally hidden from programmers. This greatly simplifies the control logic of a processor, because many decisions normally made at run time are fixed at compile time. However, it also means that a binary compiled for one TTA processor will not run on another without recompilation if there is even a small difference in the architecture between the two. The binary incompatibility problem, in addition to the complexity of implementing a full context switch, makes TTAs more suitable for embedded systems than for general purpose computing.
Of all the one instruction set computer architectures, the TTA architecture is one of the few that has had CPUs based on it built, and the only one that has had CPUs based on it sold commercially.
Benefits in comparison to VLIW Architectures
TTAs can be seen as “exposed datapath” VLIW architectures. While a VLIW is programmed using operations, a TTA splits the execution of an operation into multiple move operations. The low level programming model offers several benefits in comparison to the standard VLIW. For example, a TTA architecture can provide more parallelism with simpler register files than a VLIW. As the programmer is in control of the timing of the operand and result data transports, the complexity (the number of input and output ports) of the register file (RF) need not be scaled according to the worst-case issue/completion scenario of the multiple parallel instructions.
An important unique software optimization enabled by transport programming is called software bypassing. With software bypassing, the programmer bypasses the register file write-back by moving data directly to the operand ports of the next functional unit. When this optimization is applied aggressively, the original move that transports the result to the register file can be completely eliminated, thus both reducing the register file port pressure and freeing a general purpose register for other temporary variables. The reduced register pressure, in addition to simplifying the required complexity of the RF hardware, can lead to significant CPU energy savings, an important benefit especially in mobile embedded systems.
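As a rough illustration of software bypassing, the following Python sketch rewrites a list of moves so that a function unit result is forwarded straight to its consumer instead of passing through a temporary register. The move notation, the unit names `ADD` and `MUL`, and the `bypass` helper are all hypothetical. The sketch also assumes the caller already knows which registers are dead temporaries and that the forwarded output port still holds its value when the consumer reads it; a real compiler would have to establish both.

```python
def bypass(moves, temporaries):
    """Forward function-unit results directly to their consumers.

    moves: list of (source, destination) strings in program order.
    temporaries: registers assumed dead after their reads (supplied by
    the caller here; a compiler would derive this by liveness analysis).
    Names containing "." denote function-unit ports, others registers.
    """
    forwarded = {}  # temporary register -> output port holding its value
    out = []
    for src, dst in moves:
        src = forwarded.get(src, src)  # read the port instead of the register
        if dst in temporaries and "." in src:
            forwarded[dst] = src       # the write-back move is eliminated
            continue
        forwarded.pop(dst, None)       # dst now holds a fresh value
        out.append((src, dst))
    return out


def register_accesses(moves):
    """Count register file reads and writes in a move list."""
    return sum("." not in name for move in moves for name in move)


# r5 = (r1 + r2) * r3, with r4 as a temporary for the sum.
original = [
    ("r1", "ADD.operand1"),
    ("r2", "ADD.trigger"),
    ("ADD.result", "r4"),
    ("r4", "MUL.operand1"),
    ("r3", "MUL.trigger"),
    ("MUL.result", "r5"),
]
bypassed = bypass(original, {"r4"})
print(len(original), register_accesses(original))  # 6 6
print(len(bypassed), register_accesses(bypassed))  # 5 4
```

In this example the bypassed program needs one move fewer and two fewer register file accesses, and register r4 is never touched.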
TTA processors are built of independent function units and register files, which are connected with transport buses and sockets.
Each function unit implements one or more operations, which implement functionality ranging from a simple addition of integers to a complex and arbitrary user-defined application-specific computation. Operands for operations are transferred through function unit ports.
Each function unit may have an independent pipeline. If a function unit is fully pipelined, a new operation that takes multiple clock cycles to finish can be started in every clock cycle. On the other hand, a pipeline can be such that it does not always accept new operation start requests while an old one is still executing.
Data memory access and communication to the outside of the processor are handled by special function units. Function units that implement memory accessing operations and connect to a memory module are often called load/store units.
The control unit is a special case of function unit which controls the execution of programs. The control unit has access to the instruction memory in order to fetch the instructions to be executed. To allow executed programs to transfer execution (jump) to an arbitrary position in the program, the control unit provides control flow operations. A control unit usually has an instruction pipeline, which consists of stages for fetching, decoding, and executing program instructions.
Register files contain general purpose registers, which are used to store variables in programs. Like function units, register files also have input and output ports. The number of read and write ports, that is, the capability of reading and writing multiple registers in the same clock cycle, can vary in each register file.
Transport buses and sockets
The interconnect architecture consists of transport buses which are connected to function unit ports by means of sockets. Due to the expense of connectivity, it is usual to reduce the number of connections between units (function units and register files). A TTA is said to be fully connected if there is a path from each unit output port to every unit’s input ports.
Sockets provide the means for programming TTA processors by allowing selection of which bus-to-port connections of the socket are enabled at any time instant. Thus, the data transports taking place in a clock cycle can be programmed by defining the source and destination socket/port connection to be enabled for each bus.
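The connectivity property can be checked mechanically. The sketch below is a minimal model, assuming that "fully connected" means every unit output port can reach every unit input port over at least one bus. The bus and port names and the `is_fully_connected` helper are hypothetical and not taken from any particular TTA toolset.

```python
def is_fully_connected(buses, output_ports, input_ports):
    """Check that each output port can reach each input port over some bus.

    buses maps a bus name to a pair (drivers, readers): the output ports
    whose sockets can drive the bus and the input ports whose sockets
    can read from it.
    """
    return all(
        any(out in drivers and inp in readers
            for drivers, readers in buses.values())
        for out in output_ports
        for inp in input_ports
    )


outputs = {"RF.out", "ALU.result"}
inputs = {"ALU.operand1", "ALU.trigger", "RF.in"}

# One bus that every socket attaches to: fully connected.
full = {
    "B1": ({"RF.out", "ALU.result"}, {"ALU.operand1", "ALU.trigger", "RF.in"}),
}

# Cheaper interconnect: B1 only feeds the ALU, B2 only writes results back.
reduced = {
    "B1": ({"RF.out"}, {"ALU.operand1", "ALU.trigger"}),
    "B2": ({"ALU.result"}, {"RF.in"}),
}

print(is_fully_connected(full, outputs, inputs))     # True
print(is_fully_connected(reduced, outputs, inputs))  # False
```

The reduced interconnect saves socket connections, but it no longer allows, for example, moving `ALU.result` directly back into `ALU.operand1`.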
Some TTA implementations support conditional execution.
Conditional execution is implemented with the aid of guards. Each data transport can be conditionalized by a guard, which is connected to a register (often a 1-bit conditional register) and to a bus. If the value of the guarded register evaluates to false (zero), the data transport programmed for the bus the guard is connected to is squashed, that is, not written to its destination. Unconditional data transports are not connected to any guard and are always executed.
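Guarded transports can be sketched in a few lines of Python. This is a simplified model under stated assumptions: an instruction is a list of moves (one per bus), each move carries an optional guard register name, and all sources and guards are read before any destination is written. The register names and the `execute_instruction` helper are hypothetical.

```python
def execute_instruction(moves, regs):
    """Execute one instruction made of (guard, source, destination) moves.

    guard is None for an unconditional transport, otherwise the name of
    a 1-bit register. A move whose guard register is zero is squashed.
    """
    pending = []
    for guard, src, dst in moves:
        if guard is None or regs[guard]:
            pending.append((dst, regs[src]))  # transport takes place
        # otherwise the transport is squashed: nothing is written
    for dst, value in pending:
        regs[dst] = value


regs = {"g0": 0, "r1": 10, "r2": 20, "r3": 0, "r4": 0}
instruction = [
    ("g0", "r1", "r3"),  # guarded by g0
    (None, "r2", "r4"),  # unconditional
]

execute_instruction(instruction, regs)
print(regs["r3"], regs["r4"])  # 0 20  (guarded move squashed)

regs["g0"] = 1
execute_instruction(instruction, regs)
print(regs["r3"], regs["r4"])  # 10 20 (guarded move executed)
```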
All processors, including TTA processors, include control flow instructions that alter the program counter, which are used to implement subroutines, if-then-else, for-loop, etc. The assembly language for TTA processors typically includes control flow instructions such as unconditional branches (JUMP), conditional relative branches (BNZ), subroutine call (CALL), conditional return (RETNZ), and so on, which look the same as the corresponding assembly language instructions for other processors.
Like all other operations on a TTA machine, these instructions are implemented as “move” instructions to a special function unit.
TTA implementations that support conditional execution, such as the sTTAck and the first MOVE prototype, can implement most of these control flow instructions as a conditional move to the program counter.
TTA implementations that only support unconditional data transports, such as the MAXQ, typically have a special function unit tightly connected to the program counter that responds to a variety of destination addresses. Each such address, when used as the destination of a “move”, has a different effect on the program counter: each “relative branch <condition>” instruction has a different destination address for each condition, and other destination addresses are used for CALL, RETNZ, etc.
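The idea of a branch unit that reacts differently to each destination address can be sketched as follows. This is only an illustrative model: the destination names reuse the mnemonics from the text, while the `BranchUnit` class, its `zero` flag argument, and the convention that the program counter already points at the next instruction are assumptions, not the actual register map of the MAXQ or any other device.

```python
class BranchUnit:
    """Hypothetical control unit: the move destination selects the effect."""

    def __init__(self, pc=0):
        self.pc = pc            # assumed to point at the next instruction
        self.return_stack = []

    def write(self, destination, value, zero=False):
        """Handle an unconditional move of `value` to `destination`.

        zero is the current state of a zero flag kept elsewhere in the
        processor (an assumption of this sketch).
        """
        if destination == "JUMP":       # unconditional absolute branch
            self.pc = value
        elif destination == "BNZ":      # relative branch if not zero
            if not zero:
                self.pc += value
        elif destination == "CALL":     # subroutine call
            self.return_stack.append(self.pc)
            self.pc = value
        elif destination == "RETNZ":    # return if not zero
            if not zero:
                self.pc = self.return_stack.pop()
        else:
            raise KeyError(destination)


unit = BranchUnit(pc=10)
unit.write("CALL", 100)             # pc = 100, return address 10 saved
unit.write("BNZ", 4, zero=False)    # taken: pc = 104
unit.write("BNZ", 4, zero=True)     # not taken: pc stays 104
unit.write("RETNZ", 0, zero=False)  # returns: pc = 10
print(unit.pc)  # 10
```

Every move in this model is unconditional. The condition is evaluated inside the unit, which is how such a design avoids needing guarded transports.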
In more traditional processor architectures, a processor is usually programmed by defining the executed operations and their operands. For example, an addition instruction in a RISC architecture might look like the following.
add r3, r1, r2
This example adds the values of general-purpose registers r1 and r2 and stores the result in register r3. Coarsely, executing the instruction in the processor probably results in translating the instruction into control signals which control the interconnection network connections and function units. The interconnection network is used to transfer the current values of registers r1 and r2 to the function unit that is capable of executing the add operation, often called the ALU as in Arithmetic-Logic Unit. Finally, a control signal selects and triggers the addition operation in the ALU, whose result is transferred back to register r3.
TTA programs do not define the operations, but only the data transports needed to write and read the operand values. The operation itself is triggered by writing data to a triggering operand of an operation. Thus, an operation is executed as a side effect of the triggering data transport. Therefore, executing an addition operation in a TTA requires three data transport definitions, also called moves. A move defines the endpoints for a data transport taking place on a transport bus. For instance, a move can state that a data transport from function unit F, port 1, to register file R, register index 2, should take place on bus B1. If there are multiple buses in the target processor, each bus can be used in parallel in the same clock cycle. Thus, it is possible to exploit data transport level parallelism by scheduling several data transports in the same instruction.
An addition operation can be executed in a TTA processor as follows:
r1 -> ALU.operand1
r2 -> ALU.add.trigger
ALU.result -> r3
The second move, a write to the second operand of the function unit called ALU, triggers the addition operation. This makes the result of the addition available in the output port ‘result’ after the execution latency of the ‘add’.
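The three moves can be mimicked with a small Python model. It is a minimal sketch under simplifying assumptions: the `AddUnit` class and `run_moves` helper are hypothetical, moves run one after another rather than in parallel on several buses, and the result is treated as available immediately (operation latency is ignored).

```python
class AddUnit:
    """Hypothetical function unit whose trigger port starts an addition."""

    def __init__(self):
        self.operand1 = 0  # plain operand port: a write only stores the value
        self.result = 0    # output port

    def write(self, port, value):
        if port == "operand1":
            self.operand1 = value
        elif port == "add.trigger":
            # The computation is a side effect of this data transport.
            self.result = self.operand1 + value
        else:
            raise KeyError(port)


def run_moves(moves, regs, alu):
    """Execute (source, destination) moves in program order."""
    for src, dst in moves:
        value = alu.result if src == "ALU.result" else regs[src]
        if dst.startswith("ALU."):
            alu.write(dst[len("ALU."):], value)
        else:
            regs[dst] = value


regs = {"r1": 5, "r2": 7, "r3": 0}
alu = AddUnit()
program = [
    ("r1", "ALU.operand1"),
    ("r2", "ALU.add.trigger"),
    ("ALU.result", "r3"),
]
run_moves(program, regs, alu)
print(regs["r3"])  # 12
```

Note that the program never names an "add" instruction. The addition happens only because a value was moved into the trigger port.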
The ports associated with the ALU may act as an accumulator, allowing the creation of macro instructions that abstract away the underlying TTA:
lda r1 ; "load ALU": move value to ALU operand 1
add r2 ; add: move value to add trigger
sta r3 ; "store ALU": move value from ALU result
Programmer visible operation latency
The leading philosophy of TTAs is to move complexity from hardware to software. Due to this, several additional hazards are introduced to the programmer. One of them is delay slots, the programmer visible operation latency of the function units. Timing is completely the responsibility of the programmer. The programmer has to schedule the instructions such that the result is read neither too early nor too late. There is no hardware detection to lock up the processor if a result is read too early. Consider, for example, an architecture that has an operation add with a latency of 1, and an operation mul with a latency of 3. When triggering the add operation, it is possible to read the result in the next instruction (next clock cycle), but in the case of mul, one has to wait for two instructions before the result can be read. The result is ready for the 3rd instruction after the triggering instruction.
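Reading a result too early results in reading the result of a previously triggered operation, or, if no operation was previously triggered in the function unit, the read value is undefined. On the other hand, the result must be read early enough to make sure that the next operation result does not overwrite the still unread result in the output port.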
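Both timing hazards can be reproduced with a small cycle-level model. The sketch below assumes a fully pipelined function unit with a single output port. The `PipelinedUnit` class is hypothetical, and `None` stands in for an undefined value. The latency convention follows the mul example above: a result triggered in one instruction becomes readable in the 3rd instruction after it.

```python
class PipelinedUnit:
    """Hypothetical fully pipelined function unit with visible latency."""

    def __init__(self, op, latency):
        self.op = op
        self.latency = latency
        self.operand = 0
        self.result = None   # output port; None models "undefined"
        self.in_flight = []  # [cycles_left, value] pairs, oldest first

    def tick(self):
        """Advance one clock cycle; a finished result overwrites the port."""
        for entry in self.in_flight:
            entry[0] -= 1
        while self.in_flight and self.in_flight[0][0] == 0:
            self.result = self.in_flight.pop(0)[1]

    def trigger(self, value):
        """Start an operation on the stored operand and the trigger value."""
        self.in_flight.append([self.latency, self.op(self.operand, value)])


mul = PipelinedUnit(lambda a, b: a * b, latency=3)
seen = []
for cycle in range(5):
    mul.tick()
    seen.append(mul.result)  # what a read of the output port returns
    if cycle == 0:           # trigger 6 * 7
        mul.operand = 6
        mul.trigger(7)
    elif cycle == 1:         # trigger 2 * 3 one cycle later
        mul.operand = 2
        mul.trigger(3)

print(seen)  # [None, None, None, 42, 6]
```

Reads in cycles 1 and 2 come too early and see no valid result. The first product is visible only in cycle 3, because the second product replaces it in cycle 4. Nothing in the model stalls or raises an error, which mirrors the absence of hardware interlocks.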
Due to the abundance of programmer-visible processor context, which in practice includes, in addition to register file contents, also function unit pipeline register contents and/or function unit input and output ports, the context saves required for external interrupt support can become complex and expensive to implement in a TTA processor. Therefore, interrupts are usually not supported by TTA processors, but their task is delegated to external hardware (e.g., an I/O processor) or their need is avoided by using an alternative synchronization/communication mechanism such as polling.
- MAXQ from Dallas Semiconductor, the only commercially available microcontroller built upon transport triggered architecture, is an OISC or “one instruction set computer”. It offers a single though flexible MOVE instruction, which can then function as various virtual instructions by moving values directly to the program counter.
- The “move project” has designed and fabricated several experimental TTA microprocessors.
- The TCE project is a re-implementation of the MOVE tools. The tools are available as open source, and the compiler is built around the LLVM compiler framework.  
- The architecture of the Amiga Copper has all the basic features of a transport triggered architecture.
- The Able processor developed by New England Digital.
- The WireWorld based computer.
- Dr. Dobb’s published One-Der, a 32-bit TTA in Verilog with a matching cross-assembler and Forth compiler.
- Mali (200/400) vertex processor, uses a 128-bit instruction word single precision floating point scalar TTA.
- Application-specific instruction-set processor (ASIP)
- Very long instruction word (VLIW)
- Explicitly parallel instruction computing (EPIC)
- Dataflow architecture