Xeon Phi

Xeon Phi [1] is a series of x86 manycore processors designed and made entirely by Intel. It is intended for use in supercomputers, servers, and high-end workstations. Its architecture allows the use of standard programming languages and APIs such as OpenMP. [2]

Since it was originally based on an earlier GPU design by Intel, it shares application areas with GPUs. [citation needed] The main difference between Xeon Phi and a GPGPU like Nvidia Tesla is that Xeon Phi, with an x86-compatible core, can, with less modification, run software that was originally targeted at a standard x86 CPU. [citation needed]

Xeon Phi was initially offered in the form of PCIe-based add-on cards. A second-generation product, codenamed Knights Landing, was announced in June 2013. These second-generation chips could be used as a standalone CPU, rather than just as an add-in card.

In June 2013, the Tianhe-2 supercomputer at the National Supercomputer Center in Guangzhou (NSCC-GZ) was announced [3] as the world’s fastest supercomputer (as of November 2017, it is No. 2 [4]). It uses Intel Xeon Phi coprocessors and Ivy Bridge Xeon processors to achieve 33.86 petaFLOPS. [5]

History

Background

The Larrabee microarchitecture (in development since 2006 [6]) introduced very wide (512-bit) SIMD units to an x86-based processor design, extended to a cache-coherent multiprocessor system connected via a ring bus to memory; each core was capable of four-way multithreading. Because the design was intended for graphics as well as general-purpose computing, the Larrabee chips also included special hardware for texture sampling. [7] [8] The project to produce a retail GPU product directly from the Larrabee research project was terminated in May 2010. [9]

Another contemporary Intel research project implementing the x86 architecture on a many-core processor was the “Single-chip Cloud Computer” (prototype introduced 2009 [10]), a design mimicking a cloud computing datacentre on a single chip with multiple independent cores: the prototype design included 48 cores per chip with hardware support for selective frequency and voltage control of cores to maximize energy efficiency, and a mesh network for inter-chip messaging. The design lacked cache-coherent cores and focused on principles that would allow the design to scale to many more cores. [11]

The Teraflops Research Chip (prototype unveiled 2007 [12] ) is an experimental 80-core chip with two floating point units per core, implementing a 96-bit VLIW architecture instead of the x86 architecture. [13] The project investigated intercore communication methods, per-chip power management, and achieved 1.01 TFLOPS at 3.16 GHz consuming 62 W of power. [14] [15]

Knights Ferry

Intel’s MIC prototype board, named Knights Ferry, incorporating a processor codenamed Aubrey Isle, was announced on May 31, 2010. The product was derived from the Larrabee project and other Intel research, including the Single-chip Cloud Computer. [16] [17]

The development product was offered as a PCIe card with 32 in-order cores at up to 1.2 GHz with four threads per core, 2 GB GDDR5 memory, [18] and 8 MB coherent L2 cache (256 KB per core with 32 KB L1 cache), and a power requirement of ~300 W, [18] built on a 45 nm process. [19] In the Aubrey Isle core, a 1,024-bit ring bus (512-bit bi-directional) connects processors to main memory. [20] Single-board performance has exceeded 750 GFLOPS. [19] The prototype boards support only single-precision floating-point instructions. [21]

Initial developers included CERN, the Korea Institute of Science and Technology Information (KISTI) and the Leibniz Supercomputing Center. Hardware vendors for the prototype boards included IBM, SGI, HP, Dell and others. [22]

Knights Corner

The Knights Corner product line is made on a 22 nm process, using Intel’s Tri-gate technology, with more than 50 cores per chip, and is Intel’s first commercial many-core product. [16] [19]

In June 2011, SGI announced a partnership with Intel to use the MIC architecture in its high performance computing products. [23] In September 2011, it was announced that the Texas Advanced Computing Center (TACC) will be using Knights Corner cards in their 10 petaFLOPS “Stampede” supercomputer, providing 8 petaFLOPS of compute power. [24] According to “Stampede: A Comprehensive Petascale Computing Environment” the “second-generation Intel (Knights Landing) MICs will be added when they become available, increasing their total output to at least 15 PetaFLOPS.” [25]

On November 15, 2011, Intel showed an early silicon version of a Knights Corner processor. [26] [27]

On June 5, 2012, Intel released open source software and documentation regarding Knights Corner. [28]

On June 18, 2012, Intel announced at the 2012 Hamburg International Supercomputing Conference that Xeon Phi would be the brand name for all products based on its Many Integrated Core architecture. [1] [29] [30] [31] [32] [33] [34] In June 2012, Cray announced it would be offering 22 nm “Knights Corner” chips (branded as “Xeon Phi”) as a co-processor in its “Cascade” systems. [35] [36]

In June 2012, ScaleMP announced it would provide its virtualization software to allow using “Knights Corner” chips (branded as “Xeon Phi”) as a transparent extension of the main processor. The virtualization software would allow “Knights Corner” to run legacy MMX/SSE code and access an unlimited amount of (host) memory without the need for code changes. [37]

An important component of the Intel Xeon Phi coprocessor’s core is its vector processing unit (VPU). [38] The VPU features a novel 512-bit SIMD instruction set, officially known as Intel Initial Many Core Instructions (Intel IMCI). Thus, the VPU can execute 16 single-precision (SP) or 8 double-precision (DP) operations per cycle. The VPU also supports Fused Multiply-Add (FMA) instructions and hence can execute 32 SP or 16 DP floating-point operations per cycle. It also provides support for integers. The VPU also features an extended mathematical unit (EMU) that can execute operations such as reciprocal, square root, and logarithm, allowing these operations to be executed in a vector fashion with high bandwidth. The EMU operates by calculating polynomial approximations of these functions.
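The per-cycle figures quoted for the VPU follow directly from the 512-bit vector width. The short Python sketch below reproduces that arithmetic; the function names are illustrative only and are not part of any Intel tool or API.

```python
# Illustrative arithmetic only: derives the per-cycle throughput figures
# quoted for the 512-bit VPU. Names here are hypothetical helpers.
VECTOR_WIDTH_BITS = 512

def lanes(element_bits):
    """Number of elements of the given size that fit in one 512-bit vector."""
    return VECTOR_WIDTH_BITS // element_bits

def flops_per_cycle(element_bits, fma=True):
    """Floating-point operations per cycle; an FMA counts as two operations per lane."""
    return lanes(element_bits) * (2 if fma else 1)

print("SP lanes:", lanes(32), "DP lanes:", lanes(64))
print("SP flops/cycle with FMA:", flops_per_cycle(32))
print("DP flops/cycle with FMA:", flops_per_cycle(64))
```

With 32-bit elements this gives 16 lanes and 32 operations per cycle with FMA; with 64-bit elements it gives 8 lanes and 16 operations per cycle, matching the figures in the text.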

On November 12, 2012, Intel announced two Xeon Phi coprocessor families using the 22 nm process size: the Xeon Phi 3100 and the Xeon Phi 5110P. [39] [40] [41] The Xeon Phi 3100 is capable of more than 1 teraFLOPS of double-precision floating-point performance with 240 GB/s memory bandwidth at 300 W. [39] [40] [41] The Xeon Phi 5110P is capable of 1.01 teraFLOPS of double-precision floating-point performance with 320 GB/s memory bandwidth at 225 W. [39] [40] [41] The Xeon Phi 7120P is capable of 1.2 teraFLOPS of double-precision floating-point performance with 352 GB/s memory bandwidth at 300 W.

On June 17, 2013, the Tianhe-2 supercomputer was announced [3] by TOP500 as the world’s fastest. Tianhe-2 used Intel Ivy Bridge Xeon and Xeon Phi processors to achieve 33.86 petaFLOPS. It held the top position on the list for two and a half years, most recently in November 2015. [42]

Design and programming

The cores of Knights Corner are based on a modified version of the P54C design, used in the original Pentium. [43] The basis of the Intel MIC architecture is to leverage x86 legacy by creating an x86-compatible multiprocessor architecture that can use existing parallelization software tools. [19] Programming tools include OpenMP, [44] OpenCL, [45] Cilk/Cilk Plus and specialised versions of Intel’s Fortran, C++ [46] and math libraries. [47]

Design elements of Knights Corner include the x86 ISA, 4-way SMT per core, 512-bit SIMD units, 32 KB L1 instruction cache, 32 KB L1 data cache, coherent L2 cache (512 KB per core [48]), and an ultra-wide ring bus connecting processors and memory.

The Knights Corner instruction set documentation is available from Intel. [49] [50] [51]

Models
Xeon Phi X100 Series

| Model | Designation | Cores (Threads) | Base clock (MHz) | Turbo clock (MHz) | L2 Cache | Memory (GDDR5 ECC) | Memory channels (dual channel) | Memory BW (GB/s) | Peak DP Compute (GFLOPS) | TDP (W) | Cooling System | Form Factor | Released |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Xeon Phi 3110X [52] | SE3110X | 61 (244) | 1053 | | 30.5 MB | 6 GB or 8 GB | 6x or 8x | 240 or 320 | 1028 | 300 | Bare Board | PCIe 2.0 x16 Card | ??? |
| Xeon Phi 3120A [53] | SC3120A | 57 (228) | 1100 | | 28.5 MB | 6 GB | 6x | 240 | 1003 | 300 | Fan / Heatsink | PCIe 2.0 x16 Card | June 17, 2013 |
| Xeon Phi 3120P [54] | SC3120P | 57 (228) | 1100 | | 28.5 MB | 6 GB | 6x | 240 | 1003 | 300 | Passive Heatsink | PCIe 2.0 x16 Card | June 17, 2013 |
| Xeon Phi 31S1P [55] | BC31S1P | 57 (228) | 1100 | | 28.5 MB | 8 GB | 8x | 320 | 1003 | 270 | Passive Heatsink | PCIe 2.0 x16 Card | June 17, 2013 |
| Xeon Phi 5110P [56] | SC5110P | 60 (240) | 1053 | | 30.0 MB | 8 GB | 8x | 320 | 1011 | 225 | Passive Heatsink | PCIe 2.0 x16 Card | Nov 12, 2012 |
| Xeon Phi 5120D [57] | SC5120D, BC5120D | 60 (240) | 1053 | | 30.0 MB | 8 GB | 8x | 352 | 1011 | 245 | Bare Board | SFF 230-Pin Card | June 17, 2013 |
| Xeon Phi SE10P [58] | SE10P | 61 (244) | 1100 | | 30.5 MB | 8 GB | 8x | 352 | 1074 | 300 | Passive Heatsink | PCIe 2.0 x16 Card | Nov 12, 2012 |
| Xeon Phi SE10X [59] | SE10X | 61 (244) | 1100 | | 30.5 MB | 8 GB | 8x | 352 | 1074 | 300 | Bare Board | PCIe 2.0 x16 Card | Nov 12, 2012 |
| Xeon Phi 7110P [60] | SC7110P | 61 (244) | 1250 | ??? | 30.5 MB | 16 GB | 8x | 352 | 1220 | 300 | Passive Heatsink | PCIe 2.0 x16 Card | ??? |
| Xeon Phi 7110X [61] | SC7110X | 61 (244) | 1250 | ??? | 30.5 MB | 16 GB | 8x | 352 | 1220 | 300 | Bare Board | PCIe 2.0 x16 Card | ??? |
| Xeon Phi 7120A [62] | SC7120A | 61 (244) | 1238 | 1333 | 30.5 MB | 16 GB | 8x | 352 | 1208 | 300 | Fan / Heatsink | PCIe 2.0 x16 Card | April 6, 2014 |
| Xeon Phi 7120D [63] | SC7120D | 61 (244) | 1238 | 1333 | 30.5 MB | 16 GB | 8x | 352 | 1208 | 270 | Bare Board | SFF 230-Pin Card | March ??, 2014 |
| Xeon Phi 7120P [64] | SC7120P | 61 (244) | 1238 | 1333 | 30.5 MB | 16 GB | 8x | 352 | 1208 | 300 | Passive Heatsink | PCIe 2.0 x16 Card | June 17, 2013 |
| Xeon Phi 7120X [65] | SC7120X | 61 (244) | 1238 | 1333 | 30.5 MB | 16 GB | 8x | 352 | 1208 | 300 | Bare Board | PCIe 2.0 x16 Card | June 17, 2013 |
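The peak double-precision figures in the X100 table appear to be consistent with multiplying the core count by the base clock and by the 16 DP floating-point operations per cycle that the VPU achieves with FMA. The following Python sketch checks that relationship for three rows; the formula is an inference from the table values rather than a statement from Intel, and the helper name is hypothetical.

```python
# Sketch under an assumption: peak DP = cores x base clock x 16 DP flops/cycle
# (8 DP lanes x 2 operations per FMA). The formula is inferred from the table,
# not taken from vendor documentation.
DP_FLOPS_PER_CYCLE = 16

def peak_dp_gflops(cores, base_clock_mhz):
    """Theoretical peak in GFLOPS for a Knights Corner part."""
    return cores * base_clock_mhz * DP_FLOPS_PER_CYCLE / 1000

# (model, cores, base clock in MHz) copied from the table above
for model, cores, mhz in [("3120A", 57, 1100), ("5110P", 60, 1053), ("7120P", 61, 1238)]:
    print(model, round(peak_dp_gflops(cores, mhz)), "GFLOPS")
```

Rounded to whole GFLOPS, the three results agree with the 1003, 1011 and 1208 GFLOPS entries listed for these models.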

Knights Landing

Knights Landing is the code name for the second-generation MIC architecture product from Intel. [25] Intel officially announced its second-generation Intel Xeon Phi products on June 17, 2013. [5] Intel said that the next generation of its MIC architecture-based products would be available in two forms, as a coprocessor or as a host processor (CPU), and would be manufactured using Intel’s 14 nm process technology. Knights Landing products include integrated on-package memory for significantly higher memory bandwidth.

Knights Landing contains up to 72 Airmont (Atom) cores with four threads per core, [66] [67] using the LGA 3647 socket [68] and supporting up to 384 GB of “far” DDR4 2133 RAM and 8-16 GB of stacked “near” 3D MCDRAM, a version of the Hybrid Memory Cube. Each core has two 512-bit vector units and supports AVX-512 SIMD instructions, specifically the Intel AVX-512 Foundational Instructions (AVX-512F) with Intel AVX-512 Conflict Detection Instructions (AVX-512CD), Intel AVX-512 Exponential and Reciprocal Instructions (AVX-512ER) and Intel AVX-512 Prefetch Instructions (AVX-512PF). [69]

The National Energy Research Scientific Computing Center announced that Phase 2 of its newest supercomputing system “Cori” would use Knights Landing Xeon Phi coprocessors. [70]

On June 20, 2016, Intel launched the Intel Xeon Phi product family based on the Knights Landing architecture, stressing its applicability not just to traditional simulation workloads, but also to machine learning. [71] [72] The models on offer at launch included only Xeon Phi in the bootable form factor, in two versions: standard processors and processors with integrated Intel Omni-Path architecture fabric. [73] The latter is denoted by the suffix F in the model number. Integrated fabric was expected to provide better latency at a lower cost than discrete high-performance network cards. [71]

On November 14, 2016, the 48th TOP500 list contained 10 systems using Knights Landing platforms. [citation needed]

The PCIe-based coprocessor variant of Knights Landing was never offered to the general market and was discontinued by August 2017. [74] This included the 7220A, 7240P and 7220P coprocessor cards.

Models

All models can boost to their peak speeds, adding 200 MHz to their base frequency when running just one or two cores. When running from three to the maximum number of cores, the chips can only boost 100 MHz above the base frequency. All chips run high-AVX code at a frequency reduced by 200 MHz. [75]
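The boost rules above can be summarised as a small lookup. The Python sketch below is only an illustration: the function name is hypothetical, and it assumes that the 200 MHz high-AVX reduction is measured from the base frequency and takes precedence over any boost, which the text does not state explicitly.

```python
# Illustrative sketch of the frequency rules described above.
# Assumption (not confirmed by the text): the high-AVX reduction of 200 MHz
# is applied to the base frequency and overrides turbo boost.
def knl_frequency_mhz(base_mhz, active_cores, high_avx=False):
    """Return the expected operating frequency in MHz."""
    if active_cores < 1:
        raise ValueError("at least one core must be active")
    if high_avx:
        return base_mhz - 200   # high-AVX code runs below base
    if active_cores <= 2:
        return base_mhz + 200   # one or two cores: full boost
    return base_mhz + 100       # three or more cores: reduced boost

# Example with a 1300 MHz base frequency and 64 cores
print(knl_frequency_mhz(1300, 1))                  # single-core boost
print(knl_frequency_mhz(1300, 64))                 # all-core boost
print(knl_frequency_mhz(1300, 64, high_avx=True))  # high-AVX frequency
```

For a part with a 1300 MHz base frequency, the single-core result of 1500 MHz corresponds to a turbo clock 200 MHz above base.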

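Xeon Phi 7200 Series

Blank cells repeat the value from the nearest filled cell above in the same column.

| Model | sSpec Number | Cores (Threads) | Base clock (MHz) | Turbo clock (MHz) | L2 Cache | MCDRAM capacity | MCDRAM BW | DDR4 capacity | DDR4 BW | Peak DP Compute | TDP (W) | Socket | Release Date | Part Number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Xeon Phi 7210 [76] | SR2ME (B0), SR2X4 (B0) | 64 (256) | 1300 | 1500 | 32 MB | 16 GB | 400+ GB/s | 384 GB | 102.4 GB/s | 2662 GFLOPS | 215 | SVLCLGA3647 | June 20, 2016 | HJ8066702859300 |
| Xeon Phi 7210F [77] | SR2X5 (B0) | | | | | | | | | | 230 | | | HJ8066702975000 |
| Xeon Phi 7230 [78] | SR2MF (B0), SR2X3 (B0) | | | | | | | | | | 215 | | | HJ8066702859400 |
| Xeon Phi 7230F [79] | SR2X2 (B0) | | | | | | | | | | 230 | | | HJ8066702269002 |
| Xeon Phi 7250 [80] | SR2MD (B0), SR2X1 (B0) | 68 (272) | 1400 | 1600 | 34 MB | | | | | 3046 GFLOPS [81] | 215 | | | HJ8066702859200 |
| Xeon Phi 7250F [82] | SR2X0 (B0) | | | | | | | | | | 230 | | | HJ8066702268900 |
| Xeon Phi 7290 [83] | SR2WY (B0) | 72 (288) | 1500 | 1700 | 36 MB | | | | | 3456 GFLOPS | 245 | | | HJ8066702974700 |
| Xeon Phi 7290F [84] | SR2WZ (B0) | | | | | | | | | | 260 | | | HJ8066702975200 |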
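The peak double-precision values listed for the 7200 series appear to match the core count multiplied by the base clock and by 32 DP floating-point operations per cycle. The Python sketch below checks this for the three core-count tiers. It assumes two 512-bit vector units per core, each handling 8 DP lanes with FMA counted as two operations; the helper name is hypothetical and the formula is inferred from the listed values rather than quoted from Intel.

```python
# Sketch under stated assumptions: two 512-bit vector units per core,
# 8 DP lanes each, FMA counted as 2 operations -> 32 DP flops/cycle/core.
VECTOR_UNITS_PER_CORE = 2
DP_LANES = 512 // 64
DP_FLOPS_PER_CYCLE = VECTOR_UNITS_PER_CORE * DP_LANES * 2

def knl_peak_dp_gflops(cores, base_clock_mhz):
    """Theoretical peak in GFLOPS at the base clock."""
    return cores * base_clock_mhz * DP_FLOPS_PER_CYCLE / 1000

# (model, cores, base clock in MHz) as listed for the 7200 series
for model, cores, mhz in [("7210", 64, 1300), ("7250", 68, 1400), ("7290", 72, 1500)]:
    print(model, round(knl_peak_dp_gflops(cores, mhz)), "GFLOPS")
```

Rounded to whole GFLOPS, the results agree with the 2662, 3046 and 3456 GFLOPS entries for the 7210, 7250 and 7290.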

Knights Hill

Knights Hill was the codename for the third-generation MIC architecture, for which Intel announced the first details at SC14. [85] It was to be manufactured in a 10 nm process. [86]

Knights Hill was expected to be used in the United States Department of Energy Aurora supercomputer, to be deployed at Argonne National Laboratory. [87] [88] However, Aurora was delayed in favor of using an “advanced architecture” with a focus on machine learning. [89] [90]

In November 2017, Intel announced that Knights Hill had been canceled in favor of a new architecture built from the ground up to enable exascale computing in the future. This new architecture is now expected for 2020-2021. [91] [92]

Knights Mill

Knights Mill is Intel’s codename for a Xeon Phi product specialized in deep learning, [93] first released in December 2017. [94] Nearly identical in specifications to Knights Landing, Knights Mill includes optimizations for better use of AVX-512 instructions and enables 4-way hyperthreading. Single-precision and variable-precision floating-point performance increased, at the expense of double-precision floating-point performance.

Models
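Xeon Phi 7200 Series

Blank cells repeat the value from the nearest filled cell above in the same column.

| Model | sSpec Number | Cores (Threads) | Base clock (MHz) | Turbo clock (MHz) | L2 Cache | MCDRAM capacity | MCDRAM BW | DDR4 capacity | DDR4 BW | Peak DP Compute | TDP (W) | Socket | Release Date | Part Number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Xeon Phi 7235 | TBA | 64 (256) | 1300 | 1400 | 32 MB | 16 GB | 400+ GB/s | 384 GB | 102.4 GB/s | TBA | 250 | SVLCLGA3647 | Q4 2017 | TBA |
| Xeon Phi 7285 | TBA | 68 (272) | 1300 | 1400 | 34 MB | | | | 115.2 GB/s | TBA | 250 | | | TBA |
| Xeon Phi 7295 | TBA | 72 (288) | 1500 | 1600 | 36 MB | | | | 115.2 GB/s | TBA | 320 | | | TBA |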

Programming

An empirical performance and programmability study has been carried out by researchers, [95] in which the authors claim that achieving high performance with Xeon Phi still needs help from programmers and that merely relying on compilers with traditional programming models is still far from reality. However, research in various domains, such as life sciences, [96] deep learning [97] and computer-aided engineering, [98] has demonstrated that exploiting both the thread- and SIMD-parallelism of Xeon Phi achieves significant speed-ups.

Competitors

  • Nvidia Tesla, a direct competitor in the HPC market [99]
  • AMD Radeon Pro and AMD Radeon Instinct, direct competitors in the HPC market

Copyright computerforum.eu 2018