POWER8 is a family of superscalar multi-core microprocessors based on the Power ISA, announced in August 2013 at the Hot Chips conference. The designs are available for licensing under the OpenPOWER Foundation, the first time IBM's highest-end processors have been made available this way.
Systems based on POWER8 became available from IBM in June 2014. Systems and POWER8 processor designs made by other OpenPOWER members became available in early 2015. POWER8 is designed to be a massively multithreaded chip, with each of its cores capable of handling eight hardware threads simultaneously, for a total of 96 threads executing simultaneously on a 12-core chip. The processor makes use of very large on- and off-chip eDRAM caches, and on-chip memory controllers enable very high bandwidth to memory and system I/O. For most workloads,
a 64-byte wide bus (twice as wide as on its predecessor), and 8 MB of L3 eDRAM cache per chiplet, shareable among all chiplets. Thus, a six-chiplet processor has 48 MB of L3 eDRAM cache, while a 12-chiplet processor has a total of 96 MB. The chip can also use up to 128 MB of off-chip eDRAM L4 cache using Centaur companion chips. The on-chip memory controllers can handle 1 TB of RAM and 230 GB/s sustained memory bandwidth. The on-board PCI Express controllers can handle 48 GB/s of I/O to other parts of
a second CPU socket is now provided via the X Bus. Besides that and a slight size increase to 659 mm², the differences seem minimal compared to previous POWER8 processors. On 19 January 2014, the Suzhou PowerCore Technology Company announced that it would join the OpenPOWER Foundation and license the POWER8 core to design custom-made processors for use in big data and cloud computing applications. EDRAM Embedded DRAM (eDRAM)
a so-called on-chip controller (OCC), a power and thermal management microcontroller based on a PowerPC 405 processor. It has two general-purpose offload engines (GPEs) and 512 KB of embedded static RAM (SRAM) (1 KB = 1024 bytes), can access main memory directly, and runs open-source firmware. The OCC manages POWER8's operating frequency, voltage, memory bandwidth, and thermal control for both
a time. It runs at 8 GB/s in the early Entry models, later increased in the high-end and HPC models to 9.6 GB/s with a 40 ns latency, for a sustained bandwidth of 24 GB/s and 28.8 GB/s per channel respectively. Each processor has two memory controllers with four memory channels each, and the maximum processor-to-memory-buffer bandwidth is 230.4 GB/s per processor. Depending on
is dynamic random-access memory (DRAM) integrated on the same die or multi-chip module (MCM) of an application-specific integrated circuit (ASIC) or microprocessor. eDRAM's cost per bit is higher than that of equivalent standalone DRAM chips used as external memory, but the performance advantages of placing eDRAM on the same chip as the processor outweigh the cost disadvantages in many applications. In performance and size, eDRAM
is called Turismo and the dual-chip variant is called Murano. PowerCore's modified version is called CP1. This is a revised version of the original 12-core POWER8 from IBM, and used to be called POWER8+. The main new feature is support for Nvidia's bus technology NVLink, connecting up to four NVLink devices directly to the chip. IBM removed the A Bus and PCI interfaces for SMP connections to other POWER8 sockets and replaced them with NVLink interfaces. Connection to
is eight-way hardware multithreaded and can be dynamically and automatically partitioned to have one, two, four or all eight threads active. POWER8 also added support for hardware transactional memory. IBM estimates that each core is 1.6 times as fast as the POWER7 in single-threaded operations. A POWER8 processor is a 6- or 12-chiplet design with variants of 4, 6, 8, 10 or 12 activated chiplets, in which one chiplet consists of one processing core, 512 KB of SRAM L2 cache on
is embedded along with the eDRAM memory, the remainder of the ASIC can treat the memory like a simple SRAM type, as in 1T-SRAM. eDRAM is used in various products, including IBM's POWER7 processor and IBM's z15 mainframe processor (mainframes built with it use up to 4.69 GB of eDRAM when five such add-on chips/drawers are used; all other levels from L1 up also use eDRAM, for a total of 6.4 GB of eDRAM). Intel's Haswell CPUs with GT3e integrated graphics, many game consoles and other devices, such as Sony's PlayStation 2, Sony's PlayStation Portable, Nintendo's GameCube, Nintendo's Wii, Nintendo's Wii U, and Microsoft's Xbox 360, also use eDRAM.
is positioned between level 3 cache and conventional DRAM on the memory bus, and effectively functions as a level 4 cache, though architectural descriptions may not explicitly refer to it in those terms. Embedding memory on the ASIC or processor allows for much wider buses and higher operating speeds, and because DRAM is much denser than SRAM, larger amounts of memory can be installed on smaller chips if eDRAM
is used instead of eSRAM. eDRAM requires additional fab process steps compared with embedded SRAM, which raises cost, but the 3× area savings of eDRAM offset the process cost when a significant amount of memory is used in the design. eDRAM, like all DRAM, requires periodic refreshing of its memory cells, which adds complexity. However, if the memory refresh controller
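The thread and cache totals quoted in this article follow from simple per-chiplet multiplication. The sketch below reproduces them under stated assumptions: the constant and function names are illustrative only, and the figure of eight Memory Buffer chips per processor is inferred from the article's "16 MB per chip, up to 128 MB per processor" rather than stated directly.

```python
# Per-unit figures as quoted in the article; all names here are hypothetical.
THREADS_PER_CORE = 8     # eight-way hardware multithreading
L3_PER_CHIPLET_MB = 8    # eDRAM L3 cache per chiplet
L4_PER_BUFFER_MB = 16    # L4 cache per Memory Buffer (Centaur) chip


def chip_totals(chiplets: int, buffer_chips: int = 8) -> dict:
    """Return thread count and cache totals for a given configuration.

    buffer_chips defaults to 8, an assumption derived from 128 MB / 16 MB.
    """
    return {
        "threads": chiplets * THREADS_PER_CORE,
        "l3_mb": chiplets * L3_PER_CHIPLET_MB,
        "l4_mb": buffer_chips * L4_PER_BUFFER_MB,
    }


if __name__ == "__main__":
    for chiplets in (6, 12):
        print(chiplets, chip_totals(chiplets))
```

For a 12-chiplet processor this gives 96 threads, 96 MB of L3 and 128 MB of L4, which agrees with the totals in the text.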
the GX++ bus for external communication, POWER8 removes this from the design and replaces it with the CAPI port (Coherent Accelerator Processor Interface), which is layered on top of PCI Express 3.0. The CAPI port is used to connect auxiliary specialized processors such as GPUs, ASICs and FPGAs. Units attached to the CAPI bus can use the same memory address space as the CPU, thereby reducing
the Memory Buffer chips and the DRAM banks. Initially, support was limited to 16 GB, 32 GB and 64 GB DIMMs, allowing up to 1 TB to be addressed by the processor. Support for 128 GB and 256 GB DIMMs was announced later, allowing up to 4 TB per processor. The POWER8 core has a 64 KB L1 data cache contained in the load-store unit and a 32 KB L1 instruction cache contained in
the chip is said to perform two to three times as fast as its predecessor, the POWER7. POWER8 chips come in 6- or 12-core variants; each version is fabricated in a 22 nm silicon-on-insulator (SOI) process using 15 metal layers. The 12-core version consists of 4.2 billion transistors and measures 650 mm², while the 6-core version measures only 362 mm². However, the 6- and 12-core variants can have all or just some cores active, so POWER8 processors come with 4, 6, 8, 10 or 12 cores activated. Where previous POWER processors use
the computing path length. At the 2013 ACM/IEEE Supercomputing Conference, IBM and Nvidia announced an engineering partnership to closely couple POWER8 with Nvidia GPUs in future HPC systems, with the first of them announced as the Power Systems S824L. On October 14, 2016, IBM announced the formation of OpenCAPI, a new organization to spread adoption of CAPI to other platforms. Initial members are Google, AMD, Xilinx, Micron and Mellanox. POWER8 also contains
the instruction fetch unit, along with a tightly integrated 512 KB L2 cache. In a single cycle each core can fetch up to eight instructions, decode and dispatch up to eight instructions, issue and execute up to ten instructions and commit up to eight instructions. Each POWER8 core consists primarily of the following six execution units: Each core has sixteen execution pipelines: It has a larger issue queue with 4×16 entries, improved branch predictors and can handle twice as many cache misses. Each core
the model, only one controller might be enabled, or only two channels per controller might be in use. For increased availability, the link provides "on-the-fly" lane isolation and repair. Each Memory Buffer chip has four interfaces that allow the use of either DDR3 or DDR4 memory at 1600 MHz with no change to the processor link interface. The resulting 32 memory channels per processor allow a peak access rate of 409.6 GB/s between
the processor and closer to the memory. The scheduling logic, the memory energy management, and the RAS decision point are moved to a so-called Memory Buffer chip (a.k.a. Centaur). Offloading certain memory processes to the Memory Buffer chip enables memory access optimizations, saving bandwidth and allowing for faster processor-to-memory communication. It also contains caching structures for an additional 16 MB of L4 cache per chip (up to 128 MB per processor) (1 MB = 1024 KB). Depending on
the processor and memory; it can regulate voltages on the fly through 1,764 integrated voltage regulators (IVRs). The OCC can also be programmed to overclock the POWER8 processor, or to lower its power consumption by reducing the operating frequency (similar to the configurable TDP found in some Intel and AMD processors). POWER8 splits the memory controller functions by moving some of them away from
the system architecture, the Memory Buffer chips are placed either on the memory modules (Custom DIMM/CDIMM, for example in S824 and E880 models) or on the memory riser card holding standard DIMMs (for example in S822LC models). The Memory Buffer chip is connected to the processor using a high-speed multi-lane serial link. The memory channel connecting each buffer chip is capable of writing 2 bytes and reading 1 byte at
the system. The cores are designed to operate at clock rates between 2.5 and 5 GHz. The six-core chips are mounted in pairs on dual-chip modules (DCM) in IBM's scale-out servers. In most configurations not all cores are active, resulting in a variety of configurations where the actual core count differs. The 12-core version is used in the high-end E880 and E880C models. IBM's single-chip POWER8 module
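The memory bandwidth figures quoted in this article can be checked with a little arithmetic. The sketch below is a back-of-the-envelope check rather than a description of the hardware: the function names are illustrative, the combined 3 bytes per transfer comes from the article's "writing 2 bytes and reading 1 byte", and the 8-byte DDR data bus and the reading of "1600 MHz" as 1600 MT/s are assumptions introduced here to reproduce the 409.6 GB/s figure.

```python
WRITE_BYTES = 2  # bytes written per transfer on a processor-to-buffer channel
READ_BYTES = 1   # bytes read per transfer on the same channel


def per_channel_gbs(link_rate_gbs: float) -> float:
    """Sustained bandwidth of one memory channel in GB/s."""
    return (WRITE_BYTES + READ_BYTES) * link_rate_gbs


def per_processor_gbs(link_rate_gbs: float,
                      controllers: int = 2,
                      channels_per_controller: int = 4) -> float:
    """Aggregate processor-to-memory-buffer bandwidth in GB/s."""
    channels = controllers * channels_per_controller
    return per_channel_gbs(link_rate_gbs) * channels


def dram_peak_gbs(transfers_per_us: int = 1600,
                  bus_bytes: int = 8,        # assumed 64-bit DDR data bus
                  ports_per_buffer: int = 4,
                  buffer_chips: int = 8) -> float:
    """Peak rate between the buffer chips and DRAM in GB/s."""
    ddr_channels = ports_per_buffer * buffer_chips  # 32 per processor
    return transfers_per_us * bus_bytes * ddr_channels / 1000


if __name__ == "__main__":
    for rate in (8.0, 9.6):
        print(f"{rate} GB/s link: {per_channel_gbs(rate):.1f} GB/s per channel, "
              f"{per_processor_gbs(rate):.1f} GB/s per processor")
    print(f"Buffer-to-DRAM peak: {dram_peak_gbs():.1f} GB/s")
```

Rounded to one decimal place, the 9.6 GB/s link yields 28.8 GB/s per channel and 230.4 GB/s per processor, and the DRAM side yields 409.6 GB/s, consistent with the numbers in the text. Results are formatted rather than compared exactly because binary floating point cannot represent 9.6 precisely.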