Heterogeneous System Architecture

Heterogeneous System Architecture (HSA) is a cross-vendor set of specifications that allow for the integration of central processing units and graphics processors on the same bus, with shared memory and tasks. The HSA is being developed by the HSA Foundation, which includes (among many others) AMD and ARM. The platform's stated aim is to reduce communication latency between CPUs, GPUs and other compute devices, and make these various devices more compatible from a programmer's perspective, relieving the programmer of the task of planning the moving of data between devices' disjoint memories (as must currently be done with OpenCL or CUDA).
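The difference between disjoint memories and shared memory can be illustrated with a small host-only analogy. The sketch below does not use any GPU or HSA runtime; the function names `run_with_copies` and `run_shared` are hypothetical, and a Python `memoryview` merely stands in for a pointer that two devices could share.

```python
import array

def run_with_copies(host_data):
    """Disjoint-memory model: copy in, compute on the copy, copy back out."""
    device_buffer = array.array("f", host_data)   # explicit host -> "device" copy
    for i in range(len(device_buffer)):
        device_buffer[i] *= 2.0                   # the "kernel" only sees the copy
    return array.array("f", device_buffer)        # explicit "device" -> host copy

def run_shared(host_data):
    """Shared-memory model: the "device" receives a view of the same buffer."""
    view = memoryview(host_data)                  # no copy, same underlying memory
    for i in range(len(view)):
        view[i] = view[i] * 2.0                   # writes are visible to the host
    view.release()

data = array.array("f", [1.0, 2.0, 3.0])
copied = run_with_copies(data)                    # data itself is left untouched
untouched = list(data)
run_shared(data)                                  # data is modified in place
```

In the first function the programmer has to plan both transfers; in the second, passing a reference is enough, which is the burden the paragraph above says HSA is meant to remove.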

CUDA and OpenCL as well as most other fairly advanced programming languages can use HSA to increase their execution performance. Heterogeneous computing is widely used in system-on-chip devices such as tablets, smartphones, other mobile devices, and video game consoles. HSA allows programs to use the graphics processor for floating point calculations without separate memory or scheduling. The rationale behind HSA

A graphics processor, to operate at the same processing level as the system's CPU. Among its main features, HSA defines a unified virtual address space for compute devices: where GPUs traditionally have their own memory, separate from the main (CPU) memory, HSA requires these devices to share page tables so that devices can exchange data by sharing pointers. This is to be supported by custom memory management units. To render interoperability possible and also to ease various aspects of programming, HSA

A system-on-chip, or SoC. For example, many new processors now include built-in logic for interfacing with other devices (SATA, PCI, Ethernet, USB, RFID, radios, UARTs, and memory controllers), as well as programmable functional units and hardware accelerators (GPUs, cryptography co-processors, programmable network processors, A/V encoders/decoders, etc.). Recent findings show that

A "big" or P-core and a more power-efficient core usually known as a "small" or E-core. The terms P- and E-cores are usually used in relation to Intel's implementation of heterogeneous computing, while the terms big and little cores are usually used in relation to the ARM architecture. Some processors have three categories of core, prime, performance and efficiency cores, with prime cores having higher performance than performance cores;

A couple of other software tools related to HSA. CodeXL version 2.0 includes an HSA profiler. As of February 2015, only AMD's "Kaveri" A-series APUs (cf. "Kaveri" desktop processors and "Kaveri" mobile processors) and Sony's PlayStation 4 allowed the integrated GPU to access memory via version 2 of AMD's IOMMU. Earlier APUs (Trinity and Richland) included the version 2 IOMMU functionality, but only for use by an external GPU connected via PCI Express. Post-2015 Carrizo and Bristol Ridge APUs also include

A different microarchitecture (floating point number processing is a special case of this, not usually referred to as heterogeneous). In the past heterogeneous computing meant different ISAs had to be handled differently, while in a modern example, Heterogeneous System Architecture (HSA) systems eliminate the difference (for the user) while using multiple processor types (typically CPUs and GPUs), usually on

A heterogeneous-ISA chip multiprocessor that exploits diversity offered by multiple ISAs can outperform the best same-ISA homogeneous architecture by as much as 21% with 23% energy savings and a reduction of 32% in Energy Delay Product (EDP). AMD's 2014 announcement on its pin-compatible ARM and x86 SoCs, codename Project Skybridge, suggested a heterogeneous-ISA (ARM+x86) chip multiprocessor in
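Energy Delay Product is the energy a workload consumes multiplied by the time it takes, so lower values are better. The sketch below shows how such a comparison could be computed; the function names and the baseline and candidate numbers are hypothetical and are not taken from the study cited above.

```python
def edp(energy_joules, delay_seconds):
    """Energy Delay Product: energy consumed times execution time."""
    return energy_joules * delay_seconds

def relative_change(new, old):
    """Fractional change from old to new (negative means a reduction)."""
    return (new - old) / old

# Hypothetical measurements, chosen only for illustration.
baseline_edp = edp(energy_joules=10.0, delay_seconds=2.0)
candidate_edp = edp(energy_joules=7.7, delay_seconds=1.8)   # 23% less energy, 10% less time

reduction = relative_change(candidate_edp, baseline_edp)
print(f"EDP change: {reduction:.1%}")
```

Because the two factors multiply, a design that saves energy and also finishes sooner improves EDP by more than either saving alone.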

A prime core is known as "big", a performance core is known as "medium", and an efficiency core is known as "small". A common use of such topology is to provide better power efficiency, especially in mobile SoCs. Heterogeneous computing systems present new challenges not found in typical homogeneous systems. The presence of multiple processing elements raises all of the issues involved with homogeneous parallel processing systems, while

A technique that creates new frames in between existing ones by using motion interpolation. Launched in September 2023, FSR 3 uses a combination of FSR 2 and optical flow analysis, which runs using asynchronous compute (as opposed to Nvidia's DLSS 3, which uses dedicated hardware). Because FSR 3 uses a software-based solution, it is compatible with GPUs from AMD, Nvidia, and Intel as well as the ninth generation of video game consoles. To combat additional latency inherent to

Is a middleware software suite originally developed by AMD's Radeon Technologies Group that offers advanced visual effects for computer games. It was released in 2016. GPUOpen serves as an alternative to, and a direct competitor of Nvidia GameWorks. GPUOpen is similar to GameWorks in that it encompasses several different graphics technologies as its main components that were previously independent and separate from one another. However, GPUOpen

Is intended to be ISA-agnostic for both CPUs and accelerators, and to support high-level programming languages. So far, the HSA specifications cover: HSAIL (Heterogeneous System Architecture Intermediate Language), a virtual instruction set for parallel programs. Mobile devices are one of the HSA's application areas, in which it yields improved power efficiency. The illustrations below compare CPU-GPU coordination under HSA versus under traditional architectures. Some of

Is made up of several main components, tools, and SDKs. Software for computer-generated imagery (CGI) used in development of computer games and movies alike. FidelityFX Super Resolution (FSR) is used to upsample an input image into a higher resolution. There are multiple versions of FSR with distinct upscaling techniques and image quality: The standard presets for FSR by AMD can be found in

Is not compatible with VSYNC. The official AMD directory lists: Having been released by ATI Technologies under the BSD license in 2006, HLSL2GLSL is not part of GPUOpen. Whether similar tools for SPIR-V will be available remains to be seen, as is the official release of the Vulkan (API) itself. Source code that has been defined as being part of GPUOpen is also part of the Linux kernel (e.g. amdgpu and amdkfd), Mesa 3D and LLVM. As of 2022, AMD's compute software ecosystem

Is partially open source software, unlike GameWorks which is proprietary and closed. GPUOpen was announced on December 15, 2015, and released on January 26, 2016. Nicolas Thibieroz, AMD's Senior Manager of Worldwide Gaming Engineering, argues that "it can be difficult for developers to leverage their R&D investment on both consoles and PC because of the disparity between the two platforms" and that "proprietary libraries or tools chains with "black box" APIs prevent developers from accessing

Is regrouped under the ROCm metaproject. Software around Heterogeneous System Architecture (HSA), General-Purpose computing on Graphics Processing Units (GPGPU) and High-Performance Computing (HPC) AMD's "Boltzmann Initiative" (named after Ludwig Boltzmann) was announced in November 2015 at SuperComputing15 and productized as the Radeon Open Compute platform (ROCm). It aims to provide an alternative to Nvidia's CUDA which includes

Is to ease the burden on programmers when offloading calculations to the GPU. Originally driven solely by AMD and called the FSA, the idea was extended to encompass processing units other than GPUs, such as other manufacturers' DSPs, as well. Modern GPUs are very well suited to perform single instruction, multiple data (SIMD) and single instruction, multiple threads (SIMT) workloads, while modern CPUs are still being optimized for branching, etc. Originally introduced by embedded systems such as

The Cell Broadband Engine, sharing system memory directly between multiple system actors makes heterogeneous computing more mainstream. Heterogeneous computing itself refers to systems that contain multiple processing units – central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), or any type of application-specific integrated circuits (ASICs). The system architecture allows any accelerator, for instance

AMD's IOMMU, was accepted into the Linux kernel mainline version 4.14. Integrated support for HSA platforms has been announced for the "Sumatra" release of OpenJDK, due in 2015. AMD APP SDK is AMD's proprietary software development kit targeting parallel computing, available for Microsoft Windows and Linux. Bolt is a C++ template library optimized for heterogeneous computing. GPUOpen includes

The HSA runtime. This very first implementation, known as amdkfd, focuses on "Kaveri" or "Berlin" APUs and works alongside the existing Radeon kernel graphics driver. Additionally, amdkfd supports heterogeneous queuing (HQ), which aims to simplify the distribution of computational jobs among multiple CPUs and GPUs from the programmer's perspective. Support for heterogeneous memory management (HMM), suited only for graphics hardware featuring version 2 of

The HSA-specific features implemented in the hardware need to be supported by the operating system kernel and specific device drivers. For example, support for AMD Radeon and AMD FirePro graphics cards, and APUs based on Graphics Core Next (GCN), was merged into version 3.19 of the Linux kernel mainline, released on 8 February 2015. Programs do not interact directly with amdkfd, but queue their jobs utilizing

The MIT License. GPUOpen also makes it easy for developers to get low-level GPU access. Additionally, AMD wants to grant interested developers the kind of low-level "direct access" to their GCN-based GPUs that surpasses the possibilities of Direct3D 12 or Vulkan. AMD mentioned, for example, low-level access to the Asynchronous Compute Engines (ACEs). The ACEs implement "Asynchronous Compute", but they cannot be freely configured under either Vulkan or Direct3D 12. GPUOpen

The code for maintenance, porting or optimizations purposes". He says that upcoming architectures, such as AMD's RX 400 series, "include many features not exposed today in PC graphics APIs". AMD designed GPUOpen to be a competing open-source middleware stack released under the MIT License. The libraries are intended to increase software portability between video game consoles, PCs and also high-performance computing. GPUOpen unifies many of AMD's previously separate tools and solutions into one package, also fully open-sourcing them under

The frame generation process, AMD has a driver-level feature called Anti-Lag, which only runs on AMD GPUs. AMD Fluid Motion Frames (AFMF) is a driver-level frame generation technology launching in Q1 2024 which is compatible with all DirectX 11 and DirectX 12 games; however, it runs only on RDNA 2 and RDNA 3 GPUs. AFMF uses optical flow analysis but not motion vectors, so it can only interpolate a new frame between two traditionally rendered frames. AFMF currently

The level of heterogeneity in the system can introduce non-uniformity in system development, programming practices, and overall system capability. Areas of heterogeneity can include: Heterogeneous computing hardware can be found in every domain of computing, from high-end servers and high-performance computing machines all the way down to low-power embedded devices including mobile phones and tablets.

GPUOpen

GPUOpen

The making. A system with heterogeneous CPU topology is a system where the same ISA is used, but the cores themselves are different in speed. The setup is more similar to a symmetric multiprocessor. (Although such systems are technically asymmetric multiprocessors, the cores do not differ in roles or device access.) There are typically two types of cores: a higher performance core usually known as

The same integrated circuit, to provide the best of both worlds: general GPU processing (apart from the GPU's well-known 3D graphics rendering capabilities, it can also perform mathematically intensive computations on very large data-sets), while CPUs can run the operating system and perform traditional serial tasks. The level of heterogeneity in modern computing systems is gradually increasing as further scaling of fabrication technologies allows for formerly discrete components to become integrated parts of

The same type of processors, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks. Usually heterogeneity in the context of computing refers to different instruction-set architectures (ISA), where the main processor has one and other processors have another, usually very different, architecture (maybe more than one), not just

The table below. Note that these presets are not the only way in which the algorithm can be used; they are simply presets for input/output resolutions. Certain titles, such as Dota 2, have offered resolution sliders to fine-tune the scaling percentage or dynamically scale the internal render resolution depending on the FPS cap. AMD has also created a command line interface tool which allows

The user to upscale any image using FSR1/EASU in addition to other upsampling methods such as Bilinear Interpolation. It also allows the user to run various stages of the FSR pipeline, such as RCAS, independently. FSR 2 can also be modded into nearly any game supporting DLSS by swapping the DLSS DLL with a translation layer DLL that maps the DLSS API calls to FSR 2 API calls. FSR 3 adds frame generation,
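The relationship between a preset and its input/output resolutions can be sketched as a per-dimension scale factor. The factors below are the FSR 2 quality-mode values as commonly published by AMD and are an assumption here, since the preset table itself is not reproduced in this text; the function name `render_resolution` is hypothetical, and real titles may round differently.

```python
# Assumed per-dimension scale factors for FSR 2 quality modes (not from this text).
FSR2_SCALE = {
    "quality": 1.5,
    "balanced": 1.7,
    "performance": 2.0,
    "ultra_performance": 3.0,
}

def render_resolution(output_width, output_height, preset):
    """Return the internal render resolution for a given output size and preset."""
    scale = FSR2_SCALE[preset]
    # Each dimension is divided by the same factor; rounding here is a simplification.
    return round(output_width / scale), round(output_height / scale)

quality_4k = render_resolution(3840, 2160, "quality")
performance_4k = render_resolution(3840, 2160, "performance")
print(quality_4k, performance_4k)
```

A resolution slider of the kind mentioned above amounts to letting the user pick the scale factor freely instead of choosing one of the fixed entries.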

The version 2 IOMMU functionality for the integrated GPU. The following table shows features of AMD's processors with 3D graphics, including APUs (see also: List of AMD processors with 3D graphics). ARM's Bifrost microarchitecture, as implemented in the Mali-G71, is fully compliant with the HSA 1.1 hardware specifications. As of June 2016, ARM has not announced software support that would use this hardware feature.

Heterogeneous computing

Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding
