Streaming SIMD Extensions - Misplaced Pages

In computing , Streaming SIMD Extensions ( SSE ) is a single instruction, multiple data ( SIMD ) instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in its Pentium III series of central processing units (CPUs) shortly after the appearance of Advanced Micro Devices (AMD's) 3DNow! . SSE contains 70 new instructions (65 unique mnemonics using 70 encodings), most of which work on single precision floating-point data. SIMD instructions can greatly increase performance when exactly the same operations are to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing .

#530469

28-564: Intel's first IA-32 SIMD effort was the MMX instruction set. MMX had two main problems: it re-used existing x87 floating-point registers making the CPUs unable to work on both floating-point and SIMD data at the same time, and it only worked on integers . SSE floating-point instructions operate on a new independent register set, the XMM registers, and adds a few integer instructions that work on MMX registers. SSE

56-567: A compiled application can interleave FPU and SSE instructions side-by-side, the Pentium III will not issue an FPU and an SSE instruction in the same clock cycle . This limitation reduces the effectiveness of pipelining , but the separate XMM registers do allow SIMD and scalar floating-point operations to be mixed without the performance hit from explicit MMX/floating-point mode switching. SSE introduced both scalar and packed floating-point instructions. The following simple example demonstrates

84-459: A processor , memory , and other major system components that operate on data in 32- bit units. Compared to smaller bit widths, 32-bit computers can perform large calculations more efficiently and process more data per clock cycle. Typical 32-bit personal computers also have a 32-bit address bus , permitting up to 4 GB of RAM to be accessed, far more than previous generations of system architecture allowed. 32-bit designs have been used since

112-413: A total of 96 bits per pixel. 32-bit-per-channel images are used to represent values brighter than what sRGB color space allows (brighter than white); these values can then be used to more accurately retain bright highlights when either lowering the exposure of the image or when it is seen through a dark filter or dull reflection. For example, a reflection in an oil slick is only a fraction of that seen in

140-402: Is a 32-bit machine, with 32-bit registers and instructions that manipulate 32-bit quantities, but the external address bus is 36 bits wide, giving a larger address space than 4 GB, and the external data bus is 64 bits wide, primarily in order to permit a more efficient prefetch of instructions and data. Prominent 32-bit instruction set architectures used in general-purpose computing include

168-532: Is duplicated in the Intel 64 architecture. There is also a new 32-bit control/status register, MXCSR . The registers XMM8 through XMM15 are accessible only in 64-bit operating mode. SSE used only a single data type for XMM registers: SSE2 would later expand the usage of the XMM registers to include: Because these 128-bit registers are additional machine states that the operating system must preserve across task switches , they are disabled by default until

196-528: Is the availability of 32-bit general-purpose processor registers (for example, EAX and EBX), 32-bit integer arithmetic and logical operations, 32-bit offsets within a segment in protected mode , and the translation of segmented addresses to 32-bit linear addresses. The designers took the opportunity to make other improvements as well. Some of the most significant changes (relative to the 16-bit 286 instruction set) are: 32-bit In computer architecture , 32-bit computing refers to computer systems with

224-475: Is the first incarnation of x86 that supports 32-bit computing; as a result, the "IA-32" term may be used as a metonym to refer to all x86 versions that support 32-bit computing. Within various programming language directives, IA-32 is still sometimes referred to as the "i386" architecture. In some other contexts, certain iterations of the IA-32 ISA are sometimes labelled i486 , i586 and i686 , referring to

252-606: The 8088/8086 or 80286 , 16-bit microprocessors with a segmented address space where programs had to switch between segments to reach more than 64 kilobytes of code or data. As this is quite time-consuming in comparison to other machine operations, the performance may suffer. Furthermore, programming with segments tend to become complicated; special far and near keywords or memory models had to be used (with care), not only in assembly language but also in high level languages such as Pascal , compiled BASIC , Fortran , C , etc. The 80386 and its successors fully support

280-793: The IBM System/360 , IBM System/370 (which had 24-bit addressing), System/370-XA , ESA/370 , and ESA/390 (which had 31-bit addressing), the DEC VAX , the NS320xx , the Motorola 68000 family (the first two models of which had 24-bit addressing), the Intel IA-32 32-bit version of the x86 architecture, and the 32-bit versions of the ARM , SPARC , MIPS , PowerPC and PA-RISC architectures. 32-bit instruction set architectures used for embedded computing include

308-536: The IBM System/360 Model 30 had an 8-bit ALU, 8-bit internal data paths, and an 8-bit path to memory, and the original Motorola 68000 had a 16-bit data ALU and a 16-bit external data bus, but had 32-bit registers and a 32-bit oriented instruction set. The 68000 design was sometimes referred to as 16/32-bit . However, the opposite is often true for newer 32-bit designs. For example, the Pentium Pro processor

SECTION 10

#1732851536531

336-448: The integer representation used. With the two most common representations, the range is 0 through 4,294,967,295 (2 − 1) for representation as an ( unsigned ) binary number , and −2,147,483,648 (−2 ) through 2,147,483,647 (2 − 1) for representation as two's complement . One important consequence is that a processor with 32-bit memory addresses can directly access at most 4 GiB of byte-addressable memory (though in practice

364-533: The 16-bit segments of the 80286 but also segments for 32-bit address offsets (using the new 32-bit width of the main registers). If the base address of all 32-bit segments is set to 0, and segment registers are not used explicitly, the segmentation can be forgotten and the processor appears as having a simple linear 32-bit address space. Operating systems like Windows or OS/2 provide the possibility to run 16-bit (segmented) programs as well as 32-bit programs. The former possibility exists for backward compatibility and

392-423: The 68000 family and ColdFire , x86, ARM, MIPS, PowerPC, and Infineon TriCore architectures. On the x86 architecture , a 32-bit application normally means software that typically (not necessarily) uses the 32-bit linear address space (or flat memory model ) possible with the 80386 and later chips. In this context, the term came about because DOS , Microsoft Windows and OS/2 were originally written for

420-432: The advantage of using SSE. Consider an operation like vector addition, which is used very often in computer graphics applications. To add two single precision, four-component vectors together using x86 requires four floating-point addition instructions. This corresponds to four x86 FADD instructions in the object code. On the other hand, as the following pseudo-code shows, a single 128-bit 'packed-add' instruction can replace

448-462: The code name for the first Pentium III core revision. During the Katmai project Intel sought to distinguish it from its earlier product line, particularly its flagship Pentium II . It was later renamed Internet Streaming SIMD Extensions ( ISSE ), then SSE. AMD added a subset of SSE, 19 of them, called new MMX instructions, and known as several variants and combinations of SSE and MMX, shortly after with

476-587: The contemporary prevalence of x86-64, as of today, IA-32 protected mode versions of many modern operating systems are still maintained, e.g. Microsoft Windows (until Windows 10 ), Windows Server (until Windows Server 2008 ) and the Debian Linux distribution. In spite of IA-32's name (and causing some potential confusion), the 64-bit evolution of x86 that originated out of AMD would not be known as "IA-64", that name instead belonging to Intel's Itanium architecture . The primary defining characteristic of IA-32

504-485: The earliest days of electronic computing, in experimental systems and then in large mainframe and minicomputer systems. The first hybrid 16/32-bit microprocessor , the Motorola 68000 , was introduced in the late 1970s and used in systems such as the original Apple Macintosh . Fully 32-bit microprocessors such as the HP FOCUS , Motorola 68020 and Intel 80386 were launched in the early to mid 1980s and became dominant by

532-712: The early 1990s. This generation of personal computers coincided with and enabled the first mass-adoption of the World Wide Web . While 32-bit architectures are still widely-used in specific applications, the PC and server market has moved on to 64 bits with x86-64 and other 64-bit architectures since the mid-2000s with installed memory often exceeding the 32-bit 4G RAM address limits on entry level computers. The latest generation of smartphones have also switched to 64 bits. A 32-bit register can store 2 different values. The range of integer values that can be stored in 32 bits depends on

560-574: The first decades of 32-bit architectures (the 1960s to the 1980s). Older 32-bit processor families (or simpler, cheaper variants thereof) could therefore have many compromises and limitations in order to cut costs. This could be a 16-bit ALU , for instance, or external (or internal) buses narrower than 32 bits, limiting memory size or demanding more cycles for instruction fetch, execution or write back. Despite this, such processors could be labeled 32-bit , since they still had 32-bit registers and instructions able to manipulate 32-bit quantities. For example,

588-407: The four scalar addition instructions. The following programs can be used to determine which, if any, versions of SSE are supported on a system IA-32 IA-32 (short for " Intel Architecture, 32-bit ", commonly called i386 ) is the 32-bit version of the x86 instruction set architecture , designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32

SECTION 20

#1732851536531

616-582: The instruction supersets offered by the 80486 , the P5 and the P6 microarchitectures respectively. These updates offered numerous additions alongside the base IA-32 set including floating-point capabilities and the MMX extensions . Intel was historically the largest manufacturer of IA-32 processors, with the second biggest supplier having been AMD . During the 1990s, VIA , Transmeta and other chip manufacturers also produced IA-32 compatible processors (e.g. WinChip ). In

644-412: The latter is usually meant to be used for new software development . In digital images/pictures, 32-bit usually refers to RGBA color space ; that is, 24-bit truecolor images with an additional 8-bit alpha channel . Other image formats also specify 32 bits per pixel, such as RGBE . In digital images, 32-bit sometimes refers to high-dynamic-range imaging (HDR) formats that use 32 bits per channel,

672-461: The limit may be lower). The world's first stored-program electronic computer , the Manchester Baby , used a 32-bit architecture in 1948, although it was only a proof of concept and had little practical capacity. It held only 32 32-bit words of RAM on a Williams tube , and had no addition operation, only subtraction. Memory, as well as other digital circuits and wiring, was expensive during

700-460: The modern era, Intel still produced IA-32 processors under the Intel Quark microcontroller platform until 2019; however, since the 2000s, the majority of manufacturers (Intel included) moved almost exclusively to implementing CPUs based on the 64-bit variant of x86, x86-64 . x86-64, by specification, offers legacy operating modes that operate on the IA-32 ISA for backwards compatibility. Even given

728-580: The operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, which is the extended pair of instructions that can save all x86 and SSE register states at once. This support was quickly added to all major IA-32 operating systems. The first CPU to support SSE, the Pentium III , shared execution resources between SSE and the floating-point unit (FPU). While

756-479: The release of the original Athlon in August 1999, see 3DNow! extensions . AMD eventually added full support for SSE instructions, starting with its Athlon XP and Duron ( Morgan core ) processors. SSE originally added eight new 128-bit registers known as XMM0 through XMM7 . The AMD64 extensions from AMD (originally called x86-64 ) added a further eight registers XMM8 through XMM15 , and this extension

784-485: Was subsequently expanded by Intel to SSE2 , SSE3 , SSSE3 and SSE4 . Because it supports floating-point math, it had wider applications than MMX and became more popular. The addition of integer support in SSE2 made MMX largely redundant, though further performance increases can be attained in some situations by using MMX in parallel with SSE operations. SSE was originally called Katmai New Instructions ( KNI ), Katmai being

#530469