Misplaced Pages

GEMM

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.
#652347

81-538: GEMM may refer to: General matrix multiply gemm , one of the Basic Linear Algebra Subprograms Genetically engineered mouse model Gilt-edged market maker Global Electronic Music Marketplace , a former online music market CFU-GEMM , granulocyte-erythrocyte-monocyte-megakaryocyte colony forming unit See also [ edit ] Gem (disambiguation) Topics referred to by

162-529: A 64-bit extension to the x86 instruction set (called x86-64 , AMD64, or x64), the incorporation of an on-chip memory controller, and the implementation of an extremely high-performance point-to-point interconnect called HyperTransport , as part of the Direct Connect Architecture . The technology was initially launched as the Opteron server-oriented processor on April 22, 2003. Shortly thereafter, it

243-464: A "general matrix multiplication " ( gemm ), of the form where A and B can optionally be transposed or hermitian-conjugated inside the routine, and all three matrices may be strided. The ordinary matrix multiplication A B can be performed by setting α to one and C to an all-zeros matrix of the appropriate size. Also included in Level 3 are routines for computing where T

324-516: A Fortran library in 1979 and its interface was standardized by the BLAS Technical (BLAST) Forum, whose latest BLAS report can be found on the netlib website. This Fortran library is known as the reference implementation (sometimes confusingly referred to as the BLAS library) and is not optimized for speed but is in the public domain . Most libraries that offer linear algebra routines conform to

405-516: A contract with Intel , becoming a licensed second-source manufacturer of 8086 and 8088 processors. IBM wanted to use the Intel 8088 in its IBM PC , but its policy at the time was to require at least two sources for its chips. AMD later produced the Am286 under the same arrangement. In 1984, Intel internally decided to no longer cooperate with AMD in supplying product information to shore up its advantage in

486-491: A fast, cost-effective processor. Finally, in an agreement effective 1996, AMD received the rights to the microcode in Intel's x386 and x486 processor families, but not the rights to the microcode in the following generations of processors. AMD's first in-house x86 processor was the K5 , launched in 1996. The "K" in its name was a reference to Kryptonite , the only substance known to harm comic book character Superman . This itself

567-468: A fork of BLIS that is optimized for the AMD platform. ATLAS is a portable library that automatically optimizes itself for an arbitrary architecture. iMKL is a freeware and proprietary vendor library optimized for x86 and x86-64 with a performance emphasis on Intel processors. OpenBLAS is an open-source library that is hand-optimized for many of the popular architectures. The LINPACK benchmarks rely heavily on

648-473: A group of other technology professionals. The company's early products were primarily memory chips and other components for computers. In 1975, AMD entered the microprocessor market, competing with Intel , its main rival in the industry. In the early 2000s, it experienced significant growth and success, thanks in part to its strong position in the PC market and the success of its Athlon and Opteron processors. However,

729-693: A joint venture with Siemens , a German engineering conglomerate wishing to enhance its technology expertise and enter the American market. Siemens purchased 20% of AMD's stock, giving the company an infusion of cash to increase its product lines. The two companies also jointly established Advanced Micro Computers (AMC), located in Silicon Valley and in Germany, allowing AMD to enter the microcomputer development and manufacturing field, in particular based on AMD's second-source Zilog Z8000 microprocessors. When

810-459: A large, successful flash memory business, even during the dotcom bust . In 2003, to divest some manufacturing and aid its overall cash flow, which was under duress from aggressive microprocessor competition from Intel, AMD spun off its flash memory business and manufacturing into Spansion , a joint venture with Fujitsu , which had been co-manufacturing flash memory with AMD since 1993. In December 2005, AMD divested itself of Spansion to focus on

891-730: A library may include a program to solve a matrix that is upper triangular. The libraries would include single-precision and double-precision versions of some algorithms. Initially, these subroutines used hard-coded loops for their low-level operations. For example, if a subroutine needed to perform a matrix multiplication, then the subroutine would have three nested loops. Linear algebra programs have many common low-level operations (the so-called "kernel" operations, not related to operating systems ). Between 1973 and 1977, several of these kernel operations were identified. These kernel operations became defined subroutines that math libraries could call. The kernel calls had advantages over hard-coded loops:

SECTION 10

#1732855675653

972-413: A new socket G34 for dual and quad-socket processors and thus will be marketed as Opteron 61xx series processors. Lisbon uses socket C32 certified for dual-socket use or single socket use only and thus will be marketed as Opteron 41xx processors. Both will be built on a 45 nm SOI process. Following AMD's 2006 acquisition of Canadian graphics company ATI Technologies , an initiative codenamed Fusion

1053-518: A new native socket AM3 , while maintaining backward compatibility with AM2+ , the socket used for the Phenom, and allowing the use of the DDR2 memory that was used with the platform. In April 2010, AMD released a new Phenom II Hexa-core (6-core) processor codenamed " Thuban ". This was a totally new die based on the hexa-core "Istanbul" Opteron processor. It included AMD's "turbo core" technology, which allows

1134-478: A solver for x in the linear equation with T being triangular. Design of the Level 2 BLAS started in 1984, with results published in 1988. The Level 2 subroutines are especially intended to improve performance of programs using BLAS on vector processors , where Level 1 BLAS are suboptimal "because they hide the matrix-vector nature of the operations from the compiler." This level, formally published in 1990, contains matrix-matrix operations , including

1215-413: A versatile tool and allow e.g. a fast implementation of exponential integrators and Magnus integrators that handle long integration periods with many time steps. Here, the matrix exponentiation , the computationally expensive part of the integration, can be implemented in parallel for all time-steps by using Batched BLAS functions. Advanced Micro Devices Advanced Micro Devices, Inc. ( AMD )

1296-498: A wide range of business and consumer markets, including gaming, data centers, artificial intelligence (AI), and embedded systems. AMD's main products include microprocessors , motherboard chipsets , embedded processors , and graphics processors for servers , workstations , personal computers, and embedded system applications. The company has also expanded into new markets, such as the data center , gaming , and high-performance computing markets. AMD's processors are used in

1377-408: A wide range of computing devices, including personal computers , servers, laptops , and gaming consoles . While it initially manufactured its own processors, the company later outsourced its manufacturing , after GlobalFoundries was spun off in 2009. Through its Xilinx acquisition in 2022, AMD offers field-programmable gate array (FPGA) products. AMD was founded in 1969 by Jerry Sanders and

1458-485: Is a triangular matrix , among other functionality. Due to the ubiquity of matrix multiplications in many scientific applications, including for the implementation of the rest of Level 3 BLAS, and because faster algorithms exist beyond the obvious repetition of matrix-vector multiplication, gemm is a prime target of optimization for BLAS implementers. E.g., by decomposing one or both of A , B into block matrices , gemm can be implemented recursively . This

1539-657: Is a general purpose library that can be used on many different machines without modification. LINPACK could use a generic version of BLAS. To gain performance, different machines might use tailored versions of BLAS. As computer architectures became more sophisticated, vector machines appeared. BLAS for a vector machine could use the machine's fast vector operations. (While vector processors eventually fell out of favor, vector instructions in modern CPUs are essential for optimal performance in BLAS routines.) Other machine features became available and could also be exploited. Consequently, BLAS

1620-526: Is an American multinational corporation and technology company headquartered in Santa Clara, California and maintains significant operations in Austin, Texas . AMD is a hardware and fabless company that designs and develops central processing units (CPUs), graphics processing units (GPUs), field-programmable gate arrays (FPGAs), system-on-chip (SoC), and high-performance compute solutions. AMD serves

1701-443: Is categorized into three sets of routines called "levels", which correspond to both the chronological order of definition and publication, as well as the degree of the polynomial in the complexities of algorithms; Level 1 BLAS operations typically take linear time , O ( n ) , Level 2 operations quadratic time and Level 3 operations cubic time. Modern BLAS implementations typically provide all three levels. This level consists of all

SECTION 20

#1732855675653

1782-420: Is different from Wikidata All article disambiguation pages All disambiguation pages General matrix multiply Basic Linear Algebra Subprograms ( BLAS ) is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication , dot products , linear combinations, and matrix multiplication . They are

1863-438: Is one of the motivations for including the β parameter, so the results of previous blocks can be accumulated. Note that this decomposition requires the special case β = 1 which many implementations optimize for, thereby eliminating one multiplication for each value of C . This decomposition allows for better locality of reference both in space and time of the data used in the product. This, in turn, takes advantage of

1944-590: Is superior to ATLAS . A highly tuned implementation based on these ideas is part of the GotoBLAS , OpenBLAS and BLIS . A common variation of gemm is the gemm3m , which calculates a complex product using "three real matrix multiplications and five real matrix additions instead of the conventional four real matrix multiplications and two real matrix additions", an algorithm similar to Strassen algorithm first described by Peter Ungar. Several extensions to BLAS for handling sparse matrices have been suggested over

2025-490: The de facto standard low-level routines for linear algebra libraries; the routines have bindings for both C ("CBLAS interface") and Fortran ("BLAS interface"). Although the BLAS specification is general, BLAS implementations are often optimized for speed on a particular machine, so using them can bring substantial performance benefits. BLAS implementations will take advantage of special floating point hardware such as vector registers or SIMD instructions. It originated as

2106-548: The AMD 700 chipset series . The Phenom II came in dual-core, triple-core and quad-core variants, all using the same die, with cores disabled for the triple-core and dual-core versions. The Phenom II resolved issues that the original Phenom had, including a low clock speed, a small L3 cache, and a Cool'n'Quiet bug that decreased performance. The Phenom II cost less but was not performance-competitive with Intel's mid-to-high-range Core 2 Quads. The Phenom II also enhanced its predecessor's memory controller, allowing it to use DDR3 in

2187-641: The Am386 , its clone of the Intel 386 processor. By October of the same year it had sold one million units. In 1993, AMD introduced the first of the Am486 family of processors, which proved popular with a large number of original equipment manufacturers , including Compaq , which signed an exclusive agreement using the Am486. The Am5x86 , another Am486-based processor, was released in November 1995, and continued AMD's success as

2268-546: The DRAM market, and made some headway into the CMOS market, which it had lagged in entering, having focused instead on bipolar chips. AMD had some success in the mid-1980s with the AMD7910 and AMD7911 "World Chip" FSK modem, one of the first multi-standard devices that covered both Bell and CCITT tones at up to 1200 baud half duplex or 300/300 full duplex. Beginning in 1986, AMD embraced

2349-505: The Supreme Court of California sided with the arbitrator and AMD. In 1990, Intel countersued AMD, renegotiating AMD's right to use derivatives of Intel's microcode for its cloned processors. In the face of uncertainty during the legal dispute, AMD was forced to develop clean room designed versions of Intel code for its x386 and x486 processors, the former long after Intel had released its own x386 in 1985. In March 1991, AMD released

2430-496: The cache on the system. For systems with more than one level of cache, the blocking can be applied a second time to the order in which the blocks are used in the computation. Both of these levels of optimization are used in implementations such as ATLAS . More recently, implementations by Kazushige Goto have shown that blocking only for the L2 cache , combined with careful amortizing of copying to contiguous memory to reduce TLB misses,

2511-462: The A4 utilizing the base Radeon HD chip and the rest using a Radeon R4 graphics card, with the exception of the highest-model A10 (A10-7300) which uses an R6 graphics card. Bulldozer was AMD's microarchitecture codename for server and desktop AMD FX processors, first released on October 12, 2011. This family 15h microarchitecture is the successor to the family 10h (K10) microarchitecture design. Bulldozer

GEMM - Misplaced Pages Continue

2592-519: The AMD brand name. In October 2008, AMD announced plans to spin off manufacturing operations in the form of GlobalFoundries Inc. , a multibillion-dollar joint venture with Advanced Technology Investment Co. , an investment company formed by the government of Abu Dhabi . The partnership and spin-off gave AMD an infusion of cash and allowed it to focus solely on chip design. To assure the Abu Dhabi investors of

2673-706: The Am2501 logic counter, which was highly successful. Its bestselling product in 1971 was the Am2505, the fastest multiplier available. In 1971, AMD entered the RAM chip market, beginning with the Am3101, a 64-bit bipolar RAM. That year AMD also greatly increased the sales volume of its linear integrated circuits, and by year-end the company's total annual sales reached US$ 4.6 million. AMD went public in September 1972. The company

2754-500: The BLAS interface, allowing library users to develop programs that are indifferent to the BLAS library being used. Many BLAS libraries have been developed, targeting various different hardware platforms. Examples includes cuBLAS (NVIDIA GPU, GPGPU ), rocBLAS (AMD GPU), and OpenBLAS . Examples of CPU-based BLAS library branches include: OpenBLAS , BLIS (BLAS-like Library Instantiation Software) , Arm Performance Libraries, ATLAS , and Intel Math Kernel Library (iMKL). AMD maintains

2835-568: The BLAS routine gemm for its performance measurements. Many numerical software applications use BLAS-compatible libraries to do linear algebra computations, including LAPACK , LINPACK , Armadillo , GNU Octave , Mathematica , MATLAB , NumPy , R , Julia and Lisp-Stat. With the advent of numerical programming, sophisticated subroutine libraries became useful. These libraries would contain subroutines for common high-level mathematical operations such as root finding, matrix inversion, and solving systems of equations. The language of choice

2916-767: The GEMM routine, those architectures show significant performance losses. To address this issue, in 2017 a batched version of the BLAS function has been specified. Taking the GEMM routine from above as an example, the batched version performs the following computation simultaneously for many matrices: C [ k ] ← α A [ k ] B [ k ] + β C [ k ] ∀ k {\displaystyle {\boldsymbol {C}}[k]\leftarrow \alpha {\boldsymbol {A}}[k]{\boldsymbol {B}}[k]+\beta {\boldsymbol {C}}[k]\quad \forall k} The index k {\displaystyle k} in square brackets indicates that

2997-545: The Llano. More AMD APUs for laptops running Windows 7 and Windows 8 OS are being used commonly. These include AMD's price-point APUs, the E1 and E2, and their mainstream competitors with Intel's Core i -series: The Vision A- series, the A standing for accelerated. These range from the lower-performance A4 chipset to the A6, A8, and A10. These all incorporate next-generation Radeon graphics cards, with

3078-505: The Spider at 65nm , which was uncompetitive with Intel's smaller and more power-efficient 45nm . In January 2009, AMD released a new processor line dubbed Phenom II , a refresh of the original Phenom built using the 45 nm process. AMD's new platform, codenamed " Dragon ", used the new Phenom II processor, and an ATI R770 GPU from the R700 GPU family, as well as a 790 GX/FX chipset from

3159-516: The United States. AMD rode out the mid-1980s crisis by aggressively innovating and modernizing, devising the Liberty Chip program of designing and manufacturing one new chip or chipset per week for 52 weeks in fiscal year 1986, and by heavily lobbying the U.S. government until sanctions and restrictions were put in place to prevent predatory Japanese pricing. During this time, AMD withdrew from

3240-546: The brand name Athlon on June 23, 1999. Unlike previous AMD processors, it could not be used on the same motherboards as Intel's, due to licensing issues surrounding Intel's Slot 1 connector, and instead used a Slot A connector, referenced to the Alpha processor bus. The Duron was a lower-cost and limited version of the Athlon (64 KB instead of 256 KB L2 cache) in a 462-pin socketed PGA (socket A) or soldered directly onto

3321-535: The company already had overseas assembly facilities in Penang and Manila , and began construction on a fabrication plant in San Antonio in 1981. In 1980, AMD began supplying semiconductor products for telecommunications, an industry undergoing rapid expansion and innovation. Intel had introduced the first x86 microprocessors in 1978. In 1981, IBM created its PC , and wanted Intel's x86 processors, but only under

GEMM - Misplaced Pages Continue

3402-466: The company faced challenges in the late 2000s and early 2010s, as it struggled to keep up with Intel in the race to produce faster and more powerful processors. In the late 2010s, AMD regained market share by pursuing a penetration pricing strategy and building on the success of its Ryzen processors, which were considerably more competitive with Intel microprocessors in terms of performance while offering attractive pricing. Advanced Micro Devices

3483-506: The condition that Intel would also provide a second-source manufacturer for its patented x86 microprocessors. Intel and AMD entered into a 10-year technology exchange agreement, first signed in October 1981 and formally executed in February 1982. The terms of the agreement were that each company could acquire the right to become a second-source manufacturer of semiconductor products developed by

3564-423: The course of the library's history; a small set of sparse matrix kernel routines was finally standardized in 2002. The traditional BLAS functions have been also ported to architectures that support large amounts of parallelism such as GPUs . Here, the traditional BLAS functions provide typically good performance for large matrices. However, when computing e.g., matrix-matrix-products of many small matrices by using

3645-467: The cross-licensing agreement would be effectively canceled. Beginning in 1982, AMD began volume-producing second-source Intel-licensed 8086, 8088, 80186, and 80188 processors, and by 1984, its own Am286 clone of Intel's 80286 processor, for the rapidly growing market of IBM PCs and IBM clones . It also continued its successful concentration on proprietary bipolar chips. The company continued to spend greatly on research and development, and created

3726-472: The early computer industry since unreliability in microchips was a distinct problem that customers – including computer manufacturers , the telecommunications industry , and instrument manufacturers – wanted to avoid. In November 1969, the company manufactured its first product: the Am9300, a 4-bit MSI shift register , which began selling in 1970. Also in 1970, AMD produced its first proprietary product,

3807-627: The end of 2014. After the GlobalFoundries spin-off and subsequent layoffs, AMD was left with significant vacant space at 1 AMD Place, its aging Sunnyvale headquarters office complex. In August 2016, AMD's 47 years in Sunnyvale came to a close when it signed a lease with the Irvine Company for a new 220,000 sq. ft. headquarters building in Santa Clara. AMD's new location at Santa Clara Square faces

3888-466: The face of declining sales revenue. The inclusion of AMD chips into the PlayStation 4 and Xbox One were later seen as saving AMD from bankruptcy. AMD acquired the low-power server manufacturer SeaMicro in early 2012, with an eye to bringing out an Arm64 server chip. On October 8, 2014, AMD announced that Rory Read had stepped down after three years as president and chief executive officer. He

3969-448: The first server Opteron K10 processors, followed in November by the Phenom processor for desktop. K10 processors came in dual-core, triple-core , and quad-core versions, with all cores on a single die. AMD released a new platform codenamed " Spider ", which used the new Phenom processor, as well as an R770 GPU and a 790 GX/FX chipset from the AMD 700 chipset series . However, AMD built

4050-642: The footsteps of Robert Noyce (developer of the first silicon integrated circuit at Fairchild in 1959) and Gordon Moore , who together founded the semiconductor company Intel in July 1968. In September 1969, AMD moved from its temporary location in Santa Clara to Sunnyvale, California . To immediately secure a customer base, AMD initially became a second source supplier of microchips designed by Fairchild and National Semiconductor . AMD first focused on producing logic chips. The company guaranteed quality control to United States Military Standard , an advantage in

4131-588: The headquarters of archrival Intel across the Bayshore Freeway and San Tomas Aquino Creek . Around the same time, AMD also agreed to sell 1 AMD Place to the Irvine Company. In April 2019, the Irvine Company secured approval from the Sunnyvale City Council of its plans to demolish 1 AMD Place and redevelop the entire 32-acre site into townhomes and apartments. In October 2020, AMD announced that it

SECTION 50

#1732855675653

4212-455: The library routine would be more readable, there were fewer chances for bugs, and the kernel implementation could be optimized for speed. A specification for these kernel operations using scalars and vectors , the level-1 Basic Linear Algebra Subroutines (BLAS), was published in 1979. BLAS was used to implement the linear algebra subroutine library LINPACK . The BLAS abstraction allows customization for high performance. For example, LINPACK

4293-455: The marketplace, and delayed and eventually refused to convey the technical details of the Intel 80386 . In 1987, AMD invoked arbitration over the issue, and Intel reacted by canceling the 1982 technological-exchange agreement altogether. After three years of testimony, AMD eventually won in arbitration in 1992, but Intel disputed this decision. Another long legal dispute followed, ending in 1994 when

4374-499: The microarchitecture, and a shift of the target market from mainstream desktop systems to value dual-core desktop systems. In 2008, AMD started to release dual-core Sempron processors exclusively in China, branded as the Sempron 2000 series, with lower HyperTransport speed and smaller L2 cache. AMD completed its dual-core product portfolio for each market segment. In September 2007, AMD released

4455-527: The microprocessor market with the Am9080 , a reverse-engineered clone of the Intel 8080 , and the Am2900 bit-slice microprocessor family. When Intel began installing microcode in its microprocessors in 1976, it entered into a cross-licensing agreement with AMD, which was granted a copyright license to the microcode in its microprocessors and peripherals, effective October 1976. In 1977, AMD entered into

4536-550: The microprocessor market, and Spansion went public in an IPO. On July 24, 2006, AMD announced its acquisition of the Canadian 3D graphics card company ATI Technologies . AMD paid $ 4.3 billion and 58 million shares of its capital stock , for approximately $ 5.4 billion. The transaction was completed on October 25, 2006. On August 30, 2010, AMD announced that it would retire the ATI brand name for its graphics chipsets in favor of

4617-560: The motherboard. Sempron was released as a lower-cost Athlon XP, replacing Duron in the socket A PGA era. It has since been migrated upward to all new sockets, up to AM3 . On October 9, 2001, the Athlon XP was released. On February 10, 2003, the Athlon XP with 512 KB L2 Cache was released. The K8 was a major revision of the K7 architecture, with the most notable features being the addition of

4698-631: The new Bulldozer products were slower than the K10 models they were built to replace. The Piledriver microarchitecture was the 2012 successor to Bulldozer, increasing clock speeds and performance relative to its predecessor. Piledriver would be released in AMD FX, APU, and Opteron product lines. Piledriver was subsequently followed by the Steamroller microarchitecture in 2013. Used exclusively in AMD's APUs, Steamroller focused on greater parallelism. In 2015,

4779-707: The new venture's success, AMD's CEO Hector Ruiz stepped down in July 2008, while remaining executive chairman, in preparation for becoming chairman of GlobalFoundries in March 2009. President and COO Dirk Meyer became AMD's CEO. Recessionary losses necessitated AMD cutting 1,100 jobs in 2009. In August 2011, AMD announced that former Lenovo executive Rory Read would be joining the company as CEO, replacing Meyer. In November 2011, AMD announced plans to lay off more than 10% (1,400) of its employees from across all divisions worldwide. In October 2012, it announced plans to lay off an additional 15% of its workforce to reduce costs in

4860-403: The operation is performed for all matrices k {\displaystyle k} in a stack. Often, this operation is implemented for a strided batched memory layout where all matrices follow concatenated in the arrays A {\displaystyle A} , B {\displaystyle B} and C {\displaystyle C} . Batched BLAS functions can be

4941-461: The other; that is, each party could "earn" the right to manufacture and sell a product developed by the other, if agreed to, by exchanging the manufacturing rights to a product of equivalent technical complexity. The technical information and licenses needed to make and sell a part would be exchanged for a royalty to the developing company. The 1982 agreement also extended the 1976 AMD–Intel cross-licensing agreement through 1995. The agreement included

SECTION 60

#1732855675653

5022-436: The perceived shift toward RISC with their own AMD Am29000 (29k) processor; the 29k survived as an embedded processor . The company also increased its EPROM memory market share in the late 1980s. Throughout the 1980s, AMD was a second-source supplier of Intel x86 processors. In 1991, it introduced its 386-compatible Am386 , an AMD-designed chip. Creating its own chips, AMD began to compete directly with Intel. AMD had

5103-521: The processor to automatically switch from 6 cores to 3 faster cores when more pure speed is needed. The Magny Cours and Lisbon server parts were released in 2010. The Magny Cours part came in 8 to 12 cores and the Lisbon part in 4 and 6 core parts. Magny Cours is focused on performance while the Lisbon part is focused on high performance per watt. Magny Cours is an MCM ( multi-chip module ) with two hexa-core "Istanbul" Opteron parts. This will use

5184-425: The right to invoke arbitration of disagreements, and after five years the right of either party to end the agreement with one year's notice. The main result of the 1982 agreement was that AMD became a second-source manufacturer of Intel's x86 microprocessors and related chips, and Intel provided AMD with database tapes for its 8086 , 80186 , and 80286 chips. However, in the event of a bankruptcy or takeover of AMD,

5265-585: The rights to their Nx series of x86-compatible processors. AMD gave the NexGen design team their own building, left them alone, and gave them time and money to rework the Nx686. The result was the K6 processor, introduced in 1997. Although it was based on Socket 7 , variants such as K6-III /450 were faster than Intel's Pentium II (sixth-generation processor). The K7 was AMD's seventh-generation x86 processor, making its debut under

5346-422: The routines described in the original presentation of BLAS (1979), which defined only vector operations on strided arrays : dot products , vector norms , a generalized vector addition of the form (called " axpy ", "a x plus y") and several other operations. This level contains matrix-vector operations including, among other things, a ge neralized m atrix- v ector multiplication ( gemv ): as well as

5427-404: The same term [REDACTED] This disambiguation page lists articles associated with the title GEMM . If an internal link led you here, you may wish to change the link to point directly to the intended article. Retrieved from " https://en.wikipedia.org/w/index.php?title=GEMM&oldid=930324284 " Category : Disambiguation pages Hidden categories: Short description

5508-548: The two companies' vision for Advanced Micro Computers diverged, AMD bought out Siemens' stake in the American division in 1979. AMD closed Advanced Micro Computers in late 1981 after switching focus to manufacturing second-source Intel x86 microprocessors. Total sales in fiscal year 1978 topped $ 100 million, and in 1979, AMD debuted on the New York Stock Exchange . In 1979, production also began on AMD's new semiconductor fabrication plant in Austin, Texas ;

5589-499: The world's first 512K EPROM in 1984. That year, AMD was listed in the book The 100 Best Companies to Work for in America , and later made the Fortune 500 list for the first time in 1985. By mid-1985, the microchip market experienced a severe downturn, mainly due to long-term aggressive trade practices ( dumping ) from Japan, but also due to a crowded and non-innovative chip market in

5670-492: Was FORTRAN . The most prominent numerical programming library was IBM 's Scientific Subroutine Package (SSP). These subroutine libraries allowed programmers to concentrate on their specific problems and avoid re-implementing well-known algorithms. The library routines would also be better than average implementations; matrix algorithms, for example, might use full pivoting to get better numerical accuracy. The library routines would also have more efficient routines. For example,

5751-436: Was a clean-sheet design, not a development of earlier processors. The core was specifically aimed at 10–125 W TDP computing products. AMD claimed dramatic performance-per-watt efficiency improvements in high-performance computing (HPC) applications with Bulldozer cores. While hopes were high that Bulldozer would bring AMD to be performance-competitive with Intel once more, most benchmarks were disappointing. In some cases

5832-425: Was a reference to Intel's hegemony over the market, i.e., an anthropomorphization of them as Superman. The number "5" was a reference to the fifth generation of x86 processors; rival Intel had previously introduced its line of fifth-generation x86 processors as Pentium because the U.S. Trademark and Patent Office had ruled that mere numbers could not be trademarked. In 1996, AMD purchased NexGen , specifically for

5913-507: Was a second source for Intel MOS / LSI circuits by 1973, with products such as Am14/1506 and Am14/1507, dual 100-bit dynamic shift registers. By 1975, AMD was producing 212 products – of which 49 were proprietary, including the Am9102 (a static N-channel 1024-bit RAM) and three low-power Schottky MSI circuits: Am25LS07, Am25LS08, and Am25LS09. Intel had created the first microprocessor , its 4-bit 4004 , in 1971. By 1975, AMD entered

5994-433: Was acquiring Xilinx , one of the market leaders in field programmable gate arrays and complex programmable logic devices (FPGAs and CPLDs) in an all-stock transaction. The acquisition was completed in February 2022, with an estimated acquisition price of $ 50 billion. In October 2023, AMD acquired an open-source AI software provider, Nod.ai, to bolster its AI software ecosystem. In January 2024, AMD announced it

6075-572: Was announced to integrate a CPU and GPU together on some of AMD's microprocessors, including a built in PCI Express link to accommodate separate PCI Express peripherals, eliminating the northbridge chip from the motherboard. The initiative intended to move some of the processing originally done on the CPU (e.g. floating-point unit operations) to the GPU, which is better optimized for some calculations. The Fusion

6156-666: Was augmented from 1984 to 1986 with level-2 kernel operations that concerned vector-matrix operations. Memory hierarchy was also recognized as something to exploit. Many computers have cache memory that is much faster than main memory; keeping matrix manipulations localized allows better usage of the cache. In 1987 and 1988, the level 3 BLAS were identified to do matrix-matrix operations. The level 3 BLAS encouraged block-partitioned algorithms. The LAPACK library uses level 3 BLAS. The original BLAS concerned only densely stored vectors and matrices. Further extensions to BLAS, such as for sparse matrices, have been addressed. BLAS functionality

6237-523: Was discontinuing the production of all complex programmable logic devices (CPLDs) acquired through Xilinx. In March 2024, a rally in semiconductor stocks pushed AMD's valuation above $ 300B for the first time. In July 2024 AMD announced that it would acquire the Finnish-based artificial intelligence startup company Silo AI in a $ 665 million all-cash deal in an attempt to better compete with AI chip market leader Nvidia . In February 1982, AMD signed

6318-431: Was formally incorporated by Jerry Sanders , along with seven of his colleagues from Fairchild Semiconductor , on May 1, 1969. Sanders, an electrical engineer who was the director of marketing at Fairchild, had, like many Fairchild executives, grown frustrated with the increasing lack of support, opportunity, and flexibility within the company. He later decided to leave to start his own semiconductor company, following

6399-530: Was incorporated into a product for desktop PCs, branded Athlon 64 . On April 21, 2005, AMD released the first dual-core Opteron , an x86-based server CPU. A month later, it released the Athlon 64 X2 , the first desktop-based dual-core processor family. In May 2007, AMD abandoned the string "64" in its dual-core desktop product branding, becoming Athlon X2, downplaying the significance of 64-bit computing in its processors. Further updates involved improvements to

6480-565: Was later renamed the AMD APU (Accelerated Processing Unit). Llano was AMD's first APU built for laptops. Llano was the second APU released, targeted at the mainstream market. It incorporated a CPU and GPU on the same die, as well as northbridge functions, and used " Socket FM1 " with DDR3 memory. The CPU part of the processor was based on the Phenom II "Deneb" processor. AMD suffered an unexpected decrease in revenue based on production problems for

6561-706: Was succeeded by Lisa Su , a key lieutenant who had been chief operating officer since June. On October 16, 2014, AMD announced a new restructuring plan along with its Q3 results. Effective July 1, 2014, AMD reorganized into two business groups: Computing and Graphics, which primarily includes desktop and notebook processors and chipsets, discrete GPUs, and professional graphics; and Enterprise, Embedded, and Semi-Custom, which primarily includes server and embedded processors, dense servers, semi-custom SoC products (including solutions for gaming consoles ), engineering services, and royalties. As part of this restructuring, AMD announced that 7% of its global workforce would be laid off by

#652347