Misplaced Pages

C++20

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.

C++20 is a version of the ISO / IEC 14882 standard for the C++ programming language. C++20 replaced the prior version of the C++ standard, called C++17 , and was later replaced by C++23 . The standard was technically finalized by WG21 at the meeting in Prague in February 2020, had its final draft version announced in March 2020, was approved on 4 September 2020, and published in December 2020.

#237762

92-443: C++20 adds more new major features than C++14 or C++17 . Changes that have been accepted into C++20 include: Many new keywords added (and the new "spaceship operator", operator <=> ), such as concept , constinit , consteval , co_await , co_return , co_yield , requires (plus changed meaning for export ), and char8_t (for UTF-8 support). And explicit can take an expression since C++20. Most of

184-502: A Private Use Area . In either approach, the byte value is encoded in the low eight bits of the output code point. These encodings are needed if invalid UTF-8 is to survive translation to and then back from the UTF-16 used internally by Python, and as Unix filenames can contain invalid UTF-8 it is necessary for this to work. The official name for the encoding is UTF-8 , the spelling used in all Unicode Consortium documents. The hyphen-minus

276-575: A neuromorphic CMOS integrated circuit and announced a $ 3 billion investment over the following five years to design a neural chip that mimics the human brain, with 10 billion neurons and 100 trillion synapses, but that uses just 1 kilowatt of power. In 2016, the company launched all-flash arrays designed for small and midsized companies, which includes software for data compression, provisioning, and snapshots across various systems. In January 2019, IBM introduced its first commercial quantum computer: IBM Q System One . In March 2020, it

368-400: A BOM (a change from Windows 7 Notepad ), bringing it into line with most other text editors. Some system files on Windows 11 require UTF-8 with no requirement for a BOM, and almost all files on macOS and Linux are required to be UTF-8 without a BOM. Programming languages that default to UTF-8 for I/O include Ruby  3.0, R  4.2.2, Raku and Java  18. Although

460-621: A Hollerith department called Hollerith Abteilung, which had IBM machines, including calculating and sorting machines. IBM as a military contractor produced 6% of the M1 Carbine rifles used in World War II, about 346,500 of them, between August 1943 and May. IBM built the Automatic Sequence Controlled Calculator , an electromechanical computer, during World War II. It offered its first commercial stored-program computer,

552-445: A byte stream encoding of its 32-bit code points. This encoding was not satisfactory on performance grounds, among other problems, and the biggest problem was probably that it did not have a clear separation between ASCII and non-ASCII: new UTF-1 tools would be backward compatible with ASCII-encoded text, but UTF-1-encoded text could confuse existing code expecting ASCII (or extended ASCII ), because it could contain continuation bytes in

644-472: A byte with the high bit set cannot be alone; and in a truly random string a byte with a high bit set has only a 1 ⁄ 15 chance of starting a valid UTF-8 character. This has the (possibly unintended) consequence of making it easy to detect if a legacy text encoding is accidentally used instead of UTF-8, making conversion of a system to UTF-8 easier and avoiding the need to require a Byte Order Mark or any other metadata. Since RFC 3629 (November 2003),

736-578: A consequence, IBM quickly began losing its market dominance to emerging competitors in the PC market. In 1985, IBM collaborated with Microsoft to develop a new operating system , which was released as OS/2 . Following a dispute, Microsoft severed the collaboration and IBM continued development of OS/2 on its own but it failed in the marketplace against Microsoft's Windows during the mid-1990s. In 1991 IBM began spinning off its many divisions into autonomous subsidiaries (so-called "Baby Blues") in an attempt to make

828-657: A fifth company, the Computing-Tabulating-Recording Company (CTR) based in Endicott, New York. The five companies had 1,300 employees and offices and plants in Endicott and Binghamton , New York; Dayton, Ohio ; Detroit, Michigan ; Washington, D.C. ; and Toronto , Canada. Collectively, the companies manufactured a wide array of machinery for sale and lease, ranging from commercial scales and industrial time recorders, meat and cheese slicers, to tabulators and punched cards. Thomas J. Watson, Sr. , fired from

920-447: A focus on customer service, an insistence on well-groomed, dark-suited salesmen and had an evangelical fervor for instilling company pride and loyalty in every worker". His favorite slogan, " THINK ", became a mantra for each company's employees. During Watson's first four years, revenues reached $ 9 million ($ 158 million today) and the company's operations expanded to Europe, South America, Asia and Australia. Watson never liked

1012-629: A future version of Python is planned to store strings as UTF-8 by default. Modern versions of Microsoft Visual Studio use UTF-8 internally. Microsoft's SQL Server 2019 added support for UTF-8, and using it results in a 35% speed increase, and "nearly 50% reduction in storage requirements." Java internally uses Modified UTF-8 (MUTF-8), in which the null character U+0000 uses the two-byte overlong encoding 0xC0 ,  0x80 , instead of just 0x00 . Modified UTF-8 strings never contain any actual null bytes but can contain all Unicode code points including U+0000, which allows such strings (with

SECTION 10

#1732856069238

1104-468: A null byte appended) to be processed by traditional null-terminated string functions. Java reads and writes normal UTF-8 to files and streams, but it uses Modified UTF-8 for object serialization , for the Java Native Interface , and for embedding constant strings in class files . The dex format defined by Dalvik also uses the same modified UTF-8 to represent string values. Tcl also uses

1196-501: A reader start anywhere and immediately detect character boundaries, at the cost of being somewhat less bit-efficient than the previous proposal. It also abandoned the use of biases that prevented overlong encodings . Thompson's design was outlined on September 2, 1992, on a placemat in a New Jersey diner with Rob Pike . In the following days, Pike and Thompson implemented it and updated Plan 9 to use it throughout, and then communicated their success back to X/Open, which accepted it as

1288-606: A row in the above table to encode a code point less than "First code point" (thus using more bytes than necessary) is termed an overlong encoding . These are a security problem because they allow the same code point to be encoded in multiple ways. Overlong encodings (of ../ for example) have been used to bypass security validations in high-profile products including Microsoft's IIS web server and Apache's Tomcat servlet container. Overlong encodings should therefore be considered an error and never decoded. Modified UTF-8 allows an overlong encoding of U+0000 . The chart below gives

1380-616: A separate lawsuit. In 2015, IBM bought the digital part of The Weather Company , Truven Health Analytics for $ 2.6 billion in 2016, and in October 2018, IBM announced its intention to acquire Red Hat for $ 34 billion, which was completed on July 9, 2019. In February 2020, IBM's John Kelly III joined Brad Smith of Microsoft to sign a pledge with the Vatican to ensure the ethical use and practice of Artificial Intelligence (AI) . IBM announced in October 2020 that it would divest

1472-475: A sharp drop in profit margins during the second quarter of fiscal year 1992; market analysts attributed the drop to a fierce price war in the personal computer market over the summer of 1992. The corporate restructuring was one of the largest and most expensive in history up to that point. By the summer of 1993, the IBM PC Co. had divided into multiple business units itself, including Ambra Computer Corporation and

1564-600: Is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation Format – 8-bit . Almost every webpage is stored in UTF-8. UTF-8 is capable of encoding all 1,112,064 valid Unicode scalar values using a variable-width encoding of one to four one- byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. It

1656-497: Is assumed to be UTF-8 to be losslessly transformed to UTF-16 or UTF-32, by translating the 128 possible error bytes to reserved code points, and transforming those code points back to error bytes to output UTF-8. The most common approach is to translate the codes to U+DC80...U+DCFF which are low (trailing) surrogate values and thus "invalid" UTF-16, as used by Python 's PEP 383 (or "surrogateescape") approach. Another encoding called MirBSD OPTU-8/16 converts them to U+EF80...U+EFFF in

1748-514: Is called CESU-8 . If the Unicode byte-order mark U+FEFF is at the start of a UTF-8 file, the first three bytes will be 0xEF , 0xBB , 0xBF . The Unicode Standard neither requires nor recommends the use of the BOM for UTF-8, but warns that it may be encountered at the start of a file trans-coded from another encoding. While ASCII text encoded using UTF-8 is backward compatible with ASCII, this

1840-409: Is dominant for all countries/languages on the internet, is used in most standards, often the only allowed encoding, and is supported by all modern operating systems and programming languages. The International Organization for Standardization (ISO) set out to compose a universal multi-byte character set in 1989. The draft ISO 10646 standard contained a non-required annex called UTF-1 that provided

1932-589: Is done; for this you can use utf8-c8 ". That UTF-8 Clean-8 variant, implemented by Raku, is an encoder/decoder that preserves bytes as is (even illegal UTF-8 sequences) and allows for Normal Form Grapheme synthetics. Version 3 of the Python programming language treats each byte of an invalid UTF-8 bytestream as an error (see also changes with new UTF-8 mode in Python 3.7 ); this gives 128 different possible errors. Extensions have been created to allow any byte sequence that

SECTION 20

#1732856069238

2024-456: Is either one continuation byte, or ends at the first byte that is disallowed, so E1,A0,20 is a two-byte error followed by a space. This means an error is no more than three bytes long and never contains the start of a valid character, and there are 21,952  different possible errors. Technically this makes UTF-8 no longer a prefix code (you have to read one byte past some errors to figure out they are an error), but searching still works if

2116-417: Is not true when Unicode Standard recommendations are ignored and a BOM is added. A BOM can confuse software that isn't prepared for it but can otherwise accept UTF-8, e.g. programming languages that permit non-ASCII bytes in string literals but not at the start of the file. Nevertheless, there was and still is software that always inserts a BOM when writing UTF-8, and refuses to correctly interpret UTF-8 unless

2208-466: Is required and no spaces are allowed. Some other names used are: There are several current definitions of UTF-8 in various standards documents: They supersede the definitions given in the following obsolete works: They are all the same in their general mechanics, with the main differences being on issues such as allowed range of code point values and safe handling of invalid input. IBM International Business Machines Corporation (using

2300-440: Is the largest shareholder of IBM and as of March 31, 2023, held 15.7% of total shares outstanding. In 2011, IBM became the first technology company Warren Buffett 's holding company Berkshire Hathaway invested in. Initially he bought 64 million shares costing $ 10.5 billion. Over the years, Buffett increased his IBM holdings, but by the end of 2017 had reduced them by 94.5% to 2.05 million shares; by May 2018, he

2392-560: The FORTRAN scientific programming language was developed. In 1961, IBM developed the SABRE reservation system for American Airlines and introduced the highly successful Selectric typewriter. In 1963, IBM employees and computers helped NASA track the orbital flights of the Mercury astronauts. A year later, it moved its corporate headquarters from New York City to Armonk, New York. The latter half of

2484-800: The IBM Rome Software Lab (Rome, Italy), Hursley House (Winchester, UK), 330 North Wabash (Chicago, Illinois, United States), the Cambridge Scientific Center (Cambridge, Massachusetts, United States), the IBM Toronto Software Lab (Toronto, Canada), the IBM Building, Johannesburg (Johannesburg, South Africa), the IBM Building (Seattle) (Seattle, Washington, United States), the IBM Hakozaki Facility (Tokyo, Japan),

2576-840: The IBM Yamato Facility (Yamato, Japan), the IBM Canada Head Office Building (Ontario, Canada) and the Watson IoT Headquarters (Munich, Germany). Defunct IBM campuses include the IBM Somers Office Complex (Somers, New York), Spango Valley (Greenock, Scotland), and Tour Descartes (Paris, France). The company's contributions to industrial architecture and design include works by Marcel Breuer , Eero Saarinen , Ludwig Mies van der Rohe , I.M. Pei and Ricardo Legorreta . Van der Rohe's building in Chicago

2668-532: The National Cash Register Company by John Henry Patterson , called on Flint and, in 1914, was offered a position at CTR. Watson joined CTR as general manager and then, 11 months later, was made President when antitrust cases relating to his time at NCR were resolved. Having learned Patterson's pioneering business practices, Watson proceeded to put the stamp of NCR onto CTR's companies. He implemented sales conventions, "generous sales incentives,

2760-653: The Universal Product Code . IBM and the World Bank first introduced financial swaps to the public in 1981, when they entered into a swap agreement. IBM entered the microcomputer market in the 1980s with the IBM Personal Computer (IBM 5150), which soon became known as the PC , one of IBM's best selling products. Due to a lack of foresight by IBM, the PC was not well protected by intellectual property laws. As

2852-410: The magnetic stripe card that would become ubiquitous for credit/debit/ATM cards, driver's licenses, rapid transit cards and a multitude of other identity and access control applications. IBM pioneered the manufacture of these cards, and for most of the 1970s, the data processing systems and software for such applications ran exclusively on IBM computers. In 1974, IBM engineer George J. Laurer developed

C++20 - Misplaced Pages Continue

2944-399: The replacement character "�" (U+FFFD) and continue decoding. Some decoders consider the sequence E1,A0,20 (a truncated 3-byte code followed by a space) as a single error. This is not a good idea as a search for a space character would find the one hidden in the error. Since Unicode 6 (October 2010) the standard (chapter 3) has recommended a "best practice" where the error

3036-525: The trademark IBM ), nicknamed Big Blue , is an American multinational technology company headquartered in Armonk, New York and present in over 175 countries. It is a publicly traded company and one of the 30 companies in the Dow Jones Industrial Average . IBM is the largest industrial research organization in the world, with 19 research facilities across a dozen countries, having held

3128-691: The "Visual C++ for Linux Development" extension. Partial Changes applied to the C++20 working draft in July 2017 (Toronto) include: Changes applied to the C++20 working draft in the fall meeting in November 2017 (Albuquerque) include: Changes applied to the C++20 working draft in March 2018 (Jacksonville) include: Changes applied to the C++20 working draft in the summer meeting in June 2018 (Rapperswil) include: Changes applied to

3220-522: The 1960s and 1970s, the IBM mainframe , exemplified by the System/360 , was the world's dominant computing platform , with the company producing 80 percent of computers in the U.S. and 70 percent of computers worldwide. IBM debuted in the microcomputer market in 1981 with the IBM Personal Computer , — its DOS software provided by Microsoft , — which became the basis for the majority of personal computers to

3312-566: The 1960s saw IBM continue its support of space exploration, participating in the 1965 Gemini flights, 1966 Saturn flights, and 1969 lunar mission. IBM also developed and manufactured the Saturn V's Instrument Unit and Apollo spacecraft guidance computers. On April 7, 1964, IBM launched the first computer system family, the IBM System/360 . It spanned the complete range of commercial and scientific applications from large to small, allowing companies for

3404-859: The 21st century. As one of the world's oldest and largest technology companies, IBM has been responsible for several technological innovations , including the automated teller machine (ATM), dynamic random-access memory (DRAM), the floppy disk , the hard disk drive , the magnetic stripe card , the relational database , the SQL programming language , and the UPC barcode . The company has made inroads in advanced computer chips , quantum computing , artificial intelligence , and data infrastructure . IBM employees and alumni have won various recognitions for their scientific research and inventions, including six Nobel Prizes and six Turing Awards . IBM originated with several technological innovations developed and commercialized in

3496-508: The C++20 working draft in the fall meeting in November 2018 (San Diego) include: Changes applied to the C++20 working draft in the winter meeting in February 2019 (Kona) include: Changes applied to the C++20 working draft in the summer meeting in July 2019 (Cologne) include: Changes applied during the NB comment resolution in the fall meeting in November 2019 (Belfast) include: UTF-8 UTF-8

3588-484: The IBM PC Co. was dissolved and merged into IBM Personal Systems Group. In 2002 IBM acquired PwC Consulting, the consulting arm of PwC which was merged into its IBM Global Services . On September 14, 2004, LG and IBM announced that their business alliance in the South Korean market would end at the end of that year. Both companies stated that it was unrelated to the charges of bribery earlier that year. Xnote

3680-543: The IBM Power Personal Systems Group, the former an attempt to design and market " clone " computers of IBM's own architecture and the latter responsible for IBM's PowerPC -based workstations . IBM PC Co. introduced the ThinkPad clone computers, which IBM would heavily market and would eventually become one of the best-selling series of notebook computers . In 1993, IBM posted an $ 8 billion loss – at

3772-782: The IBM website. On June 7, Krishna announced that IBM would carry out an "orderly wind-down" of its operations in Russia. In late 2022, IBM started a collaboration with new Japanese manufacturer Rapidus , which led GlobalFoundries to file a lawsuit against IBM the following year. In 2023, IBM acquired Manta Software Inc. to complement its data and A.I. governance capabilities for an undisclosed amount. On November 16, 2023, IBM suspended ads on Twitter after ads were found next to pro-Nazi content. In December 2023, IBM announced it would acquire Software AG 's StreamSets and webMethods platforms for €2.13 billion ($ 2.33 billion). IBM's market capitalization

C++20 - Misplaced Pages Continue

3864-533: The Managed Infrastructure Services unit of its Global Technology Services division into a new public company. The new company, Kyndryl , will have 90,000 employees, 4,600 clients in 115 countries, with a backlog of $ 60 billion. IBM's spin off was greater than any of its previous divestitures, and welcomed by investors. IBM appointed Martin Schroeter, who had been IBM's CFO from 2014 through

3956-509: The Weather Channel mobile app. Also that year, IBM employees created the film A Boy and His Atom , which was the first molecule movie to tell a story. In 2016, IBM acquired video conferencing service Ustream and formed a new cloud video unit. In April 2016, it posted a 14-year low in quarterly sales. The following month, Groupon sued IBM accusing it of patent infringement, two months after IBM accused Groupon of patent infringement in

4048-463: The antitrust laws in IBM's actions directed against leasing companies and plug-compatible peripheral manufacturers. Shortly after, IBM unbundled its software and services in what many observers believed was a direct result of the lawsuit, creating a competitive market for software. In 1982, the Department of Justice dropped the case as "without merit". Also in 1969, IBM engineer Forrest Parry invented

4140-631: The capability for an application to set UTF-8 as the "code page" for the Windows API, removing the need to use UTF-16; and more recently has recommended programmers use UTF-8, and even states "UTF-16 [...] is a unique burden that Windows places on code that targets multiple platforms". The default string primitive in Go , Julia , Rust , Swift (since version 5), and PyPy uses UTF-8 internally in all cases. Python (since version 3.3) uses UTF-8 internally for Python C API extensions and sometimes for strings and

4232-591: The city's seventh tallest building and overlooking Beijing National Stadium ("Bird's Nest") , home to the 2008 Summer Olympics . IBM India Private Limited is the Indian subsidiary of IBM, which is headquartered at Bangalore , Karnataka. It has facilities in Coimbatore , Chennai , Kochi , Ahmedabad , Delhi , Kolkata , Mumbai , Pune , Gurugram , Noida , Bhubaneshwar , Surat , Visakhapatnam , Hyderabad , Bangalore and Jamshedpur . Other notable buildings include

4324-560: The clumsy hyphenated name "Computing-Tabulating-Recording Company" and chose to replace it with the more expansive title "International Business Machines" which had previously been used as the name of CTR's Canadian Division; the name was changed on February 14, 1924. By 1933, most of the subsidiaries had been merged into one company, IBM. The Nazis made extensive use of Hollerith punch card and alphabetical accounting equipment and IBM's majority-owned German subsidiary, Deutsche Hollerith Maschinen GmbH ( Dehomag ), supplied this equipment from

4416-461: The company designed a video surveillance system for Davao City . In 2014 IBM announced it would sell its x86 server division to Lenovo for $ 2.1 billion. while continuing to offer Power ISA -based servers. Also that year, IBM began announcing several major partnerships with other companies, including Apple Inc. , Twitter, Facebook, Tencent , Cisco , UnderArmour , Box , Microsoft , VMware , CSC , Macy's , Sesame Workshop ,

4508-421: The company more manageable and to streamline IBM by having other investors finance those companies. These included AdStar , dedicated to disk drives and other data storage products; IBM Application Business Systems, dedicated to mid-range computers; IBM Enterprise Systems, dedicated to mainframes; Pennant Systems, dedicated to mid-range and large printers; Lexmark , dedicated to small printers; and more. Lexmark

4600-476: The current version of Python requires an option to open() to read/write UTF-8, plans exist to make UTF-8 I/O the default in Python ;3.15. C++23 adopts UTF-8 as the only portable source code file format (surprisingly there was none before). Backwards compatibility is a serious impediment to changing code and APIs using UTF-16 to use UTF-8, but this is happening. As of May 2019 , Microsoft added

4692-500: The default encoding in XML and HTML (and not just using UTF-8, also declaring it in metadata), "even when all characters are in the ASCII range ... Using non-UTF-8 encodings can have unexpected results". Lots of software has the ability to read/write UTF-8. It may though require the user to change options from the normal settings, or may require a BOM (byte-order mark) as the first character to read

SECTION 50

#1732856069238

4784-440: The detailed meaning of each byte in a stream encoded in UTF-8. Not all sequences of bytes are valid UTF-8. A UTF-8 decoder should be prepared for: Many of the first UTF-8 decoders would decode these, ignoring incorrect bits. Carefully crafted invalid UTF-8 could make them either skip or create ASCII characters such as NUL , slash, or quotes, leading to security vulnerabilities. It is also common to throw an exception or truncate

4876-470: The early 1930s. This equipment was critical to Nazi efforts to categorize citizens of both Germany and other nations that fell under Nazi control through ongoing censuses. These census data were used to facilitate the round-up of Jews and other targeted groups, and to catalog their movements through the machinery of the Holocaust , including internment in the concentration camps. Nazi concentration camps operated

4968-451: The early days of Unicode there were no characters greater than U+FFFF and combining characters were rarely used, so the 16-bit encoding was fixed-size. This made processing of text more efficient, though the gains are nowhere as great as novice programmers may imagine. All such advantages were lost as soon as UTF-16 became variable width as well. The code points U+0800 – U+FFFF take 3 bytes in UTF-8 but only 2 in UTF-16. This led to

5060-548: The end of 2017, as CEO of Kyndryl. In 2021, IBM announced the acquisition of the enterprise software company Turbonomic for $ 1.5 billion. In January 2022, IBM announced it would sell Watson Health to private equity firm Francisco Partners . On March 7, 2022, a few days after the start of the Russian invasion of Ukraine , IBM CEO Arvind Krishna published a Ukrainian flag and announced that "we have suspended all business in Russia". All Russian articles were also removed from

5152-445: The enterprise-oriented Personal Systems Group of the IBM PC Co. into IBM's own Global Services personal computer consulting and customer service division. The resulting merged business units then became known simply as IBM Personal Systems Group. A year later, IBM stopped selling their computers at retail outlets after their market share in this sector had fallen considerably behind competitors Compaq and Dell . Immediately afterwards,

5244-426: The file. Examples of software supporting UTF-8 include Microsoft Word , Microsoft Excel (2016 and later), Google Drive , LibreOffice and most databases. Software that "defaults" to UTF-8 (meaning it writes it without the user changing settings, and it reads it without a BOM) has become more common since 2010. Windows Notepad , in all currently supported versions of Windows, defaults to writing UTF-8 without

5336-579: The first character is a BOM (or the file only contains ASCII). For a long time there was considerable argument as to whether it was better to process text in UTF-16 or in UTF-8. The primary advantage of UTF-16 is that the Windows API required it to be used to get access to all Unicode characters (only recently has this been fixed). This caused several libraries such as Qt to also use UTF-16 strings which propagates this requirement to non-Windows platforms. In

5428-485: The first time to upgrade to models with greater computing capability without having to rewrite their applications. It was followed by the IBM System/370 in 1970. Together the 360 and 370 made the IBM mainframe the dominant mainframe computer and the dominant computing platform in the industry throughout this period and into the early 1980s. They and the operating systems that ran on them such as OS/VS1 and MVS , and

5520-632: The high and low surrogates used by UTF-16 ( U+D800 through U+DFFF ) are not legal Unicode values, and their UTF-8 encodings must be treated as an invalid byte sequence. These encodings all start with 0xED followed by 0xA0 or higher. This rule is often ignored as surrogates are allowed in Windows filenames and this means there must be a way to store them in a string. UTF-8 that allows these surrogate halves has been (informally) called WTF-8 , while another variation that also encodes all non-BMP characters as two surrogates (6 bytes instead of 4)

5612-451: The high bit was set. The name File System Safe UCS Transformation Format ( FSS-UTF ) and most of the text of this proposal were later preserved in the final specification. In August 1992, this proposal was circulated by an IBM X/Open representative to interested parties. A modification by Ken Thompson of the Plan 9 operating system group at Bell Labs made it self-synchronizing , letting

SECTION 60

#1732856069238

5704-522: The idea that text in Chinese and other languages would take more space in UTF-8. However, text is only larger if there are more of these code points than 1-byte ASCII code points, and this rarely happens in the real-world documents due to spaces, newlines, digits, punctuation, English words, and HTML markup. UTF-8 has the advantages of being trivial to retrofit to any system that could handle an extended ASCII , not having byte-order problems, and taking about 1/2

5796-731: The larger ones. In New York City, IBM has several offices besides CHQ, including the IBM Watson headquarters at Astor Place in Manhattan. Outside of New York, major campuses in the United States include Austin, Texas ; Research Triangle Park (Raleigh-Durham), North Carolina ; Rochester, Minnesota ; and Silicon Valley, California . IBM's real estate holdings are varied and globally diverse. Towers occupied by IBM include 1250 René-Lévesque (Montreal, Canada) and One Atlantic Center (Atlanta, Georgia, US). In Beijing, China, IBM occupies Pangu Plaza ,

5888-441: The last byte of a code point to decode it. Unlike many earlier multi-byte text encodings such as Shift-JIS , it is self-synchronizing so searches for short strings or characters are possible and that the start of a code point can be found from a random position by backing up at most 3 bytes. The values chosen for the lead bytes means sorting a list of UTF-8 strings puts them in the same order as sorting UTF-32 strings. Using

5980-552: The late 19th century. Julius E. Pitrap patented the computing scale in 1885; Alexander Dey invented the dial recorder (1888); Herman Hollerith patented the Electric Tabulating Machine (1889); and Willard Bundy invented a time clock to record workers' arrival and departure times on a paper tape (1889). On June 16, 1911, their four companies were amalgamated in New York State by Charles Ranlett Flint forming

6072-447: The latest being the IBM z series. The most recent model, the IBM z16 , was released in 2022. In 1990, IBM released the Power microprocessors , which were designed into many console gaming systems, including Xbox 360 , PlayStation 3 , and Nintendo 's Wii U . IBM Secure Blue is encryption hardware that can be built into microprocessors, and in 2014, the company revealed TrueNorth ,

6164-425: The longer term. The key trends of IBM are (as at the financial year ending December 31): The company's 15-member board of directors are responsible for overall corporate management and includes the current or former CEOs of Anthem , Dow Chemical , Johnson and Johnson , Royal Dutch Shell , UPS , and Vanguard as well as the president of Cornell University and a retired U.S. Navy admiral . Vanguard Group

6256-575: The mid-1950s. There are two other IBM buildings within walking distance of CHQ: the North Castle office, which previously served as IBM's headquarters; and the Louis V. Gerstner, Jr., Center for Learning (formerly known as IBM Learning Center (ILC)), a resort hotel and training center, which has 182 guest rooms, 31 meeting rooms, and various amenities. IBM operates in 174 countries as of 2016 , with mobility centers in smaller market areas and major campuses in

6348-600: The middleware built on top of those such as the CICS transaction processing monitor, had a near-monopoly-level market share and became the thing IBM was most known for during this period. In 1969, the United States of America alleged that IBM violated the Sherman Antitrust Act by monopolizing or attempting to monopolize the general-purpose electronic digital computer system market, specifically computers designed primarily for business, and subsequently alleged that IBM violated

6440-400: The parent company of Sesame Street , and Salesforce.com . In 2015, its chip division transitioned to a fabless model with semiconductors design, offloading manufacturing to GlobalFoundries . In 2015, IBM announced three major acquisitions: Merge Healthcare for $ 1 billion, data storage vendor Cleversafe , and all digital assets from The Weather Company , including Weather.com and

6532-401: The present day. The company later also found success in the portable space with the ThinkPad . Since the 1990s, IBM has concentrated on computer services , software , supercomputers , and scientific research ; it sold its microcomputer division to Lenovo in 2005. IBM continues to develop mainframes, and its supercomputers have consistently ranked among the most powerful in the world in

6624-554: The range 0x21–0x7E that meant something else in ASCII, e.g., 0x2F for / , the Unix path directory separator. In July 1992, the X/Open committee XoJIG was looking for a better encoding. Dave Prosser of Unix System Laboratories submitted a proposal for one that had faster implementation characteristics and introduced the improvement that 7-bit ASCII characters would only represent themselves; multi-byte sequences would only include bytes with

6716-411: The record for most annual U.S. patents generated by a business for 29 consecutive years from 1993 to 2021. IBM was founded in 1911 as the Computing-Tabulating-Recording Company (CTR), a holding company of manufacturers of record-keeping and measuring systems. It was renamed "International Business Machines" in 1924 and soon became the leading manufacturer of punch-card tabulating systems . During

6808-484: The remaining 61,440 codepoints of the Basic Multilingual Plane (BMP), including most Chinese, Japanese and Korean characters . Four bytes are needed for the 1,048,576 codepoints in the other planes of Unicode , which include emoji (pictographic symbols), less common CJK characters , various historic scripts, and mathematical symbols . This is a prefix code and it is unnecessary to read past

6900-582: The same modified UTF-8 as Java for internal representation of Unicode data, but uses strict CESU-8 for external data. All known Modified UTF-8 implementations also treat the surrogate pairs as in CESU-8 . Raku programming language (formerly Perl 6) uses utf-8 encoding by default for I/O ( Perl 5 also supports it); though that choice in Raku also implies "normalization into Unicode NFC (normalization form canonical) . In some cases you may want to ensure no normalization

6992-465: The searched-for string does not contain any errors. Making each byte be an error, in which case E1,A0,20 is two errors followed by a space, also still allows searching for a valid string. This means there are only 128 different errors which makes it practical to store the errors in the output string, or replace them with characters from a legacy encoding. Only a small subset of possible byte strings are error-free UTF-8: several bytes cannot appear;

7084-493: The space for any language using mostly Latin letters. UTF-8 has been the most common encoding for the World Wide Web since 2008. As of November 2024 , UTF-8 is used by 98.4% of surveyed web sites. Although many pages only use ASCII characters to display content, very few websites now declare their encoding to only be ASCII instead of UTF-8. Virtually all countries and languages have 95% or more use of UTF-8 encodings on

7176-624: The specification for FSS-UTF. UTF-8 was first officially presented at the USENIX conference in San Diego , from January 25 to 29, 1993. The Internet Engineering Task Force adopted UTF-8 in its Policy on Character Sets and Languages in RFC ;2277 ( BCP 18) for future internet standards work in January 1998, replacing Single Byte Character Sets such as Latin-1 in older RFCs. In November 2003, UTF-8

7268-645: The string at an error but this turns what would otherwise be harmless errors (i.e. "file not found") into a denial of service , for instance early versions of Python 3.0 would exit immediately if the command line or environment variables contained invalid UTF-8. RFC 3629 states "Implementations of the decoding algorithm MUST protect against decoding invalid sequences." The Unicode Standard requires decoders to: "... treat any ill-formed code unit sequence as an error condition. This guarantees that it will neither interpret nor emit an ill-formed code unit sequence." The standard now recommends replacing each error with

7360-532: The time the biggest in American corporate history. Lou Gerstner was hired as CEO from RJR Nabisco to turn the company around. In 1995, IBM purchased Lotus Software , best known for its Lotus 1-2-3 spreadsheet software. During the decade, IBM was working on a new operating system, named the Workplace OS project. Despite a large amount of money spent on the project, it was cancelled in 1996. In 1998, IBM merged

7452-500: The uses of the volatile keyword have been deprecated. In addition to keywords, there are identifiers with special meaning , including new import and module . New attributes in C++20: [[likely]] , [[unlikely]] , and [[no_unique_address]] Removed features: Deprecated features: Full support Microsoft's compiler supports not only Windows but also Linux, Android, and iOS. However, for Linux development, it requires

7544-608: The vacuum tube based IBM 701 , in 1952. The IBM 305 RAMAC introduced the hard disk drive in 1956. The company switched to transistorized designs with the 7000 and 1400 series, beginning in 1958. In which, IBM considered the 1400 series the ''model T'' of computing, due to it being the first computer with over ten thousand sales by IBM. In 1956, the company demonstrated the first practical example of artificial intelligence when Arthur L. Samuel of IBM's Poughkeepsie , New York, laboratory programmed an IBM 704 not merely to play checkers but "learn" from its own experience. In 1957,

7636-543: The value of the code point. In the following table, the characters u to z are replaced by the bits of the code point, from the positions U+uvwxyz : The first 128 code points (ASCII) need 1 byte. The next 1,920 code points need two bytes to encode, which covers the remainder of almost all Latin-script alphabets , and also IPA extensions , Greek , Cyrillic , Coptic , Armenian , Hebrew , Arabic , Syriac , Thaana and N'Ko alphabets, as well as Combining Diacritical Marks . Three bytes are needed for

7728-580: The web. Many standards only support UTF-8, e.g. JSON exchange requires it (without a byte-order mark (BOM)). UTF-8 is also the recommendation from the WHATWG for HTML and DOM specifications, and stating "UTF-8 encoding is the most appropriate encoding for interchange of Unicode " and the Internet Mail Consortium recommends that all e‑mail programs be able to display and create mail using UTF-8. The World Wide Web Consortium recommends UTF-8 as

7820-476: Was acquired by Clayton & Dubilier in a leveraged buyout shortly after its formation. In September 1992, IBM completed the spin-off of their various non-mainframe and non-midrange, personal computer manufacturing divisions, combining them into an autonomous wholly owned subsidiary known as the IBM Personal Computer Company (IBM PC Co.). This corporate restructuring came after IBM reported

7912-400: Was completely out of IBM. IBM is headquartered in Armonk, New York , a community 37 miles (60 km) north of Midtown Manhattan. A nickname for the company is the " Colossus of Armonk ". Its principal building, referred to as CHQ, is a 283,000-square-foot (26,300 m ) glass and stone edifice on a 25-acre (10 ha) parcel amid a 432-acre former apple orchard the company purchased in

8004-497: Was designed for backward compatibility with ASCII : the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that a UTF-8-encoded file using only those characters is identical to an ASCII file. Most software designed for any extended ASCII can read and write UTF-8 (including on Microsoft Windows ) and this results in fewer internationalization issues than any alternative text encoding. UTF-8

8096-400: Was exhibited on Jeopardy! where it won against game-show champions Ken Jennings and Brad Rutter. The company also celebrated its 100th anniversary in the same year on June 16. In 2012, IBM announced it had agreed to buy Kenexa and Texas Memory Systems, and a year later it also acquired SoftLayer Technologies, a web hosting service , in a deal worth around $ 2 billion. Also that year,

8188-633: Was originally part of the joint venture and was sold by LG in 2012. Continuing a trend started in the 1990s of downsizing its operations and divesting from commodity production , IBM sold all of its personal computer business to Chinese technology company Lenovo and, in 2009, it acquired software company SPSS Inc. Later in 2009, IBM's Blue Gene supercomputing program was awarded the National Medal of Technology and Innovation by U.S. President Barack Obama . In 2011, IBM gained worldwide attention for its artificial intelligence program Watson , which

8280-497: Was recognized with the 1990 Honor Award from the National Building Museum . IBM has a large and diverse portfolio of products and services. As of 2016 , these offerings fall into the categories of cloud computing , artificial intelligence, commerce , data and analytics , Internet of things (IoT), IT infrastructure , mobile , digital workplace and cybersecurity . Since 1954, IBM sells mainframe computers ,

8372-413: Was restricted by RFC   3629 to match the constraints of the UTF-16 character encoding: explicitly prohibiting code points corresponding to the high and low surrogate characters removed more than 3% of the three-byte sequences, and ending at U+10FFFF removed more than 48% of the four-byte sequences and all five- and six-byte sequences. UTF-8 encodes code points in one to four bytes, depending on

8464-480: Was valued at over $ 153 billion as of May 2024. Despite its relative decline within the technology sector, IBM remains the seventh largest technology company by revenue, and 67th largest overall company by revenue in the United States . IBM ranked No. 38 on the 2020 Fortune 500 rankings of the largest United States corporations by total revenue. In 2014, IBM was accused of using "financial engineering" to hit its quarterly earnings targets rather than investing for

#237762