The Brahmic scripts, also known as Indic scripts, are a family of abugida writing systems. They are used throughout the Indian subcontinent, Southeast Asia and parts of East Asia. They are descended from the Brahmi script of ancient India and are used by various languages in several language families in South, East and Southeast Asia: Indo-Aryan, Dravidian, Tibeto-Burman, Mongolic, Austroasiatic, Austronesian, and Tai. They were also
A block is always a multiple of 16, and is often a multiple of 128, but is otherwise arbitrary. Characters required for a given script may be spread out over several different, potentially disjunct blocks within the codespace. Each code point is assigned a classification, listed as the code point's General Category property. Here, at the uppermost level code points are categorized as one of Letter, Mark, Number, Punctuation, Symbol, Separator, or Other. Under each category, each code point
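Where a worked example helps, the General Category of individual code points can be inspected programmatically. The following is a minimal Python sketch using the standard-library unicodedata module; the sample characters are arbitrary choices for illustration, not drawn from the text above.

```python
import unicodedata

# Inspect the General Category of a few arbitrary sample code points.
# The two-letter codes ("Lu", "Nd", "Po", ...) refine the top-level
# categories Letter, Mark, Number, Punctuation, Symbol, Separator, Other.
for ch in ["A", "क", "٣", ",", "€", " "]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")
```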
261-710: A calendar year and with rare cases where the scheduled release had to be postponed. For instance, in April 2020, a month after version 13.0 was published, the Unicode Consortium announced they had changed the intended release date for version 14.0, pushing it back six months to September 2021 due to the COVID-19 pandemic . Unicode 16.0, the latest version, was released on 10 September 2024. It added 5,185 characters and seven new scripts: Garay , Gurung Khema , Kirat Rai , Ol Onal , Sunuwar , Todhri , and Tulu-Tigalari . Thus far,
348-413: A change to writing the two consonants side by side. In the latter case, this combination may be indicated by a diacritic on one of the consonants or a change in the form of one of the consonants, e.g. the half forms of Devanagari. Generally, the reading order of stacked consonants is top to bottom, or the general reading order of the script, but sometimes the reading order can be reversed. The division of
435-432: A comprehensive catalog of character properties, including those needed for supporting bidirectional text , as well as visual charts and reference data sets to aid implementers. Previously, The Unicode Standard was sold as a print volume containing the complete core specification, standard annexes, and code charts. However, version 5.0, published in 2006, was the last version printed this way. Starting with version 5.2, only
522-453: A conjunct. This expedient is used by ISCII and South Asian scripts of Unicode .) Thus a closed syllable such as phaṣ requires two aksharas to write: फष् phaṣ . The Róng script used for the Lepcha language goes further than other Indic abugidas, in that a single akshara can represent a closed syllable: Not only the vowel, but any final consonant is indicated by a diacritic. For example,
609-473: A default vowel consonant such as फ does not take on a final consonant sound. Instead, it keeps its vowel. For writing two consonants without a vowel in between, instead of using diacritics on the first consonant to remove its vowel, another popular method of special conjunct forms is used in which two or more consonant characters are merged to express a cluster, such as Devanagari, as in अप्फ appha. (Some fonts display this as प् followed by फ, rather than forming
696-515: A diacritic, but writes all other vowels as full letters (similarly to Kurdish and Uyghur). This means that when no vowel diacritics are present (most of the time), it technically has an inherent vowel. However, like the Phagspa and Meroitic scripts whose status as abugidas is controversial (see below), all other vowels are written in-line. Additionally, the practice of explicitly writing all-but-one vowel does not apply to loanwords from Arabic and Persian, so
783-566: A full semantic duplicate of the Latin alphabet, because legacy CJK encodings contained both "fullwidth" (matching the width of CJK characters) and "halfwidth" (matching ordinary Latin script) characters. The Unicode Bulldog Award is given to people deemed to be influential in Unicode's development, with recipients including Tatsuo Kobayashi , Thomas Milo, Roozbeh Pournader , Ken Lunde , and Michael Everson . The origins of Unicode can be traced back to
870-429: A handful of scripts—often primarily between a given script and Latin characters —not between a large number of scripts, and not with all of the scripts supported being treated in a consistent manner. The philosophy that underpins Unicode seeks to encode the underlying characters— graphemes and grapheme-like units—rather than graphical distinctions considered mere variant glyphs thereof, that are instead best handled by
957-436: A letter modified to indicate the vowel. Letters can be modified either by means of diacritics or by changes in the form of the letter itself. If all modifications are by diacritics and all diacritics follow the direction of the writing of the letters, then the abugida is not an alphasyllabary. However, most languages have words that are more complicated than a sequence of CV syllables, even ignoring tone. The first complication
#17328545169941044-465: A letter representing just a consonant (C). This final consonant may be represented with: In a true abugida, the lack of distinctive vowel marking of the letter may result from the diachronic loss of the inherent vowel, e.g. by syncope and apocope in Hindi . When not separating syllables containing consonant clusters (CCV) into C + CV, these syllables are often written by combining the two consonants. In
1131-530: A low-surrogate code point forms a surrogate pair in UTF-16 in order to represent code points greater than U+FFFF . In principle, these code points cannot otherwise be used, though in practice this rule is often ignored, especially when not using UTF-16. A small set of code points are guaranteed never to be assigned to characters, although third-parties may make independent use of them at their discretion. There are 66 of these noncharacters : U+FDD0 – U+FDEF and
1218-413: A particular vowel, and in which diacritics denote other vowels". (This 'particular vowel' is referred to as the inherent or implicit vowel, as opposed to the explicit vowels marked by the 'diacritics'.) An alphasyllabary is defined as "a type of writing system in which the vowels are denoted by subsidiary symbols, not all of which occur in a linear order (with relation to the consonant symbols) that
1305-526: A project run by Deborah Anderson at the University of California, Berkeley was founded in 2002 with the goal of funding proposals for scripts not yet encoded in the standard. The project has become a major source of proposed additions to the standard in recent years. The Unicode Consortium together with the ISO have developed a shared repertoire following the initial publication of The Unicode Standard : Unicode and
1392-399: A properly engineered design, 16 bits per character are more than sufficient for this purpose. This design decision was made based on the assumption that only scripts and characters in "modern" use would require encoding: Unicode gives higher priority to ensuring utility for the future than to preserving past antiquities. Unicode aims in the first instance at the characters published in
1479-416: A result of the spread of writing systems, independent vowels may be used to represent syllables beginning with a glottal stop , even for non-initial syllables. The next two complications are consonant clusters before a vowel (CCV) and syllables ending in a consonant (CVC). The simplest solution, which is not always available, is to break with the principle of writing words as a sequence of syllables and use
1566-409: A segmental writing system in which consonant–vowel sequences are written as units; each unit is based on a consonant letter, and vowel notation is secondary, similar to a diacritical mark . This contrasts with a full alphabet , in which vowels have status equal to consonants, and with an abjad , in which vowel marking is absent, partial , or optional – in less formal contexts, all three types of
1653-497: A syllable with the default vowel, in this case ka ( [kə] ). In some languages, including Hindi, it becomes a final closing consonant at the end of a word, in this case k . The inherent vowel may be changed by adding vowel mark ( diacritics ), producing syllables such as कि ki, कु ku, के ke, को ko. In many of the Brahmic scripts, a syllable beginning with a cluster is treated as a single character for purposes of vowel marking, so
1740-439: A term in linguistics was proposed by Peter T. Daniels in his 1990 typology of writing systems . As Daniels used the word, an abugida is in contrast with a syllabary , where letters with shared consonant or vowel sounds show no particular resemblance to one another. Furthermore, an abugida is also in contrast with an alphabet proper, where independent letters are used to denote consonants and vowels. The term alphasyllabary
1827-558: A total of 168 scripts are included in the latest version of Unicode (covering alphabets , abugidas and syllabaries ), although there are still scripts that are not yet encoded, particularly those mainly used in historical, liturgical, and academic contexts. Further additions of characters to the already encoded scripts, as well as symbols, in particular for mathematics and music (in the form of notes and rhythmic symbols), also occur. The Unicode Roadmap Committee ( Michael Everson , Rick McGowan, Ken Whistler, V.S. Umamaheswaran) maintain
#17328545169941914-648: A universal encoding than the original Unicode architecture envisioned. Version 1.0 of Microsoft's TrueType specification, published in 1992, used the name "Apple Unicode" instead of "Unicode" for the Platform ID in the naming table. The Unicode Consortium is a nonprofit organization that coordinates Unicode's development. Full members include most of the main computer software and hardware companies (and few others) with any interest in text-processing standards, including Adobe , Apple , Google , IBM , Meta (previously as Facebook), Microsoft , Netflix , and SAP . Over
2001-420: A vowel can be written before, below or above a consonant letter, while the syllable is still pronounced in the order of a consonant-vowel combination (CV). The fundamental principles of an abugida apply to words made up of consonant-vowel (CV) syllables. The syllables are written as letters in a straight line, where each syllable is either a letter that represents the sound of a consonant and its inherent vowel or
2088-466: A vowel marker like ि -i, falling before the character it modifies, may appear several positions before the place where it is pronounced. For example, the game cricket in Hindi is क्रिकेट krikeṭ ; the diacritic for /i/ appears before the consonant cluster /kr/ , not before the /r/ . A more unusual example is seen in the Batak alphabet : Here the syllable bim is written ba-ma-i-(virama) . That is,
2175-403: A word into syllables for the purposes of writing does not always accord with the natural phonetics of the language. For example, Brahmic scripts commonly handle a phonetic sequence CVC-CV as CV-CCV or CV-C-CV. However, sometimes phonetic CVC syllables are handled as single units, and the final consonant may be represented: More complicated unit structures (e.g. CC or CCVC) are handled by combining
2262-475: Is a text encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 of the standard defines 154 998 characters and 168 scripts used in various ordinary, literary, academic, and technical contexts. Many common characters, including numerals, punctuation, and other symbols, are unified within
2349-463: Is a non-segmental script that indicates syllable onsets and rimes , such as consonant clusters and vowels with final consonants. Thus it is not segmental and cannot be considered an abugida. However, it superficially resembles an abugida with the roles of consonant and vowel reversed. Most syllables are written with two letters in the order rime–onset (typically vowel-consonant), even though they are pronounced as onset-rime (consonant-vowel), rather like
2436-499: Is congruent with their temporal order in speech". Bright did not require that an alphabet explicitly represent all vowels. ʼPhags-pa is an example of an abugida because it has an inherent vowel , but it is not an alphasyllabary because its vowels are written in linear order. Modern Lao is an example of an alphasyllabary that is not an abugida, for there is no inherent vowel and its vowels are always written explicitly and not in accordance to their temporal order in speech, meaning that
2523-490: Is difficult to draw a dividing line between abugidas and other segmental scripts. For example, the Meroitic script of ancient Sudan did not indicate an inherent a (one symbol stood for both m and ma, for example), and is thus similar to Brahmic family of abugidas. However, the other vowels were indicated with full letters, not diacritics or modification, so the system was essentially an alphabet that did not bother to write
2610-413: Is intended to suggest a unique, unified, universal encoding". In this document, entitled Unicode 88 , Becker outlined a scheme using 16-bit characters: Unicode is intended to address the need for a workable, reliable world text encoding. Unicode could be roughly described as "wide-body ASCII " that has been stretched to 16 bits to encompass the characters of all the world's living languages. In
2697-428: Is more than just a repertoire within which characters are assigned. To aid developers and designers, the standard also provides charts and reference data, as well as annexes explaining concepts germane to various scripts, providing guidance for their implementation. Topics covered by these annexes include character normalization , character composition and decomposition, collation , and directionality . Unicode text
2784-453: Is not padded. There are a total of 2 + (2 − 2 ) = 1 112 064 valid code points within the codespace. (This number arises from the limitations of the UTF-16 character encoding, which can encode the 2 code points in the range U+0000 through U+FFFF except for the 2 code points in the range U+D800 through U+DFFF , which are used as surrogate pairs to encode the 2 code points in
2871-417: Is processed and stored as binary data using one of several encodings , which define how to translate the standard's abstracted codes for characters into sequences of bytes. The Unicode Standard itself defines three encodings: UTF-8 , UTF-16 , and UTF-32 , though several others exist. Of these, UTF-8 is the most widely used by a large margin, in part due to its backwards-compatibility with ASCII . Unicode
2958-480: Is projected to include 4301 new unified CJK characters . The Unicode Standard defines a codespace : a sequence of integers called code points in the range from 0 to 1 114 111 , notated according to the standard as U+0000 – U+10FFFF . The codespace is a systematic, architecture-independent representation of The Unicode Standard ; actual text is processed as binary data via one of several Unicode encodings, such as UTF-8 . In this normative notation,
3045-476: Is syllables that consist of just a vowel (V). For some languages, a zero consonant letter is used as though every syllable began with a consonant. For other languages, each vowel has a separate letter that is used for each syllable consisting of just the vowel. These letters are known as independent vowels , and are found in most Indic scripts. These letters may be quite different from the corresponding diacritics, which by contrast are known as dependent vowels . As
3132-481: Is the case for syllabaries, the units of the writing system may consist of the representations both of syllables and of consonants. For scripts of the Brahmic family, the term akshara is used for the units. In several languages of Ethiopia and Eritrea, abugida traditionally meant letters of the Ethiopic or Ge‘ez script in which many of these languages are written. Ge'ez is one of several segmental writing systems in
3219-400: Is then further subcategorized. In most cases, other properties must be used to adequately describe all the characteristics of any given code point. The 1024 points in the range U+D800 – U+DBFF are known as high-surrogate code points, and code points in the range U+DC00 – U+DFFF ( 1024 code points) are known as low-surrogate code points. A high-surrogate code point followed by
3306-657: Is with North Indic scripts, used in Northern India, Nepal, Tibet, Bhutan, Mongolia, and Russia; and Southern Indic scripts, used in South India , Sri Lanka and Southeast Asia . South Indic letter forms are more rounded than North Indic forms, though Odia , Golmol and Litumol of Nepal script are rounded. Most North Indic scripts' full letters incorporate a horizontal line at the top, with Gujarati and Odia as exceptions; South Indic scripts do not. Indic scripts indicate vowels through dependent vowel signs (diacritics) around
3393-593: The Brahmi alphabet . Today they are used in most languages of South Asia (although replaced by Perso-Arabic in Urdu , Kashmiri and some other languages of Pakistan and India ), mainland Southeast Asia ( Myanmar , Thailand , Laos , Cambodia , and Vietnam ), Tibet ( Tibetan ), Indonesian archipelago ( Javanese , Balinese , Sundanese , Batak , Lontara , Rejang , Rencong , Makasar , etc.), Philippines ( Baybayin , Buhid , Hanunuo , Kulitan , and Aborlan Tagbanwa ), Malaysia ( Rencong ). The primary division
3480-487: The Ge'ez abugida (or fidel ), the base form of the letter (also known as fidel ) may be altered. For example, ሀ hä [hə] (base form), ሁ hu (with a right-side diacritic that does not alter the letter), ሂ hi (with a subdiacritic that compresses the consonant, so it is the same height), ህ hə [hɨ] or [h] (where the letter is modified with a kink in the left arm). In the family known as Canadian Aboriginal syllabics , which
3567-677: The Kadamba , Pallava and Vatteluttu scripts, which in turn diversified into other scripts of South India and Southeast Asia. Brahmic scripts spread in a peaceful manner, Indianization , or the spread of Indian learning. The scripts spread naturally to Southeast Asia, at ports on trading routes. At these trading posts, ancient inscriptions have been found in Sanskrit, using scripts that originated in India. At first, inscriptions were made in Indian languages, but later
3654-574: The Kharoṣṭhī and Brāhmī scripts ; the abjad in question is usually considered to be the Aramaic one, but while the link between Aramaic and Kharosthi is more or less undisputed, this is not the case with Brahmi. The Kharosthi family does not survive today, but Brahmi's descendants include most of the modern scripts of South and Southeast Asia . Ge'ez derived from a different abjad, the Sabean script of Yemen ;
3741-527: The aksharas ; there is no vowel-killer mark. Abjads are typically written without indication of many vowels. However, in some contexts like teaching materials or scriptures , Arabic and Hebrew are written with full indication of vowels via diacritic marks ( harakat , niqqud ) making them effectively alphasyllabaries. The Arabic scripts used for Kurdish in Iraq and for Uyghur in Xinjiang , China, as well as
3828-568: The typeface , through the use of markup , or by some other means. In particularly complex cases, such as the treatment of orthographical variants in Han characters , there is considerable disagreement regarding which differences justify their own encodings, and which are only graphical variants of other characters. At the most abstract level, Unicode assigns a unique number called a code point to each character. Many issues of visual representation—including size, shape, and style—are intended to be up to
3915-574: The 1980s, to a group of individuals with connections to Xerox 's Character Code Standard (XCCS). In 1987, Xerox employee Joe Becker , along with Apple employees Lee Collins and Mark Davis , started investigating the practicalities of creating a universal character set. With additional input from Peter Fenwick and Dave Opstad , Becker published a draft proposal for an "international/multilingual text character encoding system in August 1988, tentatively called Unicode". He explained that "the name 'Unicode'
4002-598: The 7th or 8th century, include Nagari , Siddham and Sharada . The Siddhaṃ script was especially important in Buddhism , as many sutras were written in it. The art of Siddham calligraphy survives today in Japan . The tabular presentation and dictionary order of the modern kana system of Japanese writing is believed to be descended from the Indic scripts, most likely through the spread of Buddhism . Southern Brahmi evolved into
4089-508: The Hebrew script of Yiddish , are fully vowelled, but because the vowels are written with full letters rather than diacritics (with the exception of distinguishing between /a/ and /o/ in the latter) and there are no inherent vowels, these are considered alphabets, not abugidas. The Arabic script used for South Azerbaijani generally writes the vowel /æ/ (written as ə in North Azerbaijani) as
4176-564: The ISO's Universal Coded Character Set (UCS) use identical character names and code points. However, the Unicode versions do differ from their ISO equivalents in two significant ways. While the UCS is a simple character map, Unicode specifies the rules, algorithms, and properties necessary to achieve interoperability between different platforms and languages. Thus, The Unicode Standard includes more information, covering in-depth topics such as bitwise encoding, collation , and rendering. It also provides
4263-624: The Indic scripts, the earliest method was simply to arrange them vertically, writing the second consonant of the cluster below the first one. The two consonants may also merge as conjunct consonant letters, where two or more letters are graphically joined in a ligature , or otherwise change their shapes. Rarely, one of the consonants may be replaced by a gemination mark, e.g. the Gurmukhi addak . When they are arranged vertically, as in Burmese or Khmer , they are said to be 'stacked'. Often there has been
4350-497: The advent of vowels coincided with the introduction or adoption of Christianity about AD 350. The Ethiopic script is the elaboration of an abjad. The Cree syllabary was invented with full knowledge of the Devanagari system. The Meroitic script was developed from Egyptian hieroglyphs , within which various schemes of 'group writing' had been used for showing vowels. Unicode Unicode , formally The Unicode Standard ,
4437-463: The consonants to the point that they must be considered modifications of the form of the letters. Children learn each modification separately, as in a syllabary; nonetheless, the graphic similarities between syllables with the same consonant are readily apparent, unlike the case in a true syllabary . Though now an abugida, the Ge'ez script , until the advent of Christianity ( ca. AD 350 ), had originally been what would now be termed an abjad . In
#17328545169944524-476: The consonants, often including a sign that explicitly indicates the lack of a vowel. If a consonant has no vowel sign, this indicates a default vowel. Vowel diacritics may appear above, below, to the left, to the right, or around the consonant. The most widely used Indic script is Devanagari , shared by Hindi , Bihari , Marathi , Konkani , Nepali , and often Sanskrit . A basic letter such as क in Hindi represents
4611-496: The core specification, published as a print-on-demand paperback, may be purchased. The full text, on the other hand, is published as a free PDF on the Unicode website. A practical reason for this publication method highlights the second significant difference between the UCS and Unicode—the frequency with which updated versions are released and new characters added. The Unicode Standard has regularly released annual expanded versions, occasionally with more than one version released in
4698-470: The discretion of the software actually rendering the text, such as a web browser or word processor . However, partially with the intent of encouraging rapid adoption, the simplicity of this original model has become somewhat more elaborate over time, and various pragmatic concessions have been made over the course of the standard's development. The first 256 code points mirror the ISO/IEC 8859-1 standard, with
4785-457: The examples above to sets of syllables in the Japanese hiragana syllabary: か ka , き ki , く ku , け ke , こ ko have nothing in common to indicate k; while ら ra , り ri , る ru , れ re , ろ ro have neither anything in common for r , nor anything to indicate that they have the same vowels as the k set. Most Indian and Indochinese abugidas appear to have first been developed from abjads with
4872-401: The following versions of The Unicode Standard have been published. Update versions, which do not include any changes to character repertoire, are signified by the third number (e.g., "version 4.0.1") and are omitted in the table below. The Unicode Consortium normally releases a new version of The Unicode Standard once a year. Version 17.0, the next major version,
4959-516: The group. By the end of 1990, most of the work of remapping existing standards had been completed, and a final review draft of Unicode was ready. The Unicode Consortium was incorporated in California on 3 January 1991, and the first volume of The Unicode Standard was published that October. The second volume, now adding Han ideographs, was published in June 1992. In 1996, a surrogate character mechanism
5046-549: The intent of trivializing the conversion of text already written in Western European scripts. To preserve the distinctions made by different legacy encodings, therefore allowing for conversion between them and Unicode without any loss of information, many characters nearly identical to others , in both appearance and intended function, were given distinct code points. For example, the Halfwidth and Fullwidth Forms block encompasses
5133-403: The last two code points in each of the 17 planes (e.g. U+FFFE , U+FFFF , U+1FFFE , U+1FFFF , ..., U+10FFFE , U+10FFFF ). The set of noncharacters is stable, and no new noncharacters will ever be defined. Like surrogates, the rule that these cannot be used is often ignored, although the operation of the byte order mark assumes that U+FFFE will never be the first code point in
5220-625: The list of scripts that are candidates or potential candidates for encoding and their tentative code block assignments on the Unicode Roadmap page of the Unicode Consortium website. For some scripts on the Roadmap, such as Jurchen and Khitan large script , encoding proposals have been made and they are working their way through the approval process. For other scripts, such as Numidian and Rongorongo , no proposal has yet been made, and they await agreement on character repertoire and other details from
5307-675: The modern text (e.g. in the union of all newspapers and magazines printed in the world in 1988), whose number is undoubtedly far below 2 = 16,384. Beyond those modern-use characters, all others may be defined to be obsolete or rare; these are better candidates for private-use registration than for congesting the public list of generally useful Unicode. In early 1989, the Unicode working group expanded to include Ken Whistler and Mike Kernaghan of Metaphor, Karen Smith-Yoshimura and Joan Aliprand of Research Libraries Group , and Glenn Wright of Sun Microsystems . In 1990, Michel Suignard and Asmus Freytag of Microsoft and NeXT 's Rick McGowan had also joined
#17328545169945394-405: The most common vowel. Several systems of shorthand use diacritics for vowels, but they do not have an inherent vowel, and are thus more similar to Thaana and Kurdish script than to the Brahmic scripts. The Gabelsberger shorthand system and its derivatives modify the following consonant to represent vowels. The Pollard script , which was based on shorthand, also uses diacritics for vowels;
5481-423: The placements of the vowel relative to the consonant indicates tone . Pitman shorthand uses straight strokes and quarter-circle marks in different orientations as the principal "alphabet" of consonants; vowels are shown as light and heavy dots, dashes and other marks in one of 3 possible positions to indicate the various vowel-sounds. However, to increase writing speed, Pitman has rules for "vowel indication" using
5568-401: The position of the /i/ vowel in Devanagari, which is written before the consonant. Pahawh is also unusual in that, while an inherent rime /āu/ (with mid tone) is unwritten, it also has an inherent onset /k/ . For the syllable /kau/ , which requires one or the other of the inherent sounds to be overt, it is /au/ that is written. Thus it is the rime (vowel) that is basic to the system. It
5655-553: The positioning or choice of consonant signs so that writing vowel-marks can be dispensed with. As the term alphasyllabary suggests, abugidas have been considered an intermediate step between alphabets and syllabaries . Historically, abugidas appear to have evolved from abjads (vowelless alphabets). They contrast with syllabaries, where there is a distinct symbol for each syllable or consonant-vowel combination, and where these have no systematic similarity to each other, and typically develop directly from logographic scripts . Compare
5742-554: The previous environment of a myriad of incompatible character sets , each used within different locales and on different computer architectures. Unicode is used to encode the vast majority of text on the Internet, including most web pages , and relevant Unicode support has become a common consideration in contemporary software development. The Unicode character repertoire is synchronized with ISO/IEC 10646 , each being code-for-code identical with one another. However, The Unicode Standard
5829-807: The range U+10000 through U+10FFFF .) The Unicode codespace is divided into 17 planes , numbered 0 to 16. Plane 0 is the Basic Multilingual Plane (BMP), and contains the most commonly used characters. All code points in the BMP are accessed as a single code unit in UTF-16 encoding and can be encoded in one, two or three bytes in UTF-8. Code points in planes 1 through 16 (the supplementary planes ) are accessed as surrogate pairs in UTF-16 and encoded in four bytes in UTF-8 . Within each plane, characters are allocated within named blocks of related characters. The size of
5916-468: The same Brahmi glyph. Accordingly: The transliteration is indicated in ISO 15919 . Vowels are presented in their independent form on the left of each column, and in their corresponding dependent form (vowel sign) combined with the consonant k on the right. A glyph for ka is an independent consonant letter itself without any vowel sign, where the vowel a is inherent . Notes Notes The Brahmi script
6003-467: The script does not have an inherent vowel for Arabic and Persian words. The inconsistency of its vowel notation makes it difficult to categorize. The imperial Mongol script called Phagspa was derived from the Tibetan abugida, but all vowels are written in-line rather than as diacritics. However, it retains the features of having an inherent vowel /a/ and having distinct initial vowel letters. Pahawh Hmong
6090-503: The script may be termed "alphabets". The terms also contrast them with a syllabary , in which a single symbol denotes the combination of one consonant and one vowel. Related concepts were introduced independently in 1948 by James Germain Février (using the term néosyllabisme ) and David Diringer (using the term semisyllabary ), then in 1959 by Fred Householder (introducing the term pseudo-alphabet ). The Ethiopic term "abugida"
6177-430: The scripts were used to write the local Southeast Asian languages. Hereafter, local varieties of the scripts were developed. By the 8th century, the scripts had diverged and separated into regional scripts. Some characteristics, which are present in most but not all the scripts, are: Below are comparison charts of several of the major Indic scripts, organised on the principle that glyphs in the same column all derive from
6264-512: The source of the dictionary order ( gojūon ) of Japanese kana . Brahmic scripts descended from the Brahmi script . Brahmi is clearly attested from the 3rd century BCE during the reign of Ashoka , who used the script for imperial edicts . Northern Brahmi gave rise to the Gupta script during the Gupta period , which in turn diversified into a number of cursives during the medieval period . Notable examples of such medieval scripts, developed by
6351-530: The southern group the Vatteluttu and Kadamba / Pallava scripts with the spread of Buddhism sent Brahmic scripts throughout Southeast Asia. As of Unicode version 16.0, the following Brahmic scripts have been encoded: Abugida An abugida ( / ˌ ɑː b uː ˈ ɡ iː d ə , ˌ æ b -/ ; from Ge'ez : አቡጊዳ , 'äbugīda ) – sometimes also called alphasyllabary , neosyllabary , or pseudo-alphabet – is
6438-493: The standard and are not treated as specific to any given writing system. Unicode encodes 3790 emoji , with the continued development thereof conducted by the Consortium as a part of the standard. Moreover, the widespread adoption of Unicode was in large part responsible for the initial popularization of emoji outside of Japan. Unicode is ultimately capable of encoding more than 1.1 million characters. Unicode has largely supplanted
6525-435: The syllable [sok] would be written as something like s̥̽, here with an underring representing /o/ and an overcross representing the diacritic for final /k/ . Most other Indic abugidas can only indicate a very limited set of final consonants with diacritics, such as /ŋ/ or /r/ , if they can indicate any at all. In Ethiopic or Ge'ez script , fidels (individual "letters" of the script) have "diacritics" that are fused with
6612-418: The two-character prefix U+ always precedes a written code point, and the code points themselves are written as hexadecimal numbers. At least four hexadecimal digits are always written, with leading zeros prepended as needed. For example, the code point U+00F7 ÷ DIVISION SIGN is padded with two leading zeros, but U+13254 𓉔 EGYPTIAN HIEROGLYPH O004 ( [REDACTED] )
6699-607: The user communities involved. Some modern invented scripts which have not yet been included in Unicode (e.g., Tengwar ) or which do not qualify for inclusion in Unicode due to lack of real-world use (e.g., Klingon ) are listed in the ConScript Unicode Registry , along with unofficial but widely used Private Use Areas code assignments. There is also a Medieval Unicode Font Initiative focused on special Latin medieval characters. Part of these proposals has been already included in Unicode. The Script Encoding Initiative,
6786-620: The various techniques above. Examples using the Devanagari script There are three principal families of abugidas, depending on whether vowels are indicated by modifying consonants by diacritics, distortion, or orientation. Lao and Tāna have dependent vowels and a zero vowel sign, but no inherent vowel. Indic scripts originated in India and spread to Southeast Asia , Bangladesh , Sri Lanka , Nepal , Bhutan , Tibet , Mongolia , and Russia . All surviving Indic scripts are descendants of
6873-424: The vowel diacritic and virama are both written after the consonants for the whole syllable. In many abugidas, there is also a diacritic to suppress the inherent vowel, yielding the bare consonant. In Devanagari , प् is p, and फ् is ph . This is called the virāma or halantam in Sanskrit. It may be used to form consonant clusters , or to indicate that a consonant occurs at the end of a word. Thus in Sanskrit,
6960-522: The world, others include Indic/Brahmic scripts and Canadian Aboriginal Syllabics . The word abugida is derived from the four letters, ' ä, bu, gi, and da , in much the same way that abecedary is derived from Latin letters a be ce de , abjad is derived from the Arabic a b j d , and alphabet is derived from the names of the two first letters in the Greek alphabet , alpha and beta . Abugida as
7047-635: The years several countries or government agencies have been members of the Unicode Consortium. Presently only the Ministry of Endowments and Religious Affairs (Oman) is a full member with voting rights. The Consortium has the ambitious goal of eventually replacing existing character encoding schemes with Unicode and its standard Unicode Transformation Format (UTF) schemes, as many of the existing schemes are limited in size and scope and are incompatible with multilingual environments. Unicode currently covers most major writing systems in use today. As of 2024 ,
7134-524: Was already divided into regional variants at the time of the earliest surviving epigraphy around the 3rd century BC. Cursives of the Brahmi script began to diversify further from around the 5th century AD and continued to give rise to new scripts throughout the Middle Ages. The main division in antiquity was between northern and southern Brahmi . In the northern group, the Gupta script was very influential, and in
7221-469: Was chosen as a designation for the concept in 1990 by Peter T. Daniels . In 1992, Faber suggested "segmentally coded syllabically linear phonographic script", and in 1992 Bright used the term alphasyllabary , and Gnanadesikan and Rimzhim, Katz, & Fowler have suggested aksara or āksharik . Abugidas include the extensive Brahmic family of scripts of Tibet, South and Southeast Asia, Semitic Ethiopic scripts, and Canadian Aboriginal syllabics . As
7308-491: Was implemented in Unicode 2.0, so that Unicode was no longer restricted to 16 bits. This increased the Unicode codespace to over a million code points, which allowed for the encoding of many historic scripts, such as Egyptian hieroglyphs , and thousands of rarely used or obsolete characters that had not been anticipated for inclusion in the standard. Among these characters are various rarely used CJK characters—many mainly being used in proper names, making them far more necessary for
7395-438: Was inspired by the Devanagari script of India, vowels are indicated by changing the orientation of the syllabogram . Each vowel has a consistent orientation; for example, Inuktitut ᐱ pi, ᐳ pu, ᐸ pa; ᑎ ti, ᑐ tu, ᑕ ta . Although there is a vowel inherent in each, all rotations have equal status and none can be identified as basic. Bare consonants are indicated either by separate diacritics, or by superscript versions of
7482-483: Was originally designed with the intent of transcending limitations present in all text encodings designed up to that point: each encoding was relied upon for use in its own context, but with no particular expectation of compatibility with any other. Indeed, any two encodings chosen were often totally unworkable when used together, with text encoded in one interpreted as garbage characters by the other. Most encodings had only been designed to facilitate interoperation between
7569-489: Was suggested for the Indic scripts in 1997 by William Bright , following South Asian linguistic usage, to convey the idea that, "they share features of both alphabet and syllabary." The formal definitions given by Daniels and Bright for abugida and alphasyllabary differ; some writing systems are abugidas but not alphasyllabaries, and some are alphasyllabaries but not abugidas. An abugida is defined as "a type of writing system whose basic characters denote consonants followed by