Mojikyō - Misplaced Pages

Mojikyō (Japanese: 文字鏡), also known by its full name Konjaku Mojikyō (今昔文字鏡, lit. '(the) past and present character mirror'), is a character encoding scheme created to provide a complete index of characters used in the Chinese, Japanese, Korean, and Vietnamese Chữ Nôm writing systems and in other historical Chinese logographic writing systems. The Mojikyō Institute (文字鏡研究会, Mojikyō Kenkyūkai), which published the character set, also published computer software and TrueType fonts to accompany it. The Mojikyō Institute, chaired by Tadahisa Ishikawa (石川忠久), originally distributed its character set and related software and data on CD-ROMs sold in Kinokuniya stores.


Mojikyō was conceptualized in 1996, and the first version of the CD-ROM was released in July 1997. For a time, the Mojikyō Institute also offered a web subscription, termed "Mojikyō WEB" (文字鏡WEB), which had more up-to-date characters. As of September 2006, Mojikyō encoded 174,975 characters. Among those, 150,366 characters (≈86%) then belonged to the extended Chinese–Japanese–Korean–Vietnamese (CJKV) family. Many of Mojikyō's characters are considered obsolete or obscure, and are not encoded by any other character set, including

A ZIP file and are each around 2–5 megabytes; the different fonts contain different numbers of characters. Also included is a Windows executable that implements a graphical character map, the "Mojikyō Character Map" (文字鏡MAP), MOCHRMAP.EXE. MOCHRMAP.EXE allows users to browse through the Mojikyō fonts and to copy and paste characters in lieu of typing them on the keyboard. As opposed to

A J-Source equal to JK-66038. All Unicode characters with a JK-prefixed J-Source originate from Mojikyō. According to Ken Lunde, a subject matter expert in character encodings and East Asian languages, as of Unicode 13.0, 782 ideographs in Unicode originate from Mojikyō, split somewhat evenly between two blocks: CJK Unified Ideographs Extension C, with 367, and CJK Unified Ideographs Extension E, with 415. Not all Unicode characters with Mojikyō origins (JK-prefixed J-Sources) have
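As a rough illustration of the JK-prefixed J-Source rule, the sketch below filters tab-separated Unihan-style records (code point, field name, value) for `kIRG_JSource` values that start with `JK-`. The function name `mojikyo_sourced` and the sample data are assumptions made for this example; only the U+2B679 / JK-66038 pair comes from the text, and the second sample record is an illustrative non-JK entry.

```python
# Sample records in the tab-separated "code point, field, value" layout used by
# Unihan data files. Only the first record is taken from the article text; the
# second is an illustrative non-JK entry added for contrast.
SAMPLE_RECORDS = (
    "U+2B679\tkIRG_JSource\tJK-66038\n"
    "U+4EE4\tkIRG_JSource\tJ0-4E61\n"
)


def mojikyo_sourced(lines):
    """Return {code_point: j_source} for records whose J-Source starts with 'JK-'."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip malformed records
        code_point, field, value = parts
        if field == "kIRG_JSource" and value.startswith("JK-"):
            # "U+2B679" -> 0x2B679 (the digits after "U+" are hexadecimal)
            result[int(code_point[2:], 16)] = value
    return result


if __name__ == "__main__":
    for cp, source in mojikyo_sourced(SAMPLE_RECORDS.splitlines()).items():
        print(f"U+{cp:04X} <- {source}")
```

Run against a full data file, the same filter would be expected to return the Mojikyō-derived ideographs that Lunde counts, though that has not been verified here.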

A minor role in an eventually successful series of proposals to encode the Tangut script in Unicode; Mojikyō already had within its encoding 6,000 Tangut characters by October 2002. The Unicode Standard's Unihan Database refers to Mojikyō as the "Japanese KOKUJI Collection" (日本国字集), abbreviated "JK". For example, U+2B679 𫙹 CJK UNIFIED IDEOGRAPH-2B679, an ideograph read in Japanese as burizādo (ブリザード, lit. 'blizzard'), has

A non-kan-on reading in a word where the kan-on reading is well known is a common cause of reading mistakes or difficulty, such as in ge-doku (解毒, detoxification, anti-poison) (go-on), where 解 is usually instead read as kai. The go-on readings are especially common in Buddhist terminology such as gokuraku (極楽, paradise), as well as in some of the earliest loans, such as

Is proprietary software under a restrictive license. Originally, the Mojikyō Institute tried to prevent its character data from being used, and threatened those who published conversion tables to and from its character set. In July 2010, the Mojikyō Institute abandoned its legal efforts to stop at least one Japanese user from publishing conversion tables or converting characters encoded in Mojikyō to Unicode or other character sets. Mere data, sometimes including

Is increasingly rare, although idiosyncratic use of Chinese characters in proper names requires knowledge (and therefore availability) of many more characters. Even today, however, South Korean students are taught 1,800 characters. Other scripts used for these languages, such as bopomofo and the Latin-based pinyin for Chinese, hiragana and katakana for Japanese, and hangul for Korean, are not strictly "CJK characters", although CJK character sets almost invariably include them as necessary for full coverage of

Is often used as a starting point for Unicode proposals. However, Mojikyō has much looser standards than Unicode for encoding, which leads Mojikyō to have many encoded glyphs of dubious, or even unintentionally fictional, origin. As such, while many non-Unicode Mojikyō characters are suitable for addition to Unicode, not all can become Unicode characters, due to the differing standards of evidence required by each. The Mojikyō fonts (文字鏡フォント) are TrueType fonts that come in

Is reflected in the carryover to Japanese as well. Additionally, many Chinese syllables, especially those with an entering tone, did not fit the largely consonant-vowel (CV) phonotactics of classical Japanese. Thus most on'yomi are composed of two morae (beats), the second of which is either a lengthening of the vowel in the first mora (to ei, ō, or ū), the vowel i, or one of the syllables ku, ki, tsu, chi, fu (historically, later merged into ō and ū), or moraic n, chosen for their approximation to

Is that Mojikyō encodings displayed this way are decimal, while Unicode's U+ encoding is hexadecimal. From the earliest days of Unicode, Mojikyō has both influenced, and been influenced by, the standard. Glyphs originating from Mojikyō first appear in a proposal to the Ideographic Rapporteur Group (IRG), which is responsible for maintaining all CJK blocks in Unicode, on 18 April 2002. In May 2007, Mojikyō played
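The decimal versus hexadecimal distinction between the MJXXXXXX and U+XXXX notations can be sketched in a few lines of Python. The helper names below are hypothetical and not part of any Mojikyō software; the six-digit zero padding is inferred from the MJ090007 example in the text, and the four-digit minimum for U+ follows the usual Unicode convention.

```python
def format_mj(number: int) -> str:
    """Render a Mojikyō number as 'MJ' plus six zero-padded DECIMAL digits."""
    return f"MJ{number:06d}"


def parse_mj(label: str) -> int:
    """Read the decimal number back out of an 'MJxxxxxx' label."""
    if not label.startswith("MJ"):
        raise ValueError(f"not a Mojikyō label: {label!r}")
    return int(label[2:], 10)


def format_unicode(code_point: int) -> str:
    """Render a code point as 'U+' plus at least four HEXADECIMAL digits."""
    return f"U+{code_point:04X}"


def parse_unicode(label: str) -> int:
    """Read the hexadecimal code point back out of a 'U+xxxx' label."""
    if not label.startswith("U+"):
        raise ValueError(f"not a Unicode label: {label!r}")
    return int(label[2:], 16)


if __name__ == "__main__":
    # The hentaigana example from the text: MJ090007 corresponds to U+1B008.
    print(format_mj(90007))          # MJ090007
    print(format_unicode(0x1B008))   # U+1B008
    print(parse_unicode("U+1B008"))  # 110600, the same code point in decimal
```

Reading the digits of one notation in the other's base gives a different number, so a conversion tool has to track which notation each label uses.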

The English borrowings from Latin, Greek, and Norman French, since Chinese-borrowed terms are often more specialized, or considered to sound more erudite or formal, than their native counterparts (occupying a higher linguistic register). The major exception to this rule is family names, in which the native kun'yomi are usually used (though on'yomi are found in many personal names, especially men's names). Kanji invented in Japan (kokuji) would not normally be expected to have on'yomi, but there are exceptions, such as


The Sino-Japanese reading, is the reading of a kanji based on the historical Chinese pronunciation of the character. A single kanji might have multiple on'yomi pronunciations, reflecting the Chinese pronunciations of different periods or regions. On'yomi pronunciations are generally classified into go-on, kan-on, tō-on and kan'yō-on, roughly based on when they were borrowed from China. Generally, on'yomi pronunciations are used for technical, compound words, while

The Chinese-origin logographic script formerly used for the Vietnamese language, or CJKVZ to also include Sawndip, used to write the Zhuang languages. Standard Mandarin Chinese and Standard Cantonese are written almost exclusively in Chinese characters. Over 3,000 characters are required for general literacy, with up to 40,000 characters for reasonably complete coverage. Japanese uses fewer characters; general literacy in Japanese can be expected with 2,136 characters. The use of Chinese characters in Korea

The Latin-based Vietnamese alphabet. The number of characters required for complete coverage of all these languages' needs cannot fit in the 256-character code space of 8-bit character encodings, requiring at least a 16-bit fixed width encoding or multi-byte variable-length encodings. The 16-bit fixed width encodings, such as those from Unicode up to and including version 2.0, are now deprecated due to
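The code-space arithmetic behind this passage can be checked with a short Python sketch. It compares the number of slots in 8-bit and 16-bit code spaces and shows how a variable-length encoding spends more units on a character outside the 16-bit range. The helper names are hypothetical, and the two sample characters (U+4EE4 and U+2B679) are borrowed from elsewhere in this article purely for illustration.

```python
# Number of distinct values a fixed-width code unit can take.
EIGHT_BIT_SLOTS = 2 ** 8      # 256
SIXTEEN_BIT_SLOTS = 2 ** 16   # 65,536


def utf16_code_units(ch: str) -> int:
    """Count the 16-bit code units UTF-16 needs for one character."""
    return len(ch.encode("utf-16-be")) // 2


def utf8_length(ch: str) -> int:
    """Count the bytes UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    bmp_char = "\u4EE4"            # U+4EE4, fits in a single 16-bit unit
    supplementary = chr(0x2B679)   # U+2B679, beyond the 16-bit range
    for ch in (bmp_char, supplementary):
        print(f"U+{ord(ch):04X}: "
              f"{utf16_code_units(ch)} UTF-16 unit(s), "
              f"{utf8_length(ch)} UTF-8 byte(s)")
```

A character above U+FFFF takes two UTF-16 code units (a surrogate pair), which a strictly fixed-width 16-bit encoding cannot represent.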

The character 働 "to work", which has the kun'yomi "hatara(ku)" and the on'yomi "dō", and 腺 "gland", which has only the on'yomi "sen"; in both cases these come from the on'yomi of the phonetic component, respectively 動 "dō" and 泉 "sen". In Chinese, most characters are associated with a single Chinese sound, though there are distinct literary and colloquial readings. However, some homographs (多音字) such as 行 (Mandarin: háng or xíng, Japanese: an, gō, gyō) have more than one reading in Chinese representing different meanings, which

The character sets in a process known as Han unification. CJK character encodings should consist minimally of Han characters plus language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana and hangul. CJK character encodings include: The CJK character sets take up the bulk of the assigned Unicode code space. There is much controversy among Japanese experts of Chinese characters about

The desirability and technical merit of the Han unification process used to map multiple Chinese and Japanese character sets into a single set of unified characters. All three languages can be written both left-to-right and top-to-bottom (right-to-left and top-to-bottom in ancient documents), but are usually considered left-to-right scripts when discussing encoding issues. Libraries cooperated on encoding standards for JACKPHY characters in

The early 1980s. According to Ken Lunde, the abbreviation "CJK" was a registered trademark of Research Libraries Group (which merged with OCLC in 2006). The trademark owned by OCLC between 1987 and 2009 has now expired.

| Block | Plane | Range | Characters | Unification | Script property |
| CJK Unified Ideographs | 0 BMP | 4E00–9FFF | 20,992 | Unified | Han |
| CJK Unified Ideographs Extension A | 0 BMP | 3400–4DBF | 6,592 | Unified | Han |
| CJK Unified Ideographs Extension B | 2 SIP | 20000–2A6DF | 42,720 | Unified | Han |
| CJK Unified Ideographs Extension C | 2 SIP | 2A700–2B73F | 4,154 | Unified | Han |
| CJK Unified Ideographs Extension D | 2 SIP | 2B740–2B81F | 222 | Unified | Han |
| CJK Unified Ideographs Extension E | 2 SIP | 2B820–2CEAF | 5,762 | Unified | Han |
| CJK Unified Ideographs Extension F | 2 SIP | 2CEB0–2EBEF | 7,473 | Unified | Han |
| CJK Unified Ideographs Extension G | 3 TIP | 30000–3134F | 4,939 | Unified | Han |
| CJK Unified Ideographs Extension H | 3 TIP | 31350–323AF | 4,192 | Unified | Han |
| CJK Unified Ideographs Extension I | 2 SIP | 2EBF0–2EE5F | 622 | Unified | Han |
| CJK Radicals Supplement | 0 BMP | 2E80–2EFF | 115 | Not unified | Han |
| Kangxi Radicals | 0 BMP | 2F00–2FDF | 214 | Not unified | Han |
| Ideographic Description Characters | 0 BMP | 2FF0–2FFF | 16 | Not unified | Common |
| CJK Symbols and Punctuation | 0 BMP | 3000–303F | 64 | Not unified | Han, Hangul, Common, Inherited |
| CJK Strokes | 0 BMP | 31C0–31EF | 39 | Not unified | Common |
| Enclosed CJK Letters and Months | 0 BMP | 3200–32FF | 255 | Not unified | Hangul, Katakana, Common |
| CJK Compatibility | 0 BMP | 3300–33FF | 256 | Not unified | Katakana, Common |
| CJK Compatibility Ideographs | 0 BMP | F900–FAFF | 472 | 12 are unified | Han |
| CJK Compatibility Forms | 0 BMP | FE30–FE4F | 32 | Not unified | Common |
| Enclosed Ideographic Supplement | 1 SMP | 1F200–1F2FF | 64 | Not unified | Hiragana, Common |
| CJK Compatibility Ideographs Supplement | 2 SIP | 2F800–2FA1F | 542 | Not unified | Han |

On'yomi

On'yomi (音読み, [oɰ̃jomi], lit. "sound(-based) reading"), or

The encoding is made, nor is there an attempt to keep all common characters below U+FFFF as there is in Unicode. Unicode, on the other hand, sorts its CJK characters into blocks based on how common they are: the most common are generally put into the Basic Multilingual Plane, while those that are rare or obscure are put into the Supplementary Planes. For example, under Radical 9 Mojikyō has two characters where Unicode has one: MJ054435 (令) and MJ059031 (令), both represented in Unicode as U+4EE4 令 CJK UNIFIED IDEOGRAPH-4EE4. Mojikyō
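Two ideas from this passage can be sketched together in Python: a code point's plane is its value shifted right by 16 bits, so anything at or below U+FFFF sits in plane 0 (the Basic Multilingual Plane), and several Mojikyō numbers may map to a single unified Unicode code point. The function names and the tiny mapping table are assumptions for illustration; the table holds only the two Radical 9 entries quoted in the text and is not an official conversion table.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane number (0 = BMP, 2 = SIP, ...)."""
    return code_point >> 16


# Mojikyō number -> Unicode code point, using only the example from the text:
# MJ054435 and MJ059031 are both represented as U+4EE4.
MJ_TO_UNICODE = {
    54435: 0x4EE4,
    59031: 0x4EE4,
}


def group_by_code_point(mapping):
    """Invert a many-to-one mapping into {code_point: [mojikyo_numbers]}."""
    groups = {}
    for mj_number, code_point in sorted(mapping.items()):
        groups.setdefault(code_point, []).append(mj_number)
    return groups


if __name__ == "__main__":
    print(plane_of(0x4EE4))    # 0: Basic Multilingual Plane
    print(plane_of(0x2B679))   # 2: Supplementary Ideographic Plane
    for cp, numbers in group_by_code_point(MJ_TO_UNICODE).items():
        labels = ", ".join(f"MJ{n:06d}" for n in numbers)
        print(f"U+{cp:04X} <- {labels}")
```

Because the mapping is many-to-one, converting Mojikyō text to Unicode loses the glyph distinction, and a round trip back to Mojikyō numbers would need extra information.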

The final consonants of Middle Chinese. It may be that palatalized consonants before vowels other than i developed in Japanese as a result of Chinese borrowings, as they are virtually unknown in words of native Japanese origin, but are common in Chinese. Generally, on'yomi are classified into four types according to their region and time of origin: The most common form of readings is the kan-on one, and use of

The final release of Mojikyō. The Mojikyō encoding was created to provide a complete index of characters used in the Chinese, Japanese, and Korean writing systems and in the Vietnamese Chữ Nôm logographic script. It also encodes a large number of characters in ancient scripts, such as the oracle bone script, the seal script, and Sanskrit (Siddhaṃ). For many characters, it is the only character encoding to encode them, and its data


The international standard, Unicode. Each Mojikyō character has a unique number, and the characters are organized into blocks. Mojikyō puts CJKV characters in different blocks according to their traditional Kangxi radical. Common radicals containing an especially high number of characters, such as Radicals 9 (人) and 162 (⻌), are split further by stroke order. Unlike Unicode, Mojikyō purposely avoids Han unification; no attempt at compactness of

The most widely used international text encoding standard, Unicode. Mojikyō was originally a paid proprietary software product; in 2015, the Mojikyō Institute began to upload its latest releases to the Internet Archive as freeware, as a memorial to honor one of its developers, Tokio Furuya (古家時雄), who died that year. On 15 December 2018, version 4.0 was released. The next day, Ishikawa announced that without Furuya this would be

The native kun'yomi pronunciation is used for singular, simpler words. On'yomi primarily occur in multi-kanji compound words (熟語, jukugo), many of which are the result of the adoption, along with the kanji themselves, of Chinese words for concepts that either did not exist in Japanese or could not be articulated as elegantly using native words. This borrowing process is often compared to

The regular Windows character map, or for that matter KCharSelect, which both support TrueType fonts, MOCHRMAP.EXE displays the numbered Mojikyō encoding slot of the requested character. In order for MOCHRMAP.EXE to work, all Mojikyō fonts must be installed. When referring to a character encoded in Mojikyō, the format MJXXXXXX is often used, similar to the U+XXXX format used for Unicode. For example, hentaigana U+1B008 𛀈 HENTAIGANA LETTER I-3 has Mojikyō encoding MJ090007 and Unicode encoding U+1B008. A difference, however,

The requirement to encode more characters than a 16-bit encoding can accommodate (Unicode 5.0 has some 70,000 Han characters) and the requirement by the Chinese government that software in China support the GB 18030 character set. Although CJK encodings have common character sets, the encodings often used to represent them have been developed separately by different East Asian governments and software companies, and are mutually incompatible. Unicode has attempted, with some controversy, to unify

The same representative glyph in the code chart as in the Mojikyō font; some characters had their shapes changed before final encoding, as investigation showed the shapes assigned by the Mojikyō Institute were wrong. As of September 2006, Mojikyō encoded 174,975 characters. Among those, 150,366 characters then belonged to the extended CJKV family. Many of the encoded characters are considered obsolete or otherwise obscure, and are not encoded by any other character set, including

The shapes of letters, are considered in many jurisdictions to be common property as they do not meet the threshold of originality. Due to this legacy, however, GlyphWiki disallowed Mojikyō data as of 2020.

CJKV

In internationalization, CJK characters is a collective term for graphemes used in the Chinese, Japanese, and Korean writing systems, which each include Chinese characters. It can also go by CJKV to include Chữ Nôm,

The target languages. The sinologist Carl Leban (1971) produced an early survey of CJK encoding systems. Until the early 20th century, Classical Chinese was the written language of government and scholarship in Vietnam. Popular literature in Vietnamese was written in the chữ Nôm script, consisting of Chinese characters with many characters created locally. Since the 1920s, the script used for recording literature has been
