Misplaced Pages

Unicode collation algorithm

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

The Unicode collation algorithm (UCA) is an algorithm defined in Unicode Technical Report #10. It is a customizable method for producing binary keys from strings representing text in any writing system and language that can be represented with Unicode. These keys can then be efficiently compared byte by byte in order to collate or sort the strings according to the rules of the language, with options for ignoring case, accents, etc.
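The idea of building binary keys and comparing them byte by byte can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table `TOY_TABLE`, its weights, and the function `sort_key` are hypothetical and are not taken from the real collation data, and the sketch skips normalization, contractions, and the other steps a real implementation needs. Each character maps to one or more collation elements of (primary, secondary, tertiary) weights; the key lists all non-zero primary weights, then a zero separator, then the secondary weights, and so on.

```python
import struct

# Hypothetical (primary, secondary, tertiary) weights; NOT real DUCET values.
# The accented letter maps to its base letter plus a primary-ignorable accent.
TOY_TABLE = {
    "a": [(0x0100, 0x0020, 0x0002)],
    "A": [(0x0100, 0x0020, 0x0008)],
    "\u00e1": [(0x0100, 0x0020, 0x0002), (0x0000, 0x0024, 0x0002)],  # a with acute
    "b": [(0x0101, 0x0020, 0x0002)],
    "B": [(0x0101, 0x0020, 0x0008)],
}


def sort_key(text, strength=3):
    """Build a binary key from the first `strength` weight levels of `text`."""
    elements = [ce for ch in text for ce in TOY_TABLE[ch]]
    words = []
    for level in range(strength):
        if level > 0:
            words.append(0x0000)  # separator between levels
        # Zero weights are ignorable at this level and are left out of the key.
        words.extend(ce[level] for ce in elements if ce[level] != 0)
    # Big-endian 16-bit units, so plain byte comparison follows weight order.
    return struct.pack(f">{len(words)}H", *words)


words = ["b", "\u00e1b", "Ab", "ab", "B"]
full_order = sorted(words, key=sort_key)
```

With these toy weights, `full_order` is `["ab", "Ab", "áb", "b", "B"]`: base letters decide first, accents second, case last. Calling `sort_key(text, strength=1)` drops the accent and case levels, so "ab", "Ab", and "áb" all produce the same key, which is one way to picture the options for ignoring case and accents.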


Unicode Technical Report #10 also specifies the Default Unicode Collation Element Table (DUCET). This data file specifies a default collation ordering. The DUCET is customizable for different languages, and some such customizations can be found in the Unicode Common Locale Data Repository (CLDR). An open source implementation of UCA is included with the International Components for Unicode (ICU). ICU supports tailoring, and the collation tailorings from CLDR are included in ICU.

Common Locale Data Repository

The Common Locale Data Repository (CLDR) is a project of the Unicode Consortium to provide locale data in XML format for use in computer applications. CLDR contains locale-specific information that an operating system will typically provide to applications. CLDR is written in the Locale Data Markup Language (LDML).

The information is currently used in International Components for Unicode, Apple's macOS, LibreOffice, MediaWiki, and IBM's AIX, among other applications and operating systems. CLDR overlaps somewhat with ISO/IEC 15897 (POSIX locales). POSIX locale information can be derived from CLDR by using some of CLDR's conversion tools.
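Customizing a default collation table for a particular language can be pictured as overlaying a small set of overrides on the default weights. The following Python sketch assumes hypothetical primary weights (`DEFAULT_PRIMARY`) and a toy Swedish-style override set (`SWEDISH_OVERRIDES`); neither is real DUCET or CLDR data, and the helper names `tailor` and `primary_key` are invented for illustration. It models only the primary level.

```python
import string

# Hypothetical primary weights standing in for a default table; NOT real DUCET values.
DEFAULT_PRIMARY = {ch: 0x0100 + i for i, ch in enumerate(string.ascii_lowercase)}
# Assumption: by default the accented letters share the primary weight of their base letter.
DEFAULT_PRIMARY.update({
    "\u00e5": DEFAULT_PRIMARY["a"],  # a with ring above
    "\u00e4": DEFAULT_PRIMARY["a"],  # a with diaeresis
    "\u00f6": DEFAULT_PRIMARY["o"],  # o with diaeresis
})

# Toy Swedish-style tailoring: the three letters become separate letters after z.
SWEDISH_OVERRIDES = {"\u00e5": 0x0200, "\u00e4": 0x0201, "\u00f6": 0x0202}


def tailor(base, overrides):
    """Return a new table: the base weights with language-specific overrides applied."""
    table = dict(base)
    table.update(overrides)
    return table


def primary_key(text, table):
    """Primary-level key only; a full key would also carry accent and case levels."""
    return tuple(table[ch] for ch in text)


SWEDISH_PRIMARY = tailor(DEFAULT_PRIMARY, SWEDISH_OVERRIDES)

words = ["zon", "\u00e4ta", "ara", "ost", "\u00f6l"]
default_order = sorted(words, key=lambda w: primary_key(w, DEFAULT_PRIMARY))
swedish_order = sorted(words, key=lambda w: primary_key(w, SWEDISH_PRIMARY))
```

Under these assumptions, `default_order` is `["ara", "äta", "öl", "ost", "zon"]`, while `swedish_order` is `["ara", "ost", "zon", "äta", "öl"]`. Keeping the overrides separate from the default table mirrors the split between a shared default ordering and per-language customizations.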
