Some of the characters in this document may not be visible in your browser, and with some fonts the diacritics will not be correctly placed on the base letters.

While an English speaker may not recognize that the Japanese word kyanpasu is equivalent to the English word campus, the word kyanpasu is still far easier to recognize and interpret than if the letters were left in the original script. There are several situations where this transliteration is especially useful, such as the following. When a user views names that are entered in a world-wide database, it is extremely helpful to view and refer to the names in the user's native script. When the user performs searching and indexing tasks, transliteration can retrieve information in a different script. When a service engineer is sent a program dump that is filled with characters from foreign scripts, it is much easier to diagnose the problem when the text is transliterated and the service engineer can recognize the characters.

The term transliteration is sometimes given a narrow meaning, implying that the transformation is reversible (sometimes called lossless). In CLDR this is not the case; the term transliteration is interpreted broadly to mean both reversible and non-reversible transforms of text. A non-reversible transliteration is often called a transcription, or a lossy or ambiguous transcription. (Note that even if a transliteration system is theoretically supposed to be reversible, source standards often do not specify the edge cases in sufficient detail for it to actually be reversible.)

Note that reversibility is generally only in one direction, so a transliteration from a native script to Latin may be reversible, but not the other way around. For example, Hangul is reversible, in that any Hangul to Latin to Hangul should provide the same Hangul as the input. However, for completeness, many Latin characters have fallbacks. This means that more than one Latin character may map to the same Hangul. Thus from Latin we don't have reversibility, because two different Latin source strings round-trip back to the same Latin string.

Transliteration can also be used to convert unfamiliar letters within the same script, such as converting Icelandic THORN (þ) to th. There is an online demo using released CLDR data at ICU Transform Demo.

There are many systems for transliteration between languages: the same text can be transliterated in many different ways. For example, for Greek, one transliteration is classical, while the UNGEGN alternate has different correspondences, such as φ → f instead of φ → ph.

CLDR provides for generic mappings from script to script (such as Cyrillic-Latin), and also language-specific variants (Russian-French, or Serbian-German). There can also be semi-generic mappings, such as Russian-Latin or Cyrillic-French. These can be referred to, respectively, as script transliterations, language-specific transliterations, or script-language transliterations. Transliterations from other scripts to Latin are also called Romanizations.

Even within particular languages, there can be variant systems according to different authorities, or even varying across time (if the authority for a system changes its recommendation). The canonical identifier that CLDR uses for these has the form source-target/variant. The source (and target) can be a language or script, either using the English name or a locale code. The variant should specify the authority for the system, and if necessary for disambiguation, the year. For example, the identifier for the Russian to Latin transliteration according to the UNGEGN system would be Russian-Latin/UNGEGN. If there were multiple versions of these over time, the variant would be, say, UNGEGN2006.

The assumption is that implementations will allow the use of fallbacks if the exact transliteration specified is unavailable. For example, consider the fallback chain for the identifier Russian-English/UNGEGN. It is similar to the Lookup Fallback Pattern used in BCP 47 Tags for Identifying Languages, except that it uses a "stepladder approach" to progressively handle the fallback among source, target, and variant, with priorities being the target, source, and variant, in that order.

There are a number of generally desirable guidelines for script transliterations. These guidelines are rarely satisfied simultaneously, so constructing a reasonable transliteration is always a process of balancing different requirements. These requirements are most important for people who are building transliterations, but are also useful as background information for users. The following lists the general guidelines for Unicode CLDR transliterations:

Standard: follow established systems (standards, authorities, or de facto practice) where possible, deviating sometimes where necessary for reversibility.
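The stepladder fallback described in this document for transliteration identifiers can be sketched in a few lines of Python. This is only an illustration of one reading of the stated priorities (target kept longest, then source, with the variant dropped first); the `SCRIPT_OF` table and the function names `candidates` and `fallback_chain` are hypothetical and are not part of CLDR or ICU.

```python
from typing import Dict, List

# Hypothetical language-to-script table, for illustration only.
# A real implementation would derive this from locale data.
SCRIPT_OF: Dict[str, str] = {
    "Russian": "Cyrillic",
    "Serbian": "Cyrillic",
    "English": "Latin",
    "French": "Latin",
}


def candidates(name: str) -> List[str]:
    """Return the name itself, followed by its script if it is a known language."""
    script = SCRIPT_OF.get(name)
    return [name, script] if script else [name]


def fallback_chain(identifier: str) -> List[str]:
    """Build a stepladder fallback chain for an id of the form source-target/variant.

    Assumed priority: the target changes last, the source next,
    and the variant is dropped first at every step.
    """
    pair, _, variant = identifier.partition("/")
    source, _, target = pair.partition("-")
    variants = [variant, ""] if variant else [""]

    chain: List[str] = []
    for tgt in candidates(target):        # highest priority: varies slowest
        for src in candidates(source):    # middle priority
            for var in variants:          # lowest priority: varies fastest
                chain.append(f"{src}-{tgt}" + (f"/{var}" if var else ""))
    return chain


if __name__ == "__main__":
    for step in fallback_chain("Russian-English/UNGEGN"):
        print(step)
```

Under these assumptions, `fallback_chain("Russian-English/UNGEGN")` starts with the exact identifier, tries `Russian-English`, then the `Cyrillic-English` pair with and without the variant, and only afterwards relaxes the target, ending at the generic `Cyrillic-Latin`. An implementation would walk this list and use the first identifier for which a transliterator is available.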