Diacritic

A diacritic – also diacritical mark, diacritical point, diacritical sign, or accent – is a glyph added to a letter or basic glyph. The term derives from the Ancient Greek διακριτικός (diakritikós, "distinguishing"), from διακρίνω (diakrī́nō, "to distinguish"). Diacritic is primarily an adjective, though sometimes used as a noun, whereas diacritical is only ever an adjective. Some diacritical marks, such as the acute ( ´ ) and grave ( ` ), are often called accents. Diacritical marks may appear above or below a letter, or in some other position such as within the letter or between two letters.

The main use of diacritical marks in the Latin script is to change the sound-values of the letters to which they are added. Examples are the diaereses in the borrowed French words naïve and Noël, which show that the vowel with the diaeresis mark is pronounced separately from the preceding vowel; the acute and grave accents, which can indicate that a final vowel is to be pronounced, as in saké and poetic breathèd; and the cedilla under the "c" in the borrowed French word façade, which shows it is pronounced /s/ rather than /k/. In other Latin-script alphabets, they may distinguish between homonyms, such as the French là ("there") versus la ("the"), which are both pronounced /la/. In Gaelic type, a dot over a consonant indicates lenition of the consonant in question.

In other alphabetic systems, diacritical marks may perform other functions. Vowel pointing systems, namely the Arabic harakat ( ـِ ,ـُ ,ـَ, etc.) and the Hebrew niqqud ( ַ◌, ֶ◌, ִ◌, ֹ◌, ֻ◌, etc.) systems, indicate vowels that are not conveyed by the basic alphabet. The Indic virama ( ् etc.) and the Arabic sukūn ( ـْـ‎ ) mark the absence of vowels. Cantillation marks indicate prosody. Other uses include the Early Cyrillic titlo stroke ( ◌҃ ) and the Hebrew gershayim ( ״‎ ), which, respectively, mark abbreviations or acronyms, and Greek diacritical marks, which showed that letters of the alphabet were being used as numerals. In the Hanyu Pinyin official romanization system for Chinese, diacritics are used to mark the tones of the syllables in which the marked vowels occur.

In orthography and collation, a letter modified by a diacritic may be treated either as a new, distinct letter or as a letter–diacritic combination. This varies from language to language, and may vary from case to case within a language. English is the only major modern European language requiring no diacritics for native words (although a diaeresis may be used in words such as "coöperation").[1][2]

In some cases, letters are used as "in-line diacritics", with the same function as ancillary glyphs, in that they modify the sound of the letter preceding them, as in the case of the "h" in the English pronunciation of "sh" and "th".[3]

Among the types of diacritic used in alphabets based on the Latin script are:

The tilde, dot, comma, titlo, apostrophe, bar, and colon are sometimes diacritical marks, but also have other uses.

Not all diacritics occur adjacent to the letter they modify. In the Wali language of Ghana, for example, an apostrophe indicates a change of vowel quality, but occurs at the beginning of the word, as in the dialects ’Bulengee and ’Dolimi. Because of vowel harmony, all vowels in a word are affected, so the scope of the diacritic is the entire word. In abugida scripts, like those used to write Hindi and Thai, diacritics indicate vowels, and may occur above, below, before, after, or around the consonant letter they modify.

The tittle (dot) on the letter i or the letter j of the Latin alphabet originated as a diacritic to clearly distinguish i from the minims (downstrokes) of adjacent letters. It first appeared in the 11th century in the sequence ii (as in ingeníí), then spread to i adjacent to m, n, u, and finally to all lowercase i's. The j, originally a variant of i, inherited the tittle. The shape of the diacritic developed from initially resembling today's acute accent to a long flourish by the 15th century. With the advent of Roman type it was reduced to the round dot we have today.[4]

Languages from Eastern Europe tend to use diacritics on both consonants and vowels, whereas in Western Europe digraphs are more typically used to change consonant sounds. Most languages in Western Europe use diacritics on vowels, aside from English, where there are typically none (with some exceptions).

These diacritics are used in addition to the acute, grave, and circumflex accents and the diaeresis:

The diacritics 〮 and 〯, known as Bangjeom (방점; 傍點), were used to mark pitch accents in Hangul for Middle Korean. They were written to the left of a syllable in vertical writing and above a syllable in horizontal writing.

The South Korean government officially revised the romanization of the Korean language in July 2000 to eliminate diacritics.

In the Devanagari script (of the Brahmic family), compound letters, which are consonants combined with vowels, are written with diacritics. Here क is shown with vowel diacritics.

In addition to the above vowel marks, transliteration of Syriac sometimes includes ə, or superscript e (or often nothing at all) to represent an original Aramaic schwa that became lost later on at some point in the development of Syriac.[5] Some transliteration schemes find its inclusion necessary for showing spirantization or for historical reasons.[6][7]

Some non-alphabetic scripts also employ symbols that function essentially as diacritics.

Different languages use different rules to put diacritic characters in alphabetical order. French treats letters with diacritical marks the same as the underlying letter for purposes of ordering and dictionaries.

The Scandinavian languages, by contrast, treat the characters with diacritics ä, ö and å as new and separate letters of the alphabet, and sort them after z. Usually ä is sorted as equal to æ (ash) and ö is sorted as equal to ø (o-slash). Also, aa, when used as an alternative spelling to å, is sorted as such. Other letters modified by diacritics are treated as variants of the underlying letter, with the exception that ü is frequently sorted as y.

Languages that treat accented letters as variants of the underlying letter usually alphabetize words with such symbols immediately after similar unmarked words. For instance, in German where two words differ only by an umlaut, the word without it is sorted first in German dictionaries (e.g. schon and then schön, or fallen and then fällen). However, when names are concerned (e.g. in phone books or in author catalogues in libraries), umlauts are often treated as combinations of the vowel with a suffixed e; Austrian phone books now treat characters with umlauts as separate letters (immediately following the underlying vowel).
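The dictionary and phone-book conventions described above can be approximated with sort keys. The following Python sketch is illustrative only: the function names are hypothetical, only the umlauts ä, ö and ü are handled, and real software would normally rely on a locale-aware collation library rather than hand-written keys.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def dictionary_key(word: str):
    """Compare on the unmarked form first; the marked form only breaks ties."""
    lowered = unicodedata.normalize("NFC", word.lower())
    return (strip_marks(lowered), lowered)

def phone_book_key(name: str) -> str:
    """Treat an umlaut as the base vowel followed by e (a simplification)."""
    lowered = unicodedata.normalize("NFC", name.lower())
    return (lowered.replace("\u00e4", "ae")   # ä
                   .replace("\u00f6", "oe")   # ö
                   .replace("\u00fc", "ue"))  # ü

words = ["schön", "fällen", "schon", "fallen"]
ordered = sorted(words, key=dictionary_key)
```

With the dictionary key, each unmarked word sorts immediately before its umlauted counterpart, because the tuple compares the stripped form first and falls back to the original spelling only when the stripped forms are equal.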

In Spanish, the grapheme ñ is considered a new letter different from n and collated between n and o, as it denotes a different sound from that of a plain n. But the accented vowels á, é, í, ó, ú are not separated from the unaccented vowels a, e, i, o, u, as the acute accent in Spanish only modifies stress within the word or denotes a distinction between homonyms, and does not modify the sound of a letter.
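As a rough illustration of the Spanish rule, the Python sketch below ranks ñ as its own letter between n and o while folding accented vowels into their plain forms. The names SPANISH_ALPHABET, fold_letter and spanish_key are hypothetical, and the sketch ignores everything else a full collation algorithm would handle (case, punctuation, digits).

```python
import unicodedata

# Assumed 27-letter alphabet, with ñ placed between n and o.
SPANISH_ALPHABET = "abcdefghijklmn\u00f1opqrstuvwxyz"
RANK = {letter: index for index, letter in enumerate(SPANISH_ALPHABET)}

def fold_letter(ch: str) -> str:
    """Keep ñ as a distinct letter; strip marks from every other letter."""
    if ch == "\u00f1":
        return ch
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def spanish_key(word: str):
    """Return a list of alphabet ranks; unknown characters sort last."""
    lowered = unicodedata.normalize("NFC", word.lower())
    folded = "".join(fold_letter(ch) for ch in lowered)
    return [RANK.get(ch, len(SPANISH_ALPHABET)) for ch in folded]

words = ["oso", "ñandú", "nube", "nada"]
by_alphabet = sorted(words, key=spanish_key)
by_code_point = sorted(words)
```

Sorting by raw code point places ñandú after oso, since ñ has a higher code point than every unaccented Latin letter; the alphabet-based key places it between nube and oso instead.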

For a comprehensive list of the collating orders in various languages, see Collating sequence.

Modern computer technology was developed mostly in English-speaking countries, so data formats, keyboard layouts, etc. were developed with a bias favoring English, a language with an alphabet without diacritical marks. This has led some to theorize that the marks and accents may be made obsolete to facilitate the worldwide exchange of data.[citation needed] Efforts have been made to create internationalized domain names that further extend the English alphabet (e.g., "pokémon.com").
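Internationalized domain names are carried over the ASCII-only domain name system by encoding each non-ASCII label in an ASCII-compatible form prefixed with "xn--". A minimal sketch, assuming Python's built-in "idna" codec (which implements the older IDNA 2003 rules, not the current IDNA 2008 standard); the helper names are hypothetical:

```python
def to_ascii(domain: str) -> bytes:
    """Encode a Unicode domain name into its ASCII-compatible form."""
    return domain.encode("idna")

def to_unicode(encoded: bytes) -> str:
    """Decode an ASCII-compatible domain name back to Unicode."""
    return encoded.decode("idna")

original = "pok\u00e9mon.com"          # the example domain from the text
encoded = to_ascii(original)            # ASCII bytes, first label starts with xn--
round_tripped = to_unicode(encoded)     # back to the accented form
```

Only the label containing the accented letter is transformed; the plain "com" label passes through unchanged.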

Depending on the keyboard layout, which differs amongst countries, it is more or less easy to enter letters with diacritics on computers and typewriters. Some have their own keys; some are created by first pressing the key with the diacritic mark followed by the letter to place it on. Such a key is sometimes referred to as a dead key, as it produces no output of its own but modifies the output of the key pressed after it.

In modern Microsoft Windows and Linux operating systems, the keyboard layouts US International and UK International feature dead keys that allow one to type Latin letters with the acute, grave, circumflex, diaeresis, tilde, and cedilla found in Western European languages (specifically, those combinations found in the ISO Latin-1 character set) directly: ¨ + e gives ë, ~ + o gives õ, etc. On Apple Macintosh computers, there are keyboard shortcuts for the most common diacritics; Option-e followed by a vowel places an acute accent, Option-u followed by a vowel gives an umlaut, Option-c gives a cedilla, etc. Diacritics can be composed in most X Window System keyboard layouts, as well as other operating systems, such as Microsoft Windows, using additional software.
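The dead-key behaviour can be modelled as a small state machine: a dead key emits nothing and is remembered, and the next key is combined with it. The Python sketch below is a toy model, not how any particular operating system implements its layouts; the DEAD_KEYS table and the type_keys function are hypothetical and cover only a subset of marks.

```python
import unicodedata

# Assumed mapping from dead-key symbols to Unicode combining marks.
DEAD_KEYS = {
    "\u00b4": "\u0301",  # acute
    "`": "\u0300",       # grave
    "^": "\u0302",       # circumflex
    "\u00a8": "\u0308",  # diaeresis
    "~": "\u0303",       # tilde
}

def type_keys(keys):
    """Simulate typing a sequence of keys on a layout with dead keys."""
    output = []
    pending = None  # combining mark waiting for its base letter
    for key in keys:
        if pending is None and key in DEAD_KEYS:
            pending = DEAD_KEYS[key]  # dead key: produce no output yet
            continue
        if pending is not None:
            # Attach the pending mark and compose to a single character if one exists.
            key = unicodedata.normalize("NFC", key + pending)
            pending = None
        output.append(key)
    return "".join(output)
```

For instance, type_keys(["¨", "e"]) yields ë and type_keys(["~", "o"]) yields õ, matching the combinations mentioned above. Real layouts differ in how they treat a dead key followed by a space or by a letter with no composed form.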

On computers, the availability of code pages determines whether one can use certain diacritics. Unicode solves this problem by assigning every known character its own code; if this code is known, most modern computer systems provide a method to input it. With Unicode, it is also possible to combine diacritical marks with most characters.
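Because Unicode offers both precomposed letters and separate combining marks, the same visible letter can be stored as different code point sequences. The following Python sketch, using the standard unicodedata module, shows the two encodings of é and how normalization reconciles them; the helper name same_text is hypothetical.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT (two code points)

def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to composed form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

raw_equal = precomposed == decomposed                      # compares code points only
normalized_equal = same_text(precomposed, decomposed)      # compares canonical forms
split_form = unicodedata.normalize("NFD", precomposed)     # base letter + mark
```

A plain string comparison treats the two forms as different, which is why text is usually normalized to one form (NFC or NFD) before searching, sorting, or comparing.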

The following languages have letters that contain diacritics that are considered independent letters distinct from those without diacritics.

English is one of the few European languages that does not have many words that contain diacritical marks. Instead, digraphs are the main way the Modern English alphabet adapts the Latin to its phonemes. Exceptions are unassimilated foreign loanwords, including borrowings from French and, increasingly, Spanish; however, the diacritic is also sometimes omitted from such words. Loanwords that frequently appear with the diacritic in English include café, résumé or resumé (a usage that helps distinguish it from the verb resume), soufflé, and naïveté (see English terms with diacritical marks). In older practice (and even among some orthographically conservative modern writers) one may see examples such as élite, mêlée and rôle.

English speakers and writers once used the diaeresis more often than now in words such as coöperation (from Fr. coopération), zoölogy (from Grk. zoologia), and seeër (now more commonly see-er or simply seer) as a way of indicating that adjacent vowels belonged to separate syllables, but this practice has become far less common. The New Yorker magazine is a major publication that continues to use the diaeresis in place of a hyphen for clarity and economy of space.[11]

A few English words, out of context, can only be distinguished from others by a diacritic or modified letter, including exposé, lamé, maté, öre, øre, pâté, and rosé. The same is true of résumé, alternatively resumé, but nevertheless it is regularly spelled resume. In a few words, diacritics that did not exist in the original have been added for disambiguation, as in maté (from Sp. and Port. mate), saké (the standard Romanization of the Japanese has no accent mark), and Malé (from Dhivehi މާލެ), to clearly distinguish them from the English words "mate", "sake", and "male".

The acute and grave accents are occasionally used in poetry and lyrics: the acute to indicate stress overtly where it might be ambiguous (rébel vs. rebél) or nonstandard for metrical reasons (caléndar), the grave to indicate that an ordinarily silent or elided syllable is pronounced (warnèd, parlìament).

In certain personal names such as Renée and Zoë, often two spellings exist, and the preference will be known only to those close to the person. Even when the name of a person is spelled with a diacritic, like Charlotte Brontë, this may be dropped in English-language articles and even official documents such as passports, whether through carelessness, because the typist does not know how to enter letters with diacritical marks, or for technical reasons; California, for example, does not allow names with diacritics, as the computer system cannot process such characters. Diacritics also appear in some worldwide company names and trademarks, such as Nestlé and Citroën.

The following languages have letter-diacritic combinations that are not considered independent letters.

Several languages that are not written with the Roman alphabet are transliterated, or romanized, using diacritics. Examples:

Possibly the greatest number of combining diacritics required to compose a valid character in any Unicode language is 8, for the "well-known grapheme cluster in Tibetan and Ranjana scripts", ཧྐྵྨླྺྼྻྂ, or HAKṢHMALAWARAYAṀ.[citation needed]


It is U+0F67 U+0F90 U+0FB5 U+0FA8 U+0FB3 U+0FBA U+0FBC U+0FBB U+0F82, or:
TIBETAN LETTER HA + TIBETAN SUBJOINED LETTER KA + TIBETAN SUBJOINED LETTER SSA + TIBETAN SUBJOINED LETTER MA + TIBETAN SUBJOINED LETTER LA + TIBETAN SUBJOINED LETTER FIXED-FORM WA + TIBETAN SUBJOINED LETTER FIXED-FORM RA + TIBETAN SUBJOINED LETTER FIXED-FORM YA + TIBETAN SIGN NYI ZLA NAA DA.
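The sequence above can be inspected programmatically. This Python sketch uses the standard unicodedata module to look up each code point's name and to count how many belong to a mark category (general category beginning with "M"); the variable names are illustrative.

```python
import unicodedata

# The nine code points listed above, in order.
cluster = "\u0F67\u0F90\u0FB5\u0FA8\u0FB3\u0FBA\u0FBC\u0FBB\u0F82"

# Official Unicode character name of each code point.
names = [unicodedata.name(ch) for ch in cluster]

# Code points whose general category is a mark (Mn, Mc or Me).
marks = [ch for ch in cluster if unicodedata.category(ch).startswith("M")]

for ch, name in zip(cluster, names):
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {name}")
```

Only the first code point is a base letter; the remaining eight are marks attached to it, which is where the count of 8 comes from.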

Some users have explored the limits of rendering in web browsers and other software by "decorating" words with multiple nonsensical diacritics per character. The result is called "Zalgo text". The composed bogus characters and words can be copied and pasted normally via the system clipboard.




Example: c̳̻͚̻̩̻͉̯̄̏͑̋͆̎͐ͬ͑͌́͢h̵͔͈͍͇̪̯͇̞͖͇̜͉̪̪̤̙ͧͣ̓̐̓ͤ͋͒ͥ͑̆͒̓͋̑́͞ǎ̡̮̤̤̬͚̝͙̞͎̇ͧ͆͊ͅo̴̲̺͓̖͖͉̜̟̗̮̳͉̻͉̫̯̫̍̋̿̒͌̃̂͊̏̈̏̿ͧ́ͬ̌ͥ̇̓̀͢͜s̵̵̘̹̜̝̘̺̙̻̠̱͚̤͓͚̠͙̝͕͆̿̽ͥ̃͠͡
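Software that accepts user-supplied text sometimes limits such stacking. One possible approach, sketched here in Python under the assumption that a fixed cap on marks per base character is acceptable, is to decompose the text and drop any marks beyond the cap. The function name limit_marks and its default cap are hypothetical, and a cap that is too low would damage legitimate text in scripts that stack several marks.

```python
import unicodedata

def limit_marks(text: str, max_marks: int = 2) -> str:
    """Keep at most max_marks combining marks after each base character."""
    kept = []
    run = 0  # number of marks seen since the last base character
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch).startswith("M"):
            run += 1
            if run > max_marks:
                continue  # drop the excess mark
        else:
            run = 0
        kept.append(ch)
    # Recompose so ordinary accented letters return to their usual form.
    return unicodedata.normalize("NFC", "".join(kept))
```

With the default cap, ordinary words such as naïve pass through unchanged, while a heavily decorated string loses all but the first two marks on each letter. Setting max_marks to 0 removes every mark, including legitimate diacritics.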