The Semitic languages, previously also named Syro-Arabian languages, are a branch of the Afroasiatic language family originating in the Middle East that are spoken by more than 330 million people across much of Western Asia, North Africa and the Horn of Africa, as well as in often large immigrant and expatriate communities in North America, Europe and Australia. The terminology was first used in the 1780s by members of the Göttingen School of History, who derived the name from Shem, one of the three sons of Noah in the Book of Genesis.
The most widely spoken Semitic languages today are (numbers given are for native speakers only) Arabic (300 million), Amharic (22 million), Tigrinya (7 million), Hebrew (~5 million native/L1 speakers), Tigre (~1.05 million), Aramaic (575,000 to 1 million largely Assyrian fluent speakers) and Maltese (483,000 speakers).
Semitic languages occur in written form from a very early historical date, with East Semitic Akkadian and Eblaite texts (written in a script adapted from Sumerian cuneiform) appearing from the 30th century BCE and the 25th century BCE in Mesopotamia and the northern Levant respectively. The only earlier attested languages are Sumerian, Elamite (2800 BCE to 550 BCE) (both language isolates), Egyptian and unclassified Lullubi from the 30th century BCE.
Most scripts used to write Semitic languages are abjads – a type of alphabetic script that omits some or all of the vowels, which is feasible for these languages because the consonants in the Semitic languages are the primary carriers of meaning. Among them are the Ugaritic, Phoenician, Aramaic, Hebrew, Syriac, Arabic, and South Arabian alphabets. The Ge'ez script, used for writing the Semitic languages of Ethiopia and Eritrea, is technically an abugida – a modified abjad in which vowels are notated using diacritic marks added to the consonants at all times, in contrast with other Semitic languages which indicate diacritics based on need or for introductory purposes. Maltese is the only Semitic language written in the Latin script and the only Semitic language to be an official language of the European Union.
The Semitic languages are notable for their nonconcatenative morphology. That is, word roots are not themselves syllables or words, but instead are isolated sets of consonants (usually three, making a so-called triliteral root). Words are composed out of roots not so much by adding prefixes or suffixes, but rather by filling in the vowels between the root consonants (although prefixes and suffixes are often added as well). For example, in Arabic, the root meaning "write" has the form k-t-b. From this root, words are formed by filling in the vowels and sometimes adding additional consonants, e.g. كتاب kitāb "book", كتب kutub "books", كاتب kātib "writer", كتّاب kuttāb "writers", كتب kataba "he wrote", يكتب yaktubu "he writes", etc.
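The root-and-pattern process described above can be sketched in a few lines of Python. This is only an illustration: the function name `apply_template` and the digit-based template notation (`1`, `2`, `3` standing for the first, second and third root consonants) are assumptions made for this example rather than a standard notation, and the templates cover only the Arabic forms cited above.

```python
def apply_template(root, template):
    """Interleave the consonants of a hyphen-separated root with a template.

    Digits in the template are slots for root consonants (1 = first
    consonant, 2 = second, ...); every other character is copied as-is.
    """
    consonants = root.split("-")
    return "".join(
        consonants[int(ch) - 1] if ch.isdigit() else ch
        for ch in template
    )


# Hypothetical template notation for the Arabic examples in the text.
TEMPLATES = {
    "1i2ā3": "book",
    "1u2u3": "books",
    "1ā2i3": "writer",
    "1u22ā3": "writers",   # the doubled slot models the geminated middle consonant
    "1a2a3a": "he wrote",
    "ya12u3u": "he writes",  # a prefix is added alongside the vowel pattern
}

if __name__ == "__main__":
    for template, gloss in TEMPLATES.items():
        print(apply_template("k-t-b", template), "=", gloss)
```

The same templates can be applied to any other triliteral root string, although real languages restrict which patterns a given root actually uses, so the output for other roots is not guaranteed to be an attested word.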
The similarity of the Hebrew, Arabic and Aramaic languages has been accepted by Jewish and Islamic scholars since medieval times. The languages were familiar to Western European scholars due to historical contact with neighbouring Near Eastern countries and through Biblical studies, and a comparative analysis of Hebrew, Arabic, and Aramaic was published in Latin in 1538 by Guillaume Postel. Almost two centuries later, Hiob Ludolf described the similarities between these three languages and the Ethiopian Semitic languages. However, neither scholar named this grouping "Semitic".
The term "Semitic" was coined in the late 18th century by members of the Göttingen School of History, specifically August Ludwig von Schlözer (1781) and Johann Gottfried Eichhorn (1787), to designate the languages closely related to Arabic, Aramaic, and Hebrew. The choice of name was derived from Shem, one of the three sons of Noah in the genealogical accounts of the biblical Book of Genesis, or more precisely from the Koine Greek rendering of the name, Σήμ (Sēm). Eichhorn is credited with popularising the term, particularly via a 1795 article "Semitische Sprachen" (Semitic languages) in which he justified the terminology against criticism that Hebrew and Canaanite were the same language despite Canaan being "Hamitic" in the Table of Nations:
In the Mosaic Table of Nations, those names which are listed as Semites are purely names of tribes who speak the so-called Oriental languages and live in Southwest Asia. As far as we can trace the history of these very languages back in time, they have always been written with syllabograms or with alphabetic script (never with hieroglyphs or pictograms); and the legends about the invention of the syllabograms and alphabetic script go back to the Semites. In contrast, all so called Hamitic peoples originally used hieroglyphs, until they here and there, either through contact with the Semites, or through their settlement among them, became familiar with their syllabograms or alphabetic script, and partly adopted them. Viewed from this aspect too, with respect to the alphabet used, the name "Semitic languages" is completely appropriate.
Previously these languages had been commonly known as the "Oriental languages" in European literature. In the 19th century, "Semitic" became the conventional name; however, an alternative name, "Syro-Arabian languages", was later introduced by James Cowles Prichard and used by some writers.
There are several locations proposed as possible sites for prehistoric origins of Semitic-speaking peoples: Mesopotamia, the Levant, Mediterranean, the Arabian Peninsula, and North Africa, with the most recent Bayesian studies supporting the view that Semitic originated in the Levant circa 3800 BC, and was later also introduced to the Horn of Africa in approximately 800 BC.
Semitic languages were spoken across much of the Middle East and Asia Minor during the Bronze Age and Iron Age, the earliest attested being the East Semitic Akkadian of the Mesopotamian and south eastern Anatolian polities of Akkad, Assyria and Babylonia, and the likewise East Semitic Eblaite language of the kingdom of Ebla in the north eastern Levant. The various closely related Northwest Semitic Canaanite languages, which included Amorite, Edomite, Hebrew, Ammonite, Moabite, Phoenician (Punic/Carthaginian), Samaritan, Ekronite and Sutean, were spoken across what is today Israel, western, north western and southern Syria, Lebanon, the Palestinian territories, Jordan, the Sinai peninsula, northern parts of the Arabian peninsula and, in the case of Phoenician, coastal regions of Tunisia (Carthage), Libya and Algeria, and possibly Malta. Ugaritic was spoken in the kingdom of Ugarit in north western Syria. Old South Arabian languages (distinct from the later attested Arabic) were spoken in the kingdoms of Dilmun, Meluhha, Sheba, Ubar and Magan, which in modern terms encompassed part of the eastern coast of Saudi Arabia, and Bahrain, Qatar, Oman and Yemen. These languages (in the form of Ge'ez) later spread to the Horn of Africa circa the 8th century BC. Arabic and the Arabs were attested in Assyrian annals as being extant in the northern Arabian peninsula from the 9th century BC. Aramaic, a Northwest Semitic language first attested in the 12th century BC in the Levant, gradually replaced the East Semitic and Canaanite languages across much of the Near East, particularly after being adopted as the lingua franca of the vast Neo-Assyrian Empire (911–605 BC) by Tiglath-Pileser III during the 8th century BC, and being retained by the succeeding Neo-Babylonian and Achaemenid Empires.
Syriac, a 5th-century BC Assyrian Mesopotamian descendant of Aramaic used in northeastern Syria, Mesopotamia and south east Anatolia, rose to importance as a literary language of early Christianity in the third to fifth centuries and continued into the early Islamic era.
With the advent of the early Muslim conquests of the seventh and eighth centuries, the hitherto largely uninfluential Arabic language slowly replaced many (but not all) of the indigenous Semitic languages and cultures of the Near East. Both the Near East and North Africa saw an influx of Muslim Arabs from the Arabian Peninsula, followed later by non-Semitic Muslim Iranian and Turkic peoples. The previously dominant Aramaic dialects gradually began to be sidelined; however, descendant dialects of Eastern Aramaic (including the Akkadian-influenced Assyrian Neo-Aramaic, Chaldean Neo-Aramaic, Turoyo and Mandaic) survive to this day among the Assyrians and Mandaeans of northern Iraq, northwestern Iran, northeastern Syria and southeastern Turkey, with up to a million fluent speakers. Western Aramaic is now spoken only by a few thousand Syriac Christians in western Syria. The Arabs spread their Central Semitic language to North Africa (Egypt, Libya, Tunisia, Algeria, Morocco and northern Sudan and Mauritania), where it gradually replaced Egyptian Coptic and many Berber languages (although Berber is still largely extant in many areas), and for a time to the Iberian Peninsula (modern Spain, Portugal and Gibraltar) and Malta.
With the patronage of the caliphs and the prestige of its liturgical status, Arabic rapidly became one of the world's main literary languages. Its spread among the masses took much longer, however, as many (although not all) of the native populations outside the Arabian Peninsula only gradually abandoned their languages in favour of Arabic. As Bedouin tribes settled in conquered areas, it became the main language of not only central Arabia, but also Yemen, the Fertile Crescent, and Egypt. Most of the Maghreb followed, particularly in the wake of the Banu Hilal's incursion in the 11th century, and Arabic became the native language of many inhabitants of al-Andalus. After the collapse of the Nubian kingdom of Dongola in the 14th century, Arabic began to spread south of Egypt into modern Sudan; soon after, the Beni Ḥassān brought Arabization to Mauritania. A number of Modern South Arabian languages distinct from Arabic still survive, such as Soqotri, Mehri and Shehri which are mainly spoken in Socotra, Yemen and Oman.
Meanwhile, the Semitic languages that had arrived from southern Arabia in the 8th century BC were diversifying in Ethiopia and Eritrea, where, under heavy Cushitic influence, they split into a number of languages, including Amharic and Tigrinya. With the expansion of Ethiopia under the Solomonic dynasty, Amharic, previously a minor local language, spread throughout much of the country, replacing both Semitic (such as Gafat) and non-Semitic (such as Weyto) languages, and replacing Ge'ez as the principal literary language (though Ge'ez remains the liturgical language for Christians in the region); this spread continues to this day, with Qimant set to disappear in another generation.
Arabic languages and dialects are currently the native languages of majorities from Mauritania to Oman, and from Iraq to the Sudan. Classical Arabic is the language of the Quran. It is also studied widely in the non-Arabic-speaking Muslim world. The Maltese language is genetically a descendant of the extinct Siculo-Arabic, a variety of Maghrebi Arabic formerly spoken in Sicily. The modern Maltese alphabet is based on the Latin script with the addition of some letters with diacritic marks and digraphs. Maltese is the only Semitic official language within the European Union.
Successful as second languages far beyond their numbers of contemporary first-language speakers, a few Semitic languages today are the base of the sacred literature of some of the world's major religions, including Islam (Arabic), Judaism (Hebrew and Aramaic), churches of Syriac Christianity (Syriac) and Ethiopian Christianity (Ge'ez). Millions learn these as a second language (or an archaic version of their modern tongues): many Muslims learn to read and recite the Qur'an and Jews speak and study Biblical Hebrew, the language of the Torah, Midrash, and other Jewish scriptures. Ethnic Assyrian followers of the Assyrian Church of the East, Chaldean Catholic Church, Ancient Church of the East, Assyrian Pentecostal Church, Assyrian Evangelical Church and Assyrian members of the Syriac Orthodox Church both speak Mesopotamian eastern Aramaic and use it also as a liturgical tongue. The language is also used liturgically by the primarily Arabic-speaking followers of the Maronite, Syriac Catholic Church and some Melkite Christians. Arabic itself is the main liturgical language of Oriental Orthodox Christians in the Middle East, who compose the patriarchates of Antioch, Jerusalem and Alexandria. Mandaic is both spoken and used as a liturgical language by the Mandaeans.
Despite the ascendancy of Arabic in the Middle East, other Semitic languages still exist. Biblical Hebrew, long extinct as a colloquial language and in use only in Jewish literary, intellectual, and liturgical activity, was revived in spoken form at the end of the 19th century. Modern Hebrew is the main language of Israel, with Biblical Hebrew remaining as the language of liturgy and religious scholarship of Jews worldwide.
Several smaller ethnic groups, in particular the Assyrians, Kurdish Jews, and Gnostic Mandaeans, continue to speak and write Mesopotamian Aramaic languages, particularly Neo-Aramaic languages descended from Syriac, in those areas roughly corresponding to Kurdistan (northern Iraq, northeast Syria, south eastern Turkey and northwestern Iran) and the Caucasus. The Syriac language itself, a descendant of Eastern Aramaic languages (Mesopotamian Old Aramaic), is also used liturgically by the Syriac Christians throughout the area. Although the majority of Neo-Aramaic dialects spoken today are descended from Eastern varieties, Western Neo-Aramaic is still spoken in three villages in Syria.
In Arab-dominated Yemen and Oman, on the southern rim of the Arabian Peninsula, a few tribes continue to speak Modern South Arabian languages such as Mehri and Soqotri. These languages differ greatly both from the surrounding Arabic dialects and from the (unrelated but previously thought to be related) languages of the Old South Arabian inscriptions.
Historically linked to the peninsular homeland of Old South Arabian, of which only one language, Razihi, remains, Ethiopia and Eritrea contain a substantial number of Semitic languages; the most widely spoken are Amharic in Ethiopia, Tigre in Eritrea, and Tigrinya in both. Amharic is the official language of Ethiopia. Tigrinya is a working language in Eritrea. Tigre is spoken by over one million people in the northern and central Eritrean lowlands and parts of eastern Sudan. A number of Gurage languages are spoken by populations in the semi-mountainous region of southwest Ethiopia, while Harari is restricted to the city of Harar. Ge'ez remains the liturgical language for certain groups of Christians in Ethiopia and in Eritrea.
The phonologies of the attested Semitic languages are presented here from a comparative point of view. See Proto-Semitic language § Phonology for details on the phonological reconstruction of Proto-Semitic used in this article. The reconstruction of Proto-Semitic (PS) was originally based primarily on Arabic, whose phonology and morphology (particularly in Classical Arabic) are very conservative, and which preserves as contrastive 28 of the evident 29 consonantal phonemes, with *s [s] and *š [ʃ] merging into Arabic /s/ ⟨س⟩ and *ś [ɬ] becoming Arabic /ʃ/ ⟨ش⟩.
Note: the fricatives *s, *z, *ṣ, *ś, *ṣ́, *ṱ may also be interpreted as affricates (/t͡s/, /d͡z/, /t͡sʼ/, /t͡ɬ/, /t͡ɬʼ/, /t͡θʼ/), as discussed in Proto-Semitic language § Fricatives.
This comparative approach is natural for the consonants, as sound correspondences among the consonants of the Semitic languages are very straightforward for a family of its time depth. Sound shifts affecting the vowels are more numerous and, at times, less regular.
Each Proto-Semitic phoneme was reconstructed to explain a certain regular sound correspondence between various Semitic languages. Note that Latin letter values (italicized) for extinct languages are a question of transcription; the exact pronunciation is not recorded.
Most of the attested languages have merged a number of the reconstructed original fricatives, though South Arabian retains all fourteen (and has added a fifteenth from *p > f).
In Aramaic and Hebrew, all non-emphatic stops occurring singly after a vowel were softened to fricatives, leading to an alternation that was often later phonemicized as a result of the loss of gemination.
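The softening rule just described can be approximated with a short Python sketch. It is a deliberate simplification: the function name `spirantize`, the ASCII-style transliteration, the vowel inventory and the choice of fricative symbols (v, ɣ, ð, x, f, θ) are all assumptions made for illustration, and the rule ignores word boundaries, reduced vowels and later phonemicization.

```python
# Non-emphatic stops and the fricatives they soften to (illustrative symbols).
SPIRANT = {"b": "v", "g": "ɣ", "d": "ð", "k": "x", "p": "f", "t": "θ"}
# Assumed vowel inventory for the transliteration used in this sketch.
VOWELS = set("aeiouāēīōūə")


def spirantize(word):
    """Soften a non-emphatic stop that follows a vowel and is not geminated."""
    out = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        # "Occurring singly": the stop is not the first half of a geminate.
        # The second half is excluded automatically, since it follows a
        # consonant rather than a vowel.
        is_single = nxt != ch
        if ch in SPIRANT and prev in VOWELS and is_single:
            out.append(SPIRANT[ch])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for form in ("katab", "sapar", "sipper"):
        print(form, "->", spirantize(form))
```

In this sketch a geminated stop, as in the input `sipper`, is left unchanged, while the single post-vocalic stop in `sapar` is softened. Emphatic consonants are simply absent from the mapping and therefore never change.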
In languages exhibiting pharyngealization of emphatics, the original velar emphatic has rather developed to a uvular stop [q].
The following table shows the development of the various fricatives in Hebrew, Aramaic and Arabic through cognate words:
Proto-Semitic vowels are, in general, harder to deduce due to the nonconcatenative morphology of Semitic languages. The history of vowel changes in the languages makes drawing up a complete table of correspondences impossible, so only the most common reflexes can be given:
The Semitic languages share a number of grammatical features, although variation — both between separate languages, and within the languages themselves — has naturally occurred over time.
The reconstructed default word order in Proto-Semitic is verb–subject–object (VSO), possessed–possessor (NG), and noun–adjective (NA). This was still the case in Classical Arabic and Biblical Hebrew, e.g. Classical Arabic رأى محمد فريدا ra'ā muħammadun farīdan (literally "saw Muhammad Farid", i.e. "Muhammad saw Farid"). In the modern Arabic vernaculars, however, as well as sometimes in Modern Standard Arabic (the modern literary language based on Classical Arabic) and Modern Hebrew, the classical VSO order has given way to SVO. Modern Ethiopian Semitic languages follow a different word order: SOV, possessor–possessed, and adjective–noun; however, the oldest attested Ethiopian Semitic language, Ge'ez, was VSO, possessed–possessor, and noun–adjective. Akkadian was also predominantly SOV.
The Proto-Semitic three-case system (nominative, accusative and genitive) with differing vowel endings (-u, -a, -i), fully preserved in Qur'anic Arabic (see ʾIʿrab), Akkadian and Ugaritic, has disappeared everywhere in the many colloquial forms of Semitic languages. Modern Standard Arabic maintains such case distinctions, although they are typically lost in free speech due to colloquial influence. An accusative ending -n is preserved in Ethiopian Semitic. The archaic Samalian dialect of Old Aramaic reflects a case distinction in the plural between nominative -ū and oblique -ī (compare the same distinction in Classical Arabic). Additionally, Semitic nouns and adjectives had a category of state, the indefinite state being expressed by nunation.
Semitic languages originally had three grammatical numbers: singular, dual, and plural. Classical Arabic still has a mandatory dual (i.e. it must be used in all circumstances when referring to two entities), marked on nouns, verbs, adjectives and pronouns. Many contemporary dialects of Arabic still have a dual, as in the name for the nation of Bahrain (baħr "sea" + -ayn "two"), although it is marked only on nouns. It also occurs in Hebrew in a few nouns (šana means "one year", šnatayim means "two years", and šanim means "years"), but for those it is obligatory. The curious phenomenon of broken plurals – e.g. in Arabic, sadd "one dam" vs. sudūd "dams" – found most profusely in the languages of Arabia and Ethiopia, may be partly of proto-Semitic origin, and partly elaborated from simpler origins.
All Semitic languages show two quite distinct styles of morphology used for conjugating verbs. Suffix conjugations take suffixes indicating the person, number and gender of the subject, which bear some resemblance to the pronominal suffixes used to indicate direct objects on verbs ("I saw him") and possession on nouns ("his dog"). So-called prefix conjugations actually take both prefixes and suffixes, with the prefixes primarily indicating person (and sometimes number or gender), while the suffixes (which are completely different from those used in the suffix conjugation) indicate number and gender whenever the prefix does not mark this. The prefix conjugation is noted for a particular pattern of ʔ- t- y- n- prefixes where (1) a t- prefix is used in the singular to mark the second person and third-person feminine, while a y- prefix marks the third-person masculine; and (2) identical words are used for second-person masculine and third-person feminine singular. The prefix conjugation is extremely old, with clear analogues in nearly all the families of Afroasiatic languages (i.e. at least 10,000 years old). The table on the right shows examples of the prefix and suffix conjugations in Classical Arabic, which has forms that are close to Proto-Semitic.
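The two conjugation styles can be illustrated with a small Python sketch for the Classical Arabic verb "write". The affix tables below are supplied for this example (the article's own table is not reproduced in this text), cover the singular and plural only, and should be checked against a reference grammar; the function names, the stems `katab` and `ktub`, and the person labels such as `2ms` (second person masculine singular) are assumptions of the sketch.

```python
# Suffix conjugation: the stem is followed by a person/number/gender ending.
SUFFIX_ENDINGS = {
    "1sg": "tu", "2ms": "ta", "2fs": "ti", "3ms": "a", "3fs": "at",
    "1pl": "nā", "2mp": "tum", "2fp": "tunna", "3mp": "ū", "3fp": "na",
}

# Prefix conjugation (indicative): a prefix marks person, and a suffix
# marks number and gender where the prefix alone does not.
PREFIX_AFFIXES = {
    "1sg": ("ʔa", "u"), "2ms": ("ta", "u"), "2fs": ("ta", "īna"),
    "3ms": ("ya", "u"), "3fs": ("ta", "u"), "1pl": ("na", "u"),
    "2mp": ("ta", "ūna"), "2fp": ("ta", "na"),
    "3mp": ("ya", "ūna"), "3fp": ("ya", "na"),
}


def suffix_conjugation(stem="katab"):
    """Return the suffix-conjugated forms of the given stem."""
    return {label: stem + ending for label, ending in SUFFIX_ENDINGS.items()}


def prefix_conjugation(stem="ktub"):
    """Return the prefix-conjugated forms of the given stem."""
    return {
        label: prefix + stem + suffix
        for label, (prefix, suffix) in PREFIX_AFFIXES.items()
    }


if __name__ == "__main__":
    past = suffix_conjugation()
    nonpast = prefix_conjugation()
    for label in SUFFIX_ENDINGS:
        print(label, past[label], nonpast[label])
    # Point (2) in the text: these two forms are identical.
    print(nonpast["2ms"] == nonpast["3fs"])
```

The final line checks the syncretism mentioned in the text: in the prefix conjugation the second-person masculine singular and the third-person feminine singular come out as the same word.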
In Proto-Semitic, as still largely reflected in East Semitic, prefix conjugations are used both for the past and the non-past, with different vocalizations. Cf. Akkadian niprus "we decided" (preterite), niptaras "we have decided" (perfect), niparras "we decide" (non-past or imperfect), vs. suffix-conjugated parsānu "we are/were/will be deciding" (stative). Some of these features, e.g. gemination indicating the non-past/imperfect, are generally attributed to Afroasiatic. According to Hetzron, Proto-Semitic had an additional form, the jussive, which was distinguished from the preterite only by the position of stress: the jussive had final stress while the preterite had non-final (retracted) stress.
The West Semitic languages significantly reshaped the system. The most substantial changes occurred in the Central Semitic languages (the ancestors of modern Hebrew, Arabic and Aramaic). Essentially, the old prefix-conjugated jussive or preterite became a new non-past (or imperfect), while the stative became a new past (or perfect), and the old prefix-conjugated non-past (or imperfect) with gemination was discarded. New suffixes were used to mark different moods in the non-past, e.g. Classical Arabic -u (indicative), -a (subjunctive), vs no suffix (jussive). (It is not generally agreed whether the systems of the various Semitic languages are better interpreted in terms of tense, i.e. past vs. non-past, or aspect, i.e. perfect vs. imperfect.) A special feature in classical Hebrew is the waw-consecutive, prefixing a verb form with the letter waw in order to change its tense or aspect. The South Semitic languages show a system somewhere between the East and Central Semitic languages.
Later languages show further developments. In the modern varieties of Arabic, for example, the old mood suffixes were dropped, and new mood prefixes developed (e.g. bi- for indicative vs. no prefix for subjunctive in many varieties). In the extreme case of Neo-Aramaic, the verb conjugations have been entirely reworked under Iranian influence.
All Semitic languages exhibit a unique pattern of stems called Semitic roots, consisting typically of triliteral (three-consonant) roots (two- and four-consonant roots also exist), from which nouns, adjectives, and verbs are formed in various ways (e.g., by inserting vowels, doubling consonants, lengthening vowels or by adding prefixes, suffixes, or infixes).
For instance, the root k-t-b (dealing with "writing" generally) yields in Arabic:
and the same root in Hebrew: (A line under k and b means a fricative, x for k and v for b.)
In Tigrinya and Amharic, this root survives only in the noun kitab, meaning "amulet", and the verb "to vaccinate". Ethiopic-derived languages use different roots for things that have to do with writing (and in some cases counting): the primitive root ṣ-f and the triliteral root stems m-ṣ-f, ṣ-h-f, and ṣ-f-r. These roots also exist in other Semitic languages, such as Hebrew (sep̄er "book", sōp̄er "scribe", mispār "number" and sippūr "story"). This root also exists in Arabic, where it is used to form words with a meaning close to "writing", such as ṣaḥāfa "journalism" and ṣaḥīfa "newspaper" or "parchment". Verbs in non-Semitic Afroasiatic languages show similar radical patterns, but more usually with biconsonantal roots; e.g. Kabyle afeg means "fly!", while affug means "flight", and yufeg means "he flew" (compare with Hebrew, where hap̄lēḡ means "set sail!", hap̄lāḡā means "a sailing trip", and hip̄līḡ means "he sailed", while the unrelated ʕūp̄, təʕūp̄ā and ʕāp̄ pertain to flight).
These are the basic numeral stems without feminine suffixes. Note that in most older Semitic languages, the forms of the numerals from 3 to 10 exhibit gender polarity (also called "chiastic concord" or reverse agreement), i.e. if the counted noun is masculine, the numeral would be feminine and vice versa.
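The gender polarity rule can be stated compactly in code. The following Python sketch is illustrative only: the function name `numeral_gender` and the string labels are assumptions, and it models nothing beyond the rule as stated above for the numerals 3 to 10.

```python
def numeral_gender(number, noun_gender):
    """Return the grammatical gender of a numeral under gender polarity.

    For numerals from 3 to 10 the numeral takes the opposite gender
    to the counted noun ("chiastic concord").
    """
    if noun_gender not in ("masculine", "feminine"):
        raise ValueError("noun_gender must be 'masculine' or 'feminine'")
    if not 3 <= number <= 10:
        # The rule described in the text is stated only for 3 to 10.
        raise ValueError("gender polarity is modelled here only for 3 to 10")
    return "feminine" if noun_gender == "masculine" else "masculine"


if __name__ == "__main__":
    print(numeral_gender(3, "masculine"))
    print(numeral_gender(7, "feminine"))
```

Numbers outside the range raise an error rather than guessing, since agreement for other numerals differs between the individual languages.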
Due to the Semitic languages' common origin, they share some words and roots. Others differ. For example:
Terms given in brackets are not derived from the respective Proto-Semitic roots, though they may also derive from Proto-Semitic (as does e.g. Arabic dār, cf. Biblical Hebrew dōr "dwelling").
Sometimes, certain roots differ in meaning from one Semitic language to another. For example, the root b-y-ḍ in Arabic has the meaning of "white" as well as "egg", whereas in Hebrew it only means "egg". The root l-b-n means "milk" in Arabic, but the color "white" in Hebrew. The root l-ḥ-m means "meat" in Arabic, but "bread" in Hebrew and "cow" in Ethiopian Semitic; the original meaning was most probably "food". The word medina (root: m-d-n) has the meaning of "metropolis" in Amharic, "city" in Arabic and Ancient Hebrew, and "State" in Modern Hebrew.
Of course, there is sometimes no relation between the roots. For example, "knowledge" is represented in Hebrew by the root y-d-ʿ, but in Arabic by the roots ʿ-r-f and ʿ-l-m and in Ethiosemitic by the roots ʿ-w-q and f-l-ṭ.
There are six fairly uncontroversial nodes within the Semitic languages: East Semitic, Northwest Semitic, North Arabian, Old South Arabian (also known as Sayhadic), Modern South Arabian, and Ethiopian Semitic. These are generally grouped further, but there is ongoing debate as to which belong together. The classification based on shared innovations given below, established by Robert Hetzron in 1976 and with later emendations by John Huehnergard and Rodgers as summarized in Hetzron 1997, is the most widely accepted today. Nevertheless, several Semiticists still argue for the traditional (partially nonlinguistic) view of Arabic as part of South Semitic, and a few (e.g. Alexander Militarev or the German-Egyptian professor Arafa Hussein Mustafa) see the South Arabian languages as a third branch of Semitic alongside East and West Semitic, rather than as a subgroup of South Semitic. However, a new classification groups Old South Arabian as Central Semitic instead.
Roger Blench notes that the Gurage languages are highly divergent and wonders whether they might not be a primary branch, reflecting an origin of Afroasiatic in or near Ethiopia. At a lower level, there is still no general agreement on where to draw the line between "languages" and "dialects" – an issue particularly relevant in Arabic, Aramaic, and Gurage – and the strong mutual influences between Arabic dialects render a genetic subclassification of them particularly difficult.
A computational phylogenetic analysis by Kitchen et al. (2009) considers the Semitic languages to have originated in the Levant about 5,750 years ago during the Early Bronze Age, with early Ethiosemitic originating from southern Arabia approximately 2,800 years ago.
The following is a list of some modern and ancient Semitic-speaking peoples and nations: