Centum and satem languages

Languages of the Indo-European family are classified as either centum languages or satem languages according to how the dorsal consonants (sounds of "K" and "G" type) of the reconstructed Proto-Indo-European language (PIE) developed. An example of the different developments is provided by the words for "hundred" found in the early attested Indo-European languages. In centum languages, they typically began with a /k/ sound (Latin centum was pronounced with initial /k/), but in satem languages, they often began with /s/ (the example satem comes from the Avestan language of Zoroastrian scripture).

The table below shows the traditional reconstruction of the PIE dorsal consonants, with three series, but according to some more recent theories there may actually have been only two series or three series with different pronunciations from those traditionally ascribed. In centum languages, the palatovelars, which included the initial consonant of the "hundred" root, merged with the plain velars. In satem languages, they remained distinct, and the labiovelars merged with the plain velars.[1]
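The two mergers just described can be restated as a small lookup. The following Python sketch is only an illustration of the traditional three-series account; the names `PIE_DORSALS`, `MERGERS`, `merge` and `surviving_series` are hypothetical and come from no cited source.

```python
# Traditional three-series reconstruction of the PIE dorsal consonants.
PIE_DORSALS = {
    "palatovelar": ["ḱ", "ǵ", "ǵʰ"],
    "plain velar": ["k", "g", "gʰ"],
    "labiovelar": ["kʷ", "gʷ", "gʷʰ"],
}

# Which series falls together with which, per the description above.
MERGERS = {
    "centum": {"palatovelar": "plain velar"},  # palatovelars merge with plain velars
    "satem": {"labiovelar": "plain velar"},    # labiovelars merge with plain velars
}


def merge(series, group):
    """Return the series that `series` belongs to after the group's merger."""
    return MERGERS[group].get(series, series)


def surviving_series(group):
    """Return the set of series that remain distinct in the given group."""
    return {merge(series, group) for series in PIE_DORSALS}


print(surviving_series("centum"))
print(surviving_series("satem"))
```

On this model each group keeps two of the three series: centum languages keep the labiovelars distinct, and satem languages keep the palatovelars distinct.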

The centum–satem division forms an isogloss in synchronic descriptions of Indo-European languages. It is no longer thought that the Proto-Indo-European language split first into centum and satem branches from which all the centum and all the satem languages, respectively, would have derived. Such a division is made particularly unlikely by the discovery that while the satem group lies generally to the east and the centum group to the west, the most eastward of the known IE language branches, Tocharian, is centum.[2]

The canonical centum languages of the Indo-European family are the "western" branches: Hellenic, Celtic, Italic and Germanic. They merged Proto-Indo-European palatovelars and plain velars, yielding plain velars only ("centumisation") but retained the labiovelars as a distinct set.[1]

The Anatolian branch probably falls outside the centum–satem dichotomy; for instance, Luwian indicates that all three dorsal consonant rows survived separately in Proto-Anatolian.[3] The centumisation observed in Hittite is therefore assumed to have occurred only after the breakup of Proto-Anatolian.[4] However, Craig Melchert proposes that Proto-Anatolian was indeed a centum language.

While Tocharian is generally regarded as a centum language,[5] it is a special case, as it has merged all three of the PIE dorsal series (originally nine separate consonants) into a single phoneme, *k. According to some scholars, that complicates the classification of Tocharian within the centum–satem model.[6] However, as Tocharian has replaced some Proto-Indo-European labiovelars with the labiovelar-like, non-original sequence *ku, it has been proposed that labiovelars remained distinct in Proto-Tocharian, which places Tocharian in the centum group (assuming that Proto-Tocharian lost palatovelars while labiovelars were still phonemically distinct).[5]

In the centum languages, PIE roots reconstructed with palatovelars developed into forms with plain velars. For example, in the PIE root *ḱm̥tóm, "hundred", the initial palatovelar *ḱ became a plain velar /k/, as in Latin centum (originally pronounced with /k/, despite various present-day pronunciations with /s/), Greek (he)katon, Welsh cant and Tocharian B kante. In the Germanic languages, the /k/ developed regularly by Grimm's law to become /h/, as in the English hund(red).

Centum languages also retained the distinction between the PIE labiovelar row (*kʷ, *gʷ, *gʷʰ) and the plain velars. Historically, it was unclear whether the labiovelar row represented an innovation by a process of labialisation, or whether it was inherited from the parent language (but lost in the satem branches); current mainstream opinion favours the latter possibility. Labiovelars as single phonemes (for example, /kʷ/) as opposed to biphonemes (for example, /kw/) are attested in Greek (the Linear B q- series), Italic (Latin qu), Germanic (Gothic hwair ƕ and qairþra q) and Celtic (Ogham ceirt Q) (in the so-called P-Celtic languages /kʷ/ developed into /p/; a similar development sometimes took place in Greek). The boukólos rule, however, states that a labiovelar reduces to a plain velar when it occurs next to *u or *w.
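The boukólos rule is a context-sensitive rewrite, so it can be sketched as a pass over a sequence of segments. The Python below is a minimal illustration rather than a full phonological model: the function name `apply_boukolos`, the segment-list representation and the simplified, unaccented input form are all assumptions made for the example.

```python
# Labiovelars and the plain velars they reduce to under the rule.
LABIOVELAR_TO_PLAIN = {"kʷ": "k", "gʷ": "g", "gʷʰ": "gʰ"}
# Neighbouring segments that trigger the reduction.
TRIGGERS = {"u", "w"}


def apply_boukolos(segments):
    """Delabialise every labiovelar that stands directly next to u or w."""
    result = []
    for i, seg in enumerate(segments):
        prev_seg = segments[i - 1] if i > 0 else None
        next_seg = segments[i + 1] if i + 1 < len(segments) else None
        if seg in LABIOVELAR_TO_PLAIN and (prev_seg in TRIGGERS or next_seg in TRIGGERS):
            result.append(LABIOVELAR_TO_PLAIN[seg])
        else:
            result.append(seg)
    return result


# Simplified, pre-segmented form of the word that gives the rule its name.
word = ["gʷ", "o", "u", "kʷ", "o", "l", "o", "s"]
print(apply_boukolos(word))
```

In this input only the second labiovelar stands next to u, so it alone is reduced; the initial labiovelar is followed by o and is left unchanged.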

The centum–satem division refers to the development of the dorsal series at the time of the earliest separation of Proto-Indo-European into the proto-languages of its individual daughter branches. It does not apply to any later analogous developments within any individual branch. For example, the conditional palatalization of Latin /k/ to /t͡ʃ/ or /t͡s/ (often later /s/) in some Romance languages (which means that modern French cent is pronounced with initial /s/) is satem-like, as is the merger of *kʷ with *k in the Gaelic languages; such later changes do not affect the classification of the languages as centum.

The satem languages belong to the "eastern" sub-families, especially Indo-Iranian and Balto-Slavic (but not Tocharian). They lost the labial element of Proto-Indo-European labiovelars and merged them with plain velars, but the palatovelars remained distinct and typically came to be realised as sibilants.[7] That set of developments, particularly the assibilation of palatovelars, is referred to as satemisation.

In the satem languages, the reflexes of the presumed PIE palatovelars are typically fricative or affricate consonants, articulated further forward in the mouth. For example, in the PIE root *ḱm̥tóm, "hundred", the initial palatovelar normally became a sibilant [s] or [ʃ], as in Avestan satem, Persian sad, Sanskrit śatam, сто / sto in all modern Slavic languages, Old Church Slavonic sъto, Latvian simts, Lithuanian šimtas. Another example is the Slavic prefix sъ(n)- ("with"), which appears in Latin, a centum language, as co(n)-; conjoin is cognate with Russian soyuz ("union"). An [s] is found for PIE *ḱ in such languages as Latvian, Avestan, Russian and Armenian, but Lithuanian and Sanskrit have [ʃ] (š in Lithuanian, ś in Sanskrit transcriptions). For more reflexes, see the phonetic correspondences section below; note also the effect of the ruki sound law.

"Incomplete satemisation" may also be evidenced by remnants of labial elements from labiovelars in Balto-Slavic, including Lithuanian ungurys "eel" < *angʷi- and dygus "pointy" < *dʰeigʷ-. A few examples are also claimed in Indo-Iranian, such as Sanskrit guru "heavy" < *gʷer-, kulam "herd" < *kʷel-, but they may instead be secondary developments, as in the case of kuru "make" < *kʷer- in which it is clear that the ku- group arose in post-Rigvedic language. It is also asserted that in Sanskrit and Balto-Slavic, in some environments, resonant consonants (denoted by /R/) become /iR/ after plain velars but /uR/ after labiovelars.

Some linguists argue that the Albanian[8] and Armenian[citation needed] branches are also to be classified as satem,[9] while others argue that they show evidence of separate treatment of all three dorsal consonant rows and so may not have merged the labiovelars with the plain velars, unlike the canonical satem branches.

Assibilation of velars in certain phonetic environments is a common phenomenon in language development (compare, for example, the initial sounds in French cent and Spanish cien, which are fricatives even though they derive from Latin /k/). Consequently, it is sometimes hard to establish firmly which languages were part of the original satem diffusion and which were affected by secondary assibilation later. While extensive documentation of Latin and Old Swedish, for example, shows that the assibilations found in French and Swedish were later developments, there are not enough records of Dacian and Thracian to settle conclusively when their satem-like features originated. Extensive lexical borrowing, such as Armenian from Iranian, may also add to the difficulty.

In Armenian, some assert that /kʷ/ is distinguishable from /k/ before front vowels.[10] Martin Macak (2018) asserts that the merger of *kʷ and *k occurred "within the history of Proto-Armenian itself".[11]

In Albanian, the three original dorsal rows have remained distinguishable before historic front vowels.[12][13][14] Labiovelars are for the most part differentiated from all other Indo-European velar series before front vowels (where they developed into s and z ultimately), but they merge with the "pure" (back) velars elsewhere.[12] The palatal velar series, consisting of Proto-Indo-European *ḱ and the merged *ǵ and *ǵʰ, usually developed into th and dh, but were depalatalized to merge with the back velars when in contact with sonorants.[12] Because the original Proto-Indo-European tripartite distinction between dorsals is preserved in such reflexes, Demiraj argues Albanian is therefore to be considered neither centum nor satem, like Luwian, but at the same time it has a "satem-like" realization of the palatal dorsals in most cases.[13] Thus PIE *ḱ, *kʷ and *k become th (Alb. thom "I say" < PIE *ḱeHsmi), s (Alb. si "how" < PIE *kʷiH1, cf. Latin quī), and q (/c/: pleq "elderly" < *plak-i < PIE *plH2-ko-), respectively.[15]

August Schleicher, an early Indo-Europeanist, in Part I, "Phonology", of his major work, the 1871 Compendium of Comparative Grammar of the Indogermanic Language, published a table of original momentane Laute, or "stops", which has only a single velar row, *k, *g, *gʰ, under the name of Gutturalen.[16][17] He identifies four palatals (*ḱ, *ǵ, *ḱʰ, *ǵʰ) but hypothesises that they came from the gutturals along with the nasal *ń and the spirant *ç.[18]

Karl Brugmann, in his 1886 work Outline of Comparative Grammar of the Indogermanic Language (Grundriss...), promotes the palatals to the original language, recognising two rows of Explosivae, or "stops", the palatal (*ḱ, *ǵ, *ḱʰ, *ǵʰ) and the velar (*k, *g, *kʰ, *gʰ),[19] each of which was simplified to three articulations even in the same work.[20] In the same work, Brugmann notices among die velaren Verschlusslaute, "the velar stops", a major contrast between reflexes of the same words in different daughter languages. In some, the velar is marked with a u-Sprache, "u-articulation", which he terms a Labialisierung, "labialization", in accordance with the prevailing theory that the labiovelars were velars labialised by combination with a u at some later time and were not among the original consonants. He thus divides languages into die Sprachgruppe mit Labialisierung[21] and die Sprachgruppe ohne Labialisierung, "the language group with (or without) labialization", which basically correspond to what would later be termed the centum and satem groups:[22]

For words and groups of words, which do not appear in any language with labialized velar-sound [the "pure velars"], it must for the present be left undecided whether they ever had the u-afterclap.

The doubt introduced in that passage suggests he already suspected the "afterclap" u was not that but was part of an original sound.

In 1890, Peter von Bradke published Concerning Method and Conclusions of Aryan (Indogermanic) Studies, in which he identified the same division (Trennung) as did Brugmann, but he defined it in a different way. He said that the original Indo-Europeans had two kinds of gutturaler Laute, "guttural sounds": the gutturale oder velare, und die palatale Reihe, "guttural or velar, and palatal rows", each of which was both aspirated and unaspirated. The velars were to be viewed as gutturals in an engerer Sinn, "narrow sense". They were a reiner K-Laut, "pure K-sound". Palatals were häufig mit nachfolgender Labialisierung, "frequently with subsequent labialization". The latter distinction led him to divide the palatale Reihe into a Gruppe als Spirant and a reiner K-Laut, typified by the words satem and centum respectively.[23] Later in the book[24] he speaks of an original centum-Gruppe, from which on the north of the Black and Caspian Seas the satem-Stämme, "satem tribes", dissimilated among the Nomadenvölker or Steppenvölker, distinguished by further palatalization of the palatal gutturals.

By the 1897 edition of Grundriss, Brugmann (and Delbrück) had adopted Von Bradke's view: "The Proto-Indo-European palatals... appear in Greek, Italic, Celtic and Germanic as a rule as K-sounds, as opposed to in Aryan, Armenian, Albanian, Balto-Slavic, Phrygian and Thracian... for the most part sibilants."[25]

There was no more mention of labialized and non-labialized language groups after Brugmann changed his mind regarding the labialized velars. The labio-velars now appeared under that name as one of the five rows of Verschlusslaute (Explosivae) (plosives/stops), comprising die labialen V., die dentalen V., die palatalen V., die reinvelaren V. and die labiovelaren V. It was Brugmann who pointed out that labiovelars had merged into the velars in the satem group,[26] accounting for the coincidence of the discarded non-labialized group with the satem group.

When von Bradke first published his definition of the centum and satem sound changes, he viewed his classification as "the oldest perceivable division" in Indo-European, which he elucidated as "a division between eastern and western cultural provinces (Kulturkreise)".[27] The proposed split was undermined by the decipherment of Hittite and Tocharian in the early 20th century. Neither language shows any satem-like assibilation, in spite of being located in the satem area.[28]

The proposed phylogenetic division of Indo-European into satem and centum "sub-families" was further weakened by the identification of other Indo-European isoglosses running across the centum–satem boundary, some of which seemed of equal or greater importance in the development of daughter languages.[29] Consequently, since the early 20th century at least, the centum–satem isogloss has been considered an early areal phenomenon rather than a true phylogenetic division of daughter languages.

The actual pronunciation of the velar series in PIE is not certain. One current idea is that the "palatovelars" were in fact simple velars *[k], *[ɡ], *[ɡʰ], and the "plain velars" were pronounced farther back, perhaps as uvular consonants: *[q], *[ɢ], *[ɢʰ].[30] If labiovelars were just labialized forms of the "plain velars", they would have been pronounced *[qʷ], *[ɢʷ], *[ɢʷʰ], but the pronunciation of the labiovelars as *[kʷ], *[gʷ], *[gʷʰ] would still be possible in the uvular theory if the satem languages first shifted the "palatovelars" and only later merged the "plain velars" and "labiovelars". The uvular theory is supported by the following evidence.

On the above interpretation, the split between the centum and satem groups would not have been a straightforward loss of an articulatory feature (palatalization or labialization). Instead, the uvulars *q, *ɢ, *ɢʰ (the "plain velars" of the traditional reconstruction) would have been fronted to velars across all branches. In the satem languages, this fronting caused a chain shift: the existing velars (traditionally "palatovelars") were shifted further forward to avoid a merger, becoming palatal: /k/ > /c/; /q/ > /k/. In the centum languages, no chain shift occurred, and the uvulars merged into the velars. The delabialisation in the satem languages would have occurred later, in a separate stage.
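The difference between a merger and a chain shift can be made concrete with a short Python sketch. It models only the voiceless member of each row under the uvular interpretation, and it assumes the labiovelar was pronounced [kʷ], which is one of the two possibilities mentioned for that theory; the names `UVULAR_PIE`, `centum` and `satem` are hypothetical.

```python
# Assumed phonetic values of the voiceless dorsals under the uvular interpretation.
UVULAR_PIE = {"palatovelar": "k", "plain velar": "q", "labiovelar": "kʷ"}


def centum(system):
    """Uvulars are fronted to velars and merge with the existing velars."""
    fronting = {"q": "k"}
    return {series: fronting.get(sound, sound) for series, sound in system.items()}


def satem(system):
    """Chain shift first, delabialisation in a later, separate stage."""
    # Both steps of the chain apply at once, so /k/ and /q/ never merge.
    chain = {"k": "c", "q": "k"}
    shifted = {series: chain.get(sound, sound) for series, sound in system.items()}
    # Later stage: the labiovelar loses its labial element.
    delabialisation = {"kʷ": "k"}
    return {series: delabialisation.get(sound, sound) for series, sound in shifted.items()}


print(centum(UVULAR_PIE))
print(satem(UVULAR_PIE))
```

Applying the chain as a single simultaneous mapping is the design point: if /q/ > /k/ were applied first and /k/ > /c/ second, the two rows would wrongly fall together.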

The presence of three dorsal rows in the proto-language has been the mainstream hypothesis since at least the mid-20th century. There remain, however, several alternative proposals with just two rows in the parent language, which describe either "satemisation" or "centumisation" as the emergence of a new phonematic category rather than the disappearance of an inherited one.

Antoine Meillet (1937) proposed that the original rows were the labiovelars and palatovelars, with the plain velars being allophones of the palatovelars in some cases, such as depalatalisation before a resonant.[31] The etymologies establishing the presence of velars in the parent language are explained as artefacts of either borrowing between daughter languages or of false etymologies.

Other scholars who assume two dorsal rows in Proto-Indo-European include Kuryłowicz (1935) and Lehmann (1952), as well as Frederik Kortlandt and others.[32] The argument is that PIE had only two series, a simple velar and a labiovelar. The satem languages palatalized the plain velar series in most positions, but the plain velars remained in some environments: typically reconstructed as before or after /u/, after /s/, and before /r/ or /a/ and also before /m/ and /n/ in some Baltic dialects. The original allophonic distinction was disturbed when the labiovelars were merged with the plain velars. That produced a new phonemic distinction between palatal and plain velars, with an unpredictable alternation between palatal and plain in related forms of some roots (those from original plain velars) but not others (those from original labiovelars). Subsequent analogical processes generalised either the plain or palatal consonant in all forms of a particular root. The roots in which the plain consonant was generalized are those traditionally reconstructed as having "plain velars" in the parent language in contrast to "palatovelars".

Oswald Szemerényi (1990) considers the palatovelars as an innovation, proposing that the "preconsonantal palatals probably owe their origin, at least in part, to a lost palatal vowel" and a velar was palatalised by a following vowel subsequently lost.[33] The palatal row would therefore postdate the original velar and labiovelar rows, but Szemerényi is not clear whether that would have happened before or after the breakup of the parent-language (in a table showing the system of stops "shortly before the break-up", he includes palatovelars with a question mark after them).

Woodhouse (1998; 2005) introduced a "bitectal" notation, labelling the two rows of dorsals as k1, g1, g1h and k2, g2, g2h. The first row represents "prevelars", which developed into either palatovelars or plain velars in the satem group but just into plain velars in the centum group; the second row represents "backvelars", which developed into either labiovelars or plain velars in the centum group but just into plain velars in the satem group.[34]

The following are arguments that have been listed in support of a two-series hypothesis:[citation needed]

The following table summarizes the outcomes of the reconstructed PIE palatals and labiovelars in the various daughter branches, both centum and satem. (The outcomes of the "plain velars" can be assumed to be the same as those of the palatals in the centum branches and those of the labiovelars in the satem branches.)