Title: Historical Linguistics
 1Historical Linguistics
- Language history 
- drift: change by internal development 
- contact: change by external borrowing 
- Possible relations among languages 
- family tree 
- similarity due to separate development from 
 common ancestor
- diffusion of traits 
- similarity due to borrowing in period of contact 
- or, no provable relationship 
- Tasks of historical linguistics 
- inference of historical connections 
- reconstruction of proto languages 
2Colonial Philology
- Thomas Jefferson corresponded with many sources 
 to obtain word lists in Indian languages
- Examined and compared the results of Peter the 
 Great's Siberian expeditions
- Benjamin Franklin also collected Indian word lists
3How many ages have elapsed since the English, 
Dutch, the Germans, the Swiss, the Norwegians, 
Danes and Swedes have separated from their common 
stock? Yet how many more must elapse before the 
proofs of their common origin, which exist in 
their several languages, will disappear? It is to 
be lamented then  that we have suffered so many 
of the Indian tribes already to extinguish, 
without our having previously collected and 
deposited in the records of literature, the 
general rudiments at least of the languages they 
spoke. Were vocabularies formed of all the 
languages spoken in North and South America, 
preserving their appellations of the most common 
objects in nature, of those which must be present 
to every nation barbarous or civilised, with the 
inflections of their nouns and verbs, their 
principles of regimen and concord, and these 
deposited in all the public libraries, it would 
furnish opportunities to those skilled in the 
languages of the old world to compare them with 
these, now or at a future time, and hence to 
construct the best evidence of the derivation of 
this part of the human race.
Thomas Jefferson, Notes on the State of 
Virginia. Written 1781-82. 
 4Benjamin Barton sees a pattern
By a careful inspection of the vocabularies, the 
reader will find no difficulty in discovering 
that in Asia the languages of the  tribes of the 
Delaware-stock may be all traced to ONE COMMON 
SOURCE. Nor do I limit this observation to the 
languages of the American tribes just mentioned. 
HITHERTO, WE HAVE NOT DISCOVERED IN AMERICA ANY 
TWO, OR MORE LANGUAGES BETWEEN WHICH WE ARE 
INCAPABLE OF DETECTING AFFINITIES (AND THOSE VERY 
OFTEN STRIKING) EITHER IN AMERICA, OR IN THE OLD 
WORLD.
New Views of the Origin of the Tribes and Nations 
of America Benjamin Smith Barton M.D., Professor 
of Materia Medica, Natural History and Botany, 
in the University of Pennsylvania (1798) 
 5Barton's hypothesis
My inquiries seem to render it probable, that 
all the languages of the countries of America 
may be traced to one or two great stocks 
 6Jefferson disagreed
imperfect as is our knowledge of the tongues 
spoken in America, it suffices to discover the 
following remarkable fact. Arranging them under 
the radical ones to which they may be palpably 
traced, and doing the same by those of the red 
men of Asia, there will be found probably twenty 
in America, for one in Asia, of those radical 
languages, so called because, if they were ever 
the same, they have lost all resemblance to one 
another. A separation into dialects may be the 
work of a few ages only, but for two dialects to 
recede from one another till they have lost all 
vestiges of their common origin, must require an 
immense course of time perhaps not less than 
many people give to the age of the earth. A 
greater number of those radical changes of 
language having taken place among the red men of 
America, proves them of greater antiquity than 
those of Asia. 
Notes on the State of Virginia Written 1781-82 
 7though later, J. considered a sociolinguistic 
explanation
Having heard that some Indians considered it 
 dishonorable to use any language but their 
own, he suggested that when a part of a 
tribe separated itself, the seceded group might 
refuse to use the original language and invent 
their own.
Perhaps this hypothesis presents less 
difficulty than that of so many radically 
distinct languages preserved by such handfuls of 
men from an antiquity so remote that no data we 
possess will enable us to calculate it. Ms. 
notes circa 1800 
 8Jefferson's plans
- By 1801, he had collected vocabularies for dozens 
 of indigenous languages
- and began to arrange this for publication lest 
 by some accident it might be lost
- He put off publication in 1803 
- due to the opportunity to include the results of 
 the Lewis & Clark expedition
9The sad end of J.'s linguistic career
- His linguistic papers were packed in a large 
 trunk and shipped back to Monticello in 1809 with
 his other effects
- The trunk was stolen during the trip up the James 
 River
- The disappointed thief dumped the contents in the 
 river
- Only a few items floated to shore and were 
 recovered
10Jefferson to Barton (1809), sent with Lewis's 
vocabulary of Pani
It is a specimen of the condition of the little 
that was recovered. I am the more concerned at 
this accident, as of the two hundred and fifty 
words of my vocabularies, and the one hundred and 
thirty words of the great Russian vocabularies  
seventy three were common to both, and would 
have furnished materials from which something 
might have resulted. Perhaps I may make another 
attempt to collect, although I am too old to 
expect to make much progress in it. 
 11Sir William (Oriental) Jones
- Lawyer appointed in 1783 to superintend British 
 jurisprudence in India
- Founded the Asiatic Society in Calcutta for 
 Inquiring into the History, Civil and Natural,
 the Antiquities, Arts, Sciences, and Literature,
 of Asia
- Learned Sanskrit because the laws of the natives 
 must be preserved inviolate but the learning and
 vigilance of the English judge must be a check
 upon the native interpreters
12- One of the early European orientalists 
- Cross-cultural pioneers? 
- Agents of colonial domination? 
13Historical Context
- The British in India 
- piecemeal conquest 1750-1900 
- began with trade concessions in Calcutta and 
 Bombay
- expanded one principality at a time 
- mixture of direct and indirect rule 
- many Indian institutions left in place 
- rule mainly administered and enforced by Indians 
- until 1850s, administration was in the hands of 
 the East India Company rather than the British
 Crown
14India in 1785 
 15Jones learns Sanskrit (1783-1786)
- Sanskrit 
- Language of Hindu holy texts (1000 BC) 
- Formalized by grammarians c. 600 BC 
- Preserved to the present day as a language of 
 religion and learning
- No Brahman would teach a foreigner 
- Jones hired a vaidya (doctor) as tutor while the 
 Brahmanic scholars were away on a religious
 retreat
16Jones's Third Discourse (1786)
- Anniversary addresses to the Asiatic Society 
- First Discourse: purposes and procedures of the 
 Society
- Second Discourse: a detailed research program 
- Third Discourse: on the nations of Asia
The five principal nations, who have in different 
ages divided among themselves, as a kind of 
inheritance, the vast continent of Asia, with the 
many islands depending on it, are the Indians, 
the Chinese, the Tartars, the Arabs, and the 
Persians: who they severally were, whence and 
when they came, where they now are settled, and 
what advantage a more perfect knowledge of them 
all may bring to our European world, will be 
shown, I trust, in five distinct essays; the last 
of which will demonstrate the connexion or 
diversity between them, and solve the 
great problem, whether they had any common 
origin, and whether that origin was the same, 
which we generally ascribe to them. 
 17The Indo-European Hypothesis
The Sanskrit language, whatever be its antiquity, 
is of a wonderful structure; more perfect than 
the Greek, more copious than the Latin, and more 
exquisitely refined than either, yet bearing to 
both of them a stronger affinity, both in the 
roots of verbs and in the forms of grammar, than 
could possibly have been produced by accident; so 
strong indeed, that no philologer could examine 
them all three, without believing them to have 
sprung from some common source, which, perhaps, 
no longer exists: there is a similar reason, 
though not quite so forcible, for supposing that 
both the Gothick and the Celtick, though blended 
with a very different idiom, had the same origin 
with the Sanskrit; and the old Persian might be 
added to the same family. 
 18Jones's American connection
- Jones was a radical Whig and an early political 
 supporter of the American Revolution
- Met Benjamin Franklin at the RS in 1771 
- Visited Franklin in Paris in 1779, 1780, and 1782 
- To explore compromise peace plans 
- To deal with a client's property claims in 
 Virginia
- To obtain a pass for travel to America 
- considered emigration to Charleston or 
 Philadelphia!
- Many weeks of political and philosophical 
 conversations
- Indirect communication with Jefferson 
-  Relations to the Virginia manuscript?
19Indo-European Examples
English   Latin      Greek                        Sanskrit
father    pater      patêr                        pitar
brother   frater     phrater (fellow tribesman)   bhratar
two       duo        duo                          dva
three     tres       treis                        trayas
four      quattuor   tettares                     catvaras
seven     septem     hepta                        sapta
 20Jones's methods
- Analyst must be perfectly acquainted with the 
 languages compared
- Meanings of proposed cognates must be nearly 
 identical
- Vowels should not be disregarded 
- No metathesis or unexplained consonant insertions 
- Transliterations must be systematic and careful 
- Use basic vocabulary, not exotic words more 
 likely to be borrowed
21Remember Barton
- By a careful inspection of the vocabularies, the 
 reader will find no difficulty in discovering
 that in Asia the languages of the  tribes of the
 Delaware-stock may be all traced to ONE COMMON
 SOURCE. Nor do I limit this observation to the
 languages of the American tribes just mentioned.
 HITHERTO, WE HAVE NOT DISCOVERED IN AMERICA ANY
 TWO, OR MORE LANGUAGES BETWEEN WHICH WE ARE
 INCAPABLE OF DETECTING AFFINITIES (AND THOSE VERY
 OFTEN STRIKING) EITHER IN AMERICA, OR IN THE OLD
 WORLD.
New Views of the Origin of the Tribes and Nations 
of America Benjamin Smith Barton M.D., Professor 
of Materia Medica, Natural History and Botany, 
in the University of Pennsylvania (1798) 
 22imperfect as is our knowledge of the tongues 
spoken in America, it suffices to discover the 
following remarkable fact. Arranging them under 
the radical ones to which they may be palpably 
traced, and doing the same by those of the red 
men of Asia, there will be found probably twenty 
in America, for one in Asia, of those radical 
languages, so called because, if they were ever 
the same, they have lost all resemblance to one 
another. A separation into dialects may be the 
work of a few ages only, but for two dialects to 
recede from one another till they have lost all 
vestiges of their common origin, must require an 
immense course of time perhaps not less than 
many people give to the age of the earth. A 
greater number of those radical changes of 
language having taken place among the red men of 
America, proves them of greater antiquity than 
those of Asia. 
Notes on the State of Virginia, 1787 
 23The controversy continues
- (Like Barton) Joseph Greenberg (1987) 
- All American languages in three groups 
- Eskimo-Aleut 
- Na-Dene 
- Amerind 
- (Like Jefferson) Other scholars 
- The Amerind category is a fiction 
- There are 
- 60 unrelated families in N. America 
- 19 unrelated families in C. America 
- 80 unrelated families in S. America
24Different methods
- Mass comparison 
- Cognate ratios (lexicostatistics) 
- Glottochronology 
- Typological features 
- e.g. classifier systems 
- Comparative reconstruction 
- Determination of systematic sound laws 
- Lexical and morphological reconstruction
25Laws of sound change
- Meaning change is usually sporadic 
- Sound change is usually systematic, e.g. 
- t/d deletion (best, past, lost, etc.) 
- short a raising (camera, man, vanish, etc.) 
- Neogrammarian hypothesis (1870) 
- All sound change is systematic 
- Apparent exceptions: analysis is incomplete 
- Article of faith with scholars known as the 
 young grammarians
26Grimm's Law
- Jakob Grimm (1822) 
- Gradation of consonant manner 
- bh dh gh > b d g 
- b d g > p t k 
- p t k > f th h 
-  pater : father, labium : lip 
-  tres : three, duo : two 
-  canis : hound, ager : acre 
-  bhratar : brother 
-  dha : do 
-  vah : wagon 
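The three consonant shifts above can be sketched as a single lookup table applied to every segment at once. This is a toy illustration, not a reconstruction tool: the function name `grimm_shift` and the hand-segmented inputs are assumptions of the sketch, the Latin forms merely stand in for the older consonants, vowels are left untouched, and Verner's Law is ignored.

```python
# Grimm's Law as a simultaneous mapping over consonant segments.
# The mapping is applied once per segment, so b > p does not feed p > f.
GRIMM = {
    "bh": "b", "dh": "d", "gh": "g",   # voiced aspirates > voiced stops
    "b": "p", "d": "t", "g": "k",      # voiced stops > voiceless stops
    "p": "f", "t": "th", "k": "h",     # voiceless stops > fricatives
}

def grimm_shift(segments):
    """Return a new list with each consonant segment shifted; other segments pass through."""
    return [GRIMM.get(seg, seg) for seg in segments]

# Hand-segmented illustrative input (vowels are not modelled here).
print(grimm_shift(["p", "a", "t", "e", "r"]))   # consonants of 'pater' shift toward 'father'
print(grimm_shift(["d", "u", "o"]))             # 'duo' shifts toward 'two'
```

Using one dictionary lookup per segment, rather than three successive replacements, keeps the shifts from chaining into one another.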
27Verner's Law
- Karl Adolf Verner (1875) 
- Fixes gaps in Grimm's Law 
- voicing after accentless vowels 
- applies to non-Grimm's Law cases as well 
- from PIE to Gothic in four algorithmic steps 
-  PIE: pətér 
-  GL: fəthér 
-  (vowels): fathár 
-  VL: fadár 
-  AS: fádar 
28More on sound change
- Well attested in recent history 
- e.g. English Great Vowel Shift 
-  Can study sound change in progress today 
-  Tends to produce tree-like histories. 
-  operates on the system as a whole 
-  isn't easily borrowed across languages
29Problems with comparative reconstruction
- Requires detailed knowledge of languages involved 
- Must be enough cognates for patterns to emerge 
- and layers of borrowing to be identified and 
 discarded
- Maximum time depth of 5-10K years 
- (Jefferson was right)
30Cognate percentages
- Catherine the Great's method 
- make a list of "appellations of the most common 
 objects in nature, of those which must be present
 to every nation barbarous or civilised"
- Standard lists devised by Morris Swadesh around 
 1950
- For each pair of languages, estimate the 
 proportion of cognate words
- Raw result is a table of percentages 
- like a table of trip distances 
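A minimal sketch of how such a table of percentages might be computed, assuming each word has already been assigned to a cognate class by hand. The three languages A, B, C, the four meanings, and the class numbers are invented for illustration; `cognate_percentages` is a hypothetical helper, not a standard routine.

```python
from itertools import combinations

# Hypothetical judgments: for each meaning, the cognate class of each language's word.
# Two words count as cognate when they carry the same class number.
COGNATE_CLASSES = {
    "water": {"A": 1, "B": 1, "C": 2},
    "stone": {"A": 1, "B": 1, "C": 1},
    "hand":  {"A": 1, "B": 2, "C": 2},
    "two":   {"A": 1, "B": 1, "C": 1},
}

def cognate_percentages(classes):
    """Return {(lang_a, lang_b): percent of shared meanings judged cognate}."""
    languages = sorted({lang for row in classes.values() for lang in row})
    table = {}
    for a, b in combinations(languages, 2):
        shared = [m for m, row in classes.items() if a in row and b in row]
        same = sum(1 for m in shared if classes[m][a] == classes[m][b])
        table[(a, b)] = 100.0 * same / len(shared)
    return table

for pair, pct in cognate_percentages(COGNATE_CLASSES).items():
    print(pair, pct)
```

The hard part in practice is the cognate judgment itself, which this sketch simply takes as given.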
31Example
Central Yambasa languages (Cameroon) 
 32Questions about lexicostatistics
- Genetic descent vs. borrowing 
- borrowing creates non-tree structures 
- Variability of rate of change 
- Swadesh: 14% per millennium 
- Expected rate of false cognates 
- How to combine with other evidence 
- Inference of tree structure 
- from cognate percentages 
- from detailed account of shared traits
33Historical inference from linguistic and genetic 
data
- Potentially "the best evidence of the derivation 
 of ... the human race" (Thomas
 Jefferson)
- BUT 
- Inferences are complex 
- methods and results from several disciplines 
- Intellectual stakes are high 
- Work has often been careless 
- sometimes spectacularly so 
- dangers of overinterpretation and scientism
34General methodological problems
- Not all graphs are trees 
- treeness tests often left out 
- treeness hypothesis can often be rejected 
- Tree inference may be underdetermined 
- Branching structure 
- Root choice 
- Rates of change may not be constant 
- for different markers 
- across time 
- Gene trees (and language trees) may not be 
 population trees
- Biology and language are complicated 
- simplifying assumptions are sometimes 
 perniciously mistaken
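One standard treeness check is the four-point condition: distances fit an additive tree only if, for every four items, the two largest of the three pairwise-sum combinations are equal. The sketch below is an assumption-laden illustration, not a method taken from these slides: it converts similarities to distances with -log (which suits a model where retentions multiply along branches), and `four_point_violation` is a hypothetical name.

```python
from itertools import combinations
from math import log

def four_point_violation(sim, taxa):
    """Largest gap between the two biggest pair-sums over all quartets.

    `sim` maps frozenset({a, b}) to a similarity in (0, 1].
    A result near 0 means the distances are consistent with some additive tree.
    """
    def dist(a, b):
        return -log(sim[frozenset((a, b))])

    worst = 0.0
    for a, b, c, e in combinations(taxa, 4):
        sums = sorted([dist(a, b) + dist(c, e),
                       dist(a, c) + dist(b, e),
                       dist(a, e) + dist(b, c)])
        worst = max(worst, sums[2] - sums[1])
    return worst

# Invented tree-like data: leaves A (.9), B (.8) | internal branch .5 | C (.7), D (.6).
tree_like = {
    frozenset("AB"): 0.9 * 0.8, frozenset("CD"): 0.7 * 0.6,
    frozenset("AC"): 0.9 * 0.5 * 0.7, frozenset("AD"): 0.9 * 0.5 * 0.6,
    frozenset("BC"): 0.8 * 0.5 * 0.7, frozenset("BD"): 0.8 * 0.5 * 0.6,
}
# Invented ring-like data (A-B-C-D-A all close, diagonals distant): not a tree.
ring_like = {
    frozenset("AB"): 0.9, frozenset("BC"): 0.9, frozenset("CD"): 0.9,
    frozenset("AD"): 0.9, frozenset("AC"): 0.5, frozenset("BD"): 0.5,
}
print(four_point_violation(tree_like, "ABCD"))
print(four_point_violation(ring_like, "ABCD"))
```

With real, noisy data the violation is never exactly zero, so a threshold or a statistical test would be needed; that choice is left open here.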
35Trees vs. Clines (etc.)
- A tree structure represents the results of a 
 sequence of splits in population (or language)
- no further influences among separate branches 
- if rates of change are constant, distances should 
 be quantized
- Within an interbreeding (intercommunicating) 
 population, distances reflect the amount of gene
 flow (transmission of linguistic traits)
- should correlate strongly with accessibility 
- e.g. geographical distance in the simplest case
 37The procedures outlined here provide a rigorous 
method for inferring whether the geographical 
pattern of variation is consistent with an 
historical split (fragmentation) or no 
split (recurrent gene flow) using criteria that 
are completely explicit. For example, in 
analyzing the mtDNA of tiger salamanders, a clear 
split into eastern and western lineages was 
detected for mtDNA. Using the same explicit 
criteria, there was no split among any human 
populations. Quite the contrary, the present 
analysis documents recurrent and continual 
genetic interchange among all Old World human 
populations throughout the entire time period 
marked by mtDNA. Accordingly, estimating a date 
for a 'split' of Africans from non-Africans based 
on evidence from mtDNA is certainly allowed by 
many computer programs, but the results are 
meaningless because a date is being assigned to 
an 'event' that never occurred. Templeton 
(1997) 
 38Methods for tree inference (phylogeny)
- Two general approaches 
- clustering (easier but cruder) 
- generate and evaluate alternative trees 
- Distance-based methods 
- based on matrix of distances/similarities 
- Parsimony 
- based on set of partly-shared characters or 
 traits
- http://evolution.genetics.washington.edu/phylip/software.html
-  documents 193 different phylogeny packages 
39Cognate percentages for 8 Vanuatu languages
Toga
 64  Mosina
 64  58  Peterara
 57  51  65  Nduindui
 29  28  34  32  Sakao
 51  45  55  52  40  Malo
 39  39  45  41  43  50  Fortsenal
 52  48  57  60  31  48  45  Raga
Data from Guy (1994) 
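As a rough illustration of the "easier but cruder" clustering approach, the sketch below runs plain average-linkage agglomeration on the percentages above. It is not the algorithm of Guy (1994); the function name `average_linkage` and the tuple representation of clusters are assumptions of this example.

```python
NAMES = ["Toga", "Mosina", "Peterara", "Nduindui", "Sakao", "Malo", "Fortsenal", "Raga"]
# Lower triangle of the cognate-percentage table, row i holding columns 0..i-1.
LOWER = [
    [],
    [64],
    [64, 58],
    [57, 51, 65],
    [29, 28, 34, 32],
    [51, 45, 55, 52, 40],
    [39, 39, 45, 41, 43, 50],
    [52, 48, 57, 60, 31, 48, 45],
]

def average_linkage(names, lower):
    """Repeatedly merge the two clusters with the highest mean similarity.

    Returns the list of merges as (cluster_x, cluster_y, mean_similarity).
    """
    sim = {}
    for i, row in enumerate(lower):
        for j, pct in enumerate(row):
            sim[frozenset((names[i], names[j]))] = float(pct)

    def cluster_sim(x, y):
        # Mean over all leaf pairs drawn from the two clusters.
        return sum(sim[frozenset((a, b))] for a in x for b in y) / (len(x) * len(y))

    clusters = [(name,) for name in names]   # each cluster is a tuple of leaf names
    merges = []
    while len(clusters) > 1:
        x, y = max(((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
                   key=lambda pair: cluster_sim(*pair))
        merges.append((x, y, cluster_sim(x, y)))
        clusters = [c for c in clusters if c not in (x, y)] + [x + y]
    return merges

for x, y, s in average_linkage(NAMES, LOWER):
    print(x, "+", y, round(s, 1))
```

The first merge simply joins the pair with the highest percentage in the table (Peterara and Nduindui, at 65); later merges depend on the averaging rule, which is one of several common choices.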
 40Reconstruction Algorithm (Guy 1994)
A message is input at the root of a tree-shaped 
transmission network, whence it is transmitted to 
the terminal nodes. As they travel, copies of the 
original message are affected by errors 
consisting in randomly selected segments of the 
message being replaced by other segments randomly 
drawn from a pool of possible segments (the 
"alphabet" of the message). The problem is: from 
the garbled versions of the original message 
collected at the terminal nodes, reconstruct the 
network and the history of the transmission of 
the message. 
Additive-distance tree with weights on branches 
rather than on nodes -- doesn't assume constant 
rate of change 
 41Explanatory force of the model
- Set of distances grows as n(n-1)/2 for n 
 languages
- Set of binary-tree branch labels grows 
 as 2n-2
- For 8 languages we predict 28 numbers (the 
 inter-language cognate proportions) with 14
 numbers (the binary tree branch proportions)
 
42Inferred tree
Toga      --830--+--919--+--972--+--947--+
Mosina    --770--'       |       |       |
Peterara  ------829------'       |       |
Nduindui  --795--+------949------'       |
Raga      --755--'                       |
Sakao     --567--+--883--+------895------'
Fortsenal --759--'       |
Malo      ------772------'
Mosina/Toga: .77 × .83 = .6391 (really 64)
Peterara/Mosina: .829 × .919 × .77 = .5866 (really 58)
Peterara/Toga: .829 × .919 × .830 = .6323 (really 64)
from Guy (1994) 
 43True - predicted cognate percentages
Toga
  0  Mosina
  1  -1  Peterara
  1  -1   4  Nduindui
 -2  -1   0   0  Sakao
  2   0   2   3   1  Malo
 -3   0  -1  -2   0  -2  Fortsenal
 -1  -1  -1   0   1   1   4  Raga
The model fits very well! 
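The predicted percentages can be recomputed by multiplying the branch labels along the path between two languages. The sketch below encodes the branch values read off the inferred tree; the internal node names (`n1` to `n6`, `root`) and the function names are hypothetical labels introduced only for this illustration.

```python
# child -> (parent, retention on the branch joining them), from the inferred tree.
BRANCHES = {
    "Toga": ("n1", 0.830), "Mosina": ("n1", 0.770),
    "n1": ("n2", 0.919), "Peterara": ("n2", 0.829),
    "n2": ("n3", 0.972),
    "Nduindui": ("n4", 0.795), "Raga": ("n4", 0.755),
    "n4": ("n3", 0.949),
    "n3": ("root", 0.947),
    "Sakao": ("n5", 0.567), "Fortsenal": ("n5", 0.759),
    "n5": ("n6", 0.883), "Malo": ("n6", 0.772),
    "n6": ("root", 0.895),
}

def products_to_ancestors(node):
    """Map node and each of its ancestors to the product of retentions up to it."""
    out = {node: 1.0}
    prod = 1.0
    while node in BRANCHES:
        parent, retention = BRANCHES[node]
        prod *= retention
        out[parent] = prod
        node = parent
    return out

def predicted(a, b):
    """Predicted cognate proportion: product of branch labels on the path a..b."""
    up_a, up_b = products_to_ancestors(a), products_to_ancestors(b)
    # Every retention is below 1, so the nearest common ancestor gives the largest product.
    return max(up_a[n] * up_b[n] for n in up_a if n in up_b)

print(round(predicted("Mosina", "Toga"), 4))
print(round(predicted("Peterara", "Mosina"), 4))
print(round(100 * predicted("Sakao", "Toga")))
```

For Sakao/Toga the product comes to about 31 against an observed 29, which is the -2 in the residual table.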
 44Where's the root?
Isn't it obvious?
Toga      --830--+--919--+--972--+--947--+--- Protolanguage
Mosina    --770--'       |       |       |
Peterara  ------829------'       |       |
Nduindui  --795--+------949------'       |
Raga      --755--'                       |
Sakao     --567--+--883--+------895------'
Fortsenal --759--'       |
Malo      ------772------'
 45Oops: other options
protolanguage
Toga      --830--+--919--+--972--+--947--+
Mosina    --770--'       |       |       |
Peterara  ------829------'       |       |
Nduindui  --795--+------949------'       |
Raga      --755--'                       |
Sakao     --567--+--883--+------895------'
Fortsenal --759--'       |
Malo      ------772------'
 46And some more
protolanguage
Toga      --830--+--919--+--972--+--947-----895--+--883--+--567-- Sakao
Mosina    --770--'       |       |               |       '--759-- Fortsenal
Peterara  ------829------'       |               '------772------ Malo
Nduindui  --795--+------949------'
Raga      --755--'
In the absence of other constraints, the root can 
be placed anywhere in the tree without changing 
the model's fit! 
 47Possible other constraints
- Historical evidence 
- about earlier forms 
- about structure of relationships among 
 contemporary forms
- outgroup 
- Constraints on rate of change 
- linguistic (or genetic) clock
48A universal constant for glottochronology?
Thirteen sets of data, presented in partial 
justification of these assumptions, serve as a 
basis for calculating a universal constant to 
express the average rate of retention k of the 
basic-root morphemes: k = 0.8048 ± 0.0176 
per millennium, with a confidence limit 
of 90%. 
Lees (1953) 
 49Some of Lees's data
Language                 Years  Words  Cognates  Rate (per millennium)
English                   1000    209       160  .766
Latin/Spanish             1800    200       131  .790
Latin/French              1850    200       125  .776
German                    1100    214       180  .854
Middle Egyptian/Coptic    2200    200       106  .760
Greek                     2070    213       147  .836
Chinese                   1000    210       167  .795
Swedish                   1050    207       176  .853 
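A per-millennium rate can be derived from such a row by assuming constant exponential decay, so that the retained fraction after t years is rate^(t/1000). This formula is an assumption of the sketch rather than something stated on the slide; it reproduces most, though not all, of the tabulated rates to three decimals. The function name is hypothetical.

```python
def retention_per_millennium(years, words, cognates):
    """Per-1000-year retention rate, assuming constant exponential decay."""
    retained_fraction = cognates / words
    return retained_fraction ** (1000.0 / years)

# Rows taken from the table above: (language, years, words, cognates).
ROWS = [
    ("English", 1000, 209, 160),
    ("German", 1100, 214, 180),
    ("Chinese", 1000, 210, 167),
]
for name, years, words, cognates in ROWS:
    print(name, round(retention_per_millennium(years, words, cognates), 3))
```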
 50Some more retentive languages (rates per 1000 
years)
Language            100-word list   200-word list
Icelandic (rural)   99              97.6
Icelandic (urban)   98              96.2
Georgian            96.5            89.9
Armenian            97.8            94
Bergsland & Vogt (1962) 
 51Some less retentive ones
Bergsland & Vogt estimate of vocabulary retention 
in East Greenlandic as .722 in 600 years, or .34 
per millennium.
David Lithgow (pers. com. circa 1970) has 
observed a replacement of some 20% of the basic 
vocabulary in Muyuw (Woodlark island) in one 
generation. Raise 0.8 to the 33rd power, and that 
gives you the retention rate of Muyuw per 1000 
years should it continue to evolve at that rate: 
0.06%.
Jacques Guy (1994) 
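The Muyuw arithmetic can be checked directly. The sketch assumes, as the quotation implies, about 33 generations per millennium (roughly 30 years each) and 80% retention per generation; the variable names are illustrative.

```python
retention_per_generation = 0.8      # 20% of basic vocabulary replaced in one generation
generations_per_millennium = 33     # assumption: about 30 years per generation

retention_per_millennium = retention_per_generation ** generations_per_millennium
print(f"{retention_per_millennium:.6f}")          # as a proportion
print(f"{100 * retention_per_millennium:.2f}%")   # as a percentage
```

As a proportion this is about 0.0006, that is, roughly 0.06 percent retained after 1000 years.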
 52Language chains
      A
 .77  B
 .65  .76  C
Configurations like this are taken as prima facie 
evidence of non-treeness, to be attributed to 
borrowing/mixing/cline types of situations. But 
in fact they can also easily be generated by 
variable rates of change:
A ------------ 90 ------------.
                              +---- protolanguage
B ---- 95 ----.               |
              +----- 90 ------'
C ---- 80 ----'
Note that the required difference in mean rate of 
change is only (.9 - .9 × .8)/.9 = .2, or 20%.
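A quick check that the tree with unequal branch retentions reproduces the chain-like table, multiplying retentions along each path. The dictionary layout and variable names are assumptions of this sketch; the numbers are those on the slide.

```python
# Retention on the branch from each language up to its parent node.
leaf = {"A": 0.90, "B": 0.95, "C": 0.80}
inner = 0.90   # branch from the B/C node up to the protolanguage

ab = leaf["A"] * inner * leaf["B"]   # path A - protolanguage - B/C node - B
ac = leaf["A"] * inner * leaf["C"]   # path A - protolanguage - B/C node - C
bc = leaf["B"] * leaf["C"]           # path B - B/C node - C

print(round(ab, 2), round(ac, 2), round(bc, 2))

# Relative difference in overall retention between the A side and the C side.
rate_gap = (leaf["A"] - inner * leaf["C"]) / leaf["A"]
print(round(rate_gap, 2))
```

The three products round to .77, .65 and .76, so a pure tree with variable rates yields the same pattern that is usually read as evidence of borrowing.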
 53Mitochondrial Genome 
 54Mitochondrial family tree 
 55Mitochondrial phylogeny 
 56Three fascinating results
- Mitochondrial Eve 
- Mitochondrial Clans 
- The three-wave theory: converging linguistic and 
 genetic evidence
57Mitochondrial Eve
- Cann, Stoneking, and Wilson (1987) 
-  mtDNA comparisons of 147 people from Europe, 
 Africa, Asia, Australia, and New Guinea show that
 all present human mtDNA is descended from a
 single African woman who lived about 200,000
 years ago.
58First problem
- Computer program was used to find a tree 
 consistent with the mtDNA data
- But so were many other (unreported) trees! 
- order of answers depended on order of data 
- root could be effectively anywhere in the dataset 
- e.g. Melanesian Eve, Asian Eve, European Eve
59Other problems
- mtDNA may not change at a constant rate 
- mtDNA changes may be adaptive 
- Gene trees may not be population trees 
- DNA (including mtDNA) can spread by gradual flow 
 or by range expansion
- spread can be influenced by other factors
60Early results: Native Americans come from four 
genetic lineages, labeled A through D. Amerinds 
have all four lineages, Na-Dene only A, and 
Eskaleuts A and D.
Current results: The four mtDNA lineages divide 
into nine distinct genetic subtypes. All four 
lineages are in all three language groups. Many 
local populations have all four lineages and a 
number even have all the subtypes. All subtypes 
can be found in North, Central and South America.
"It isn't realistic to believe that the same 
lineages ended up in all these populations across 
two continents by separate migrations." 
 61Oxford Ancestors (http://www.oxfordancestors.com/)
"We put the Genes in Genealogy"
Oxford Ancestors is the World's first organization 
to harness the power and precision of modern 
DNA-based genetics in the service of genealogy.
MatriLine interprets your deep maternal ancestry, 
linking you - if your roots are in Europe - to one 
of seven women: Ursula, Tara, Helena, Katrine, 
Velda, Xenia or Jasmine. 
 63And MtDNA inheritance may not even be entirely 
clonal!
- Mice 
- demonstration of paternal leakage 
- Hagelberg 
- rare mtDNA mutation in Vanuatu 
- Eyre-Walker 
- statistics of mtDNA homoplasies 
64Island evidence
- Erika Hagelberg (Proc. R. Soc. 1999) 
- Island of Nguna (Vanuatu, Melanesia) 
- 3 main MtDNA population groups 
- as expected for the region 
- In all three groups, the same mutation is 
 sometimes found
- previously known only from one Northern European 
- Repeated chance mutation is unlikely 
- local spread by recombination seems more probable 
65Statistics of mtDNA homoplasies
- Mutations that occur in different mtDNA 
 haplogroups around the world
- Assuming purely maternal inheritance, these were 
 thought to represent chance recurrence of
 mutations in hypervariable regions
- Eyre-Walker et al. (Proc. R. Soc. 1999) 
- regions are not statistically more variable than 
 others
- mutations cluster geographically 
- Macaulay (1999) counters 
- much of the result comes from a dataset that may 
 be errorful
- no need to panic
66Reaction of another mtDNA aficionado
I am reminded of a comment by a bishop's wife in 
Victorian England, also concerning human origins: 
"Let us hope that it isn't true, and if it is, 
that it will not become generally known."