Title: Advances in Automated Language Classification
ASJP Consortium
Dik Bakker, Lancaster
Overview
Project ASJP (Automated Similarity Judgment Program)
Overview
Project ASJP members:
- Sören Wichmann (Germany, Netherlands)
- Viveka Velupillai (Germany)
- André Müller (Germany)
- Robert Mailhammer (Germany)
- Hagen Jung (Germany)
- Eric Holman (US)
- Anthony Grant (UK)
- Dmitry Egorov (Russia)
- Pamela Brown (US)
- Cecil Brown (US)
- Dik Bakker (UK, Netherlands)
Overview
- Overall goal: automatic reconstruction of language relationships
- Basis: a distance matrix between individual languages, computed from linguistic features
- Method: lexicostatistics, i.e. mass comparison of lexical items
Overview
Main goal: reconstruction of language relationships
Derived goals (among others):
- Critical assessment and refinement of existing classifications
- Classification of newly described and unclassified languages
- Estimation of time depths between languages, genera and families
- Search for (ir)regularities in phylogenies
- Testing of hypotheses (e.g. the elbow phenomenon of Atkinson et al. 2008)
- Experimentally finding the best/optimal dating method
- Detection of borrowings
Today ...
Overview
1. The basic list of lexical items
2. Comparing languages
3. Some results: genetic and areal proximity
4. On inheritance vs. borrowing
5. Conclusions
1. The basic list of lexical items
Lexical items
Word list: the Swadesh list of 100 basic meanings. These items are
- lexicalized in most languages
- usually recorded in fieldwork lexicons and grammars
- inherited rather than borrowed
- culturally independent
- stable over time
- expressed by few synonyms
Swadesh 100-item list
1. I 21. dog 41. nose 61. die 81. smoke
2. you 22. louse 42. mouth 62. kill 82. fire
3. we 23. tree 43. tooth 63. swim 83. ash
4. this 24. seed 44. tongue 64. fly 84. burn
5. that 25. leaf 45. claw 65. walk 85. path
6. who 26. root 46. foot 66. come 86. mountain
7. what 27. bark 47. knee 67. lie 87. red
8. not 28. skin 48. hand 68. sit 88. green
9. all 29. flesh 49. belly 69. stand 89. yellow
10. many 30. blood 50. neck 70. give 90. white
11. one 31. bone 51. breasts 71. say 91. black
12. two 32. grease 52. heart 72. sun 92. night
13. big 33. egg 53. liver 73. moon 93. hot
14. long 34. horn 54. drink 74. star 94. cold
15. small 35. tail 55. eat 75. water 95. full
16. woman 36. feather 56. bite 76. rain 96. new
17. man 37. hair 57. see 77. stone 97. good
18. person 38. head 58. hear 78. sand 98. round
19. fish 39. ear 59. know 79. earth 99. dry
20. bird 40. eye 60. sleep 80. cloud 100. name
Lexical items: further reduction
- Early analyses have shown that an optimal subset of 40 of the 100 items gives the same results
  → less work
  → less missing data
- Faster processing (combinatorial explosion): 40 items: 3 × 10^7 vs. 100 items: 2 × 10^10
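Lexical items: stability
Most stable items: iteratively throw out the most unstable item, measured by its variation within genera (time depth of about 3500-4000 years; Dryer 2001, 2005), e.g. Germanic, Romance, Slavic, ...
Formula: S = (E - U) / (100 - U)
where E is the weighted average percentage of matches between languages of the same (equal) genus and U the percentage of matches between languages of unequal genera.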
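A minimal Python sketch of the stability score and the pruning loop. It assumes E and U are percentages, read as above. The function names and the example match percentages are hypothetical. Each item's score is computed once here, so dropping the least stable item one at a time ends with the same items as keeping the top k directly.

```python
def stability(e: float, u: float) -> float:
    """Chance-corrected stability S = (E - U) / (100 - U), inputs in percent."""
    if u >= 100:
        raise ValueError("U must be below 100")
    return (e - u) / (100 - u)


def most_stable(match_rates: dict[str, tuple[float, float]], keep: int) -> list[str]:
    """Drop the least stable item one at a time until `keep` items remain."""
    scores = {item: stability(e, u) for item, (e, u) in match_rates.items()}
    while len(scores) > keep:
        least = min(scores, key=scores.get)
        del scores[least]
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical (E, U) percentages, for illustration only.
example = {
    "I": (80.0, 10.0),
    "two": (75.0, 8.0),
    "bark": (30.0, 6.0),
    "green": (20.0, 7.0),
}
print(most_stable(example, keep=2))  # ['I', 'two']
```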
[Chart: Ethnologue (Goodman-Kruskal) and WALS (Pearson) correlations plotted against item stability]
[Figure: the 100 meanings with the 40 most stable items highlighted]
[Figure: the 100 meanings with homophones highlighted]
Lexical items: transcription
First phase of the project (2007): problems with a full IPA representation of words
- data entry via keyboard
- simple programming languages (Fortran, Pascal)
→ recoding into a simplified ASJPcode (ASCII only)
Lexical items: transcription
ASJPcode
- 7 vowels
- 34 consonants
- operators for nasalization, labialization, palatalization, aspiration, glottalization
→ (some) complex syllables simplified (VXC → VC)
Abaza (Caucasian)
Meaning   IPA           ASJPcode
PERSON    ʕʷɨʧʼʲʷʕʷɨs   Xw3Cw"yXw3s
LEAF      b???          bxy3
SKIN      ??az?         Cwazy
HORN      ?'????a       Cw"3Xwa
NOSE      p?n?'a        p3nc"a
TOOTH     p??           p3c
Lexical items
- Collected to date: over 2100 languages, dialects and proto-languages
- Mean number of items per language: 36.2 (of 40)
- Distribution (%): Americas 27, Eurasia 23, Australia/PNG 18, Austronesia 15, Africa 14, Creoles 2, Artificial 1
[Map: languages currently sampled]
Lexical items: transcription
Second phase of the project (2008): problems with full IPA representation solved
1. automatic conversion of IPA to integers (Python)
2. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar
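Step 1 amounts to reading off Unicode code points. Below is a minimal sketch using Python's built-in `ord` and `chr`. The Abaza form for PERSON is written with escape sequences so that each character is unambiguous; it was decoded from the decimal codes listed for that word on the following slide.

```python
# Abaza PERSON (ʕʷɨʧʼʲʷʕʷɨs), written with escapes to pin down each code point.
ipa = "\u0295\u02b7\u0268\u02a7\u02bc\u02b2\u02b7\u0295\u02b7\u0268s"

# IPA -> integers: one decimal Unicode code point per character.
codes = [ord(ch) for ch in ipa]
print(" ".join(str(c) for c in codes))
# 661 695 616 679 700 690 695 661 695 616 115

# The conversion is lossless: chr() restores the original string.
restored = "".join(chr(c) for c in codes)
print(restored == ipa)  # True
```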
Lexical items: transcription
Abaza (Caucasian), meaning PERSON
IPA: ʕʷɨʧʼʲʷʕʷɨs
Decimal: 661 695 616 679 700 690 695 661 695 616 115
ASJPcode: 88 119 126 51 67 34 121 119 126 88 119 126 51 115 (Xw~3C"yw~Xw~3s)
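The slides describe step 2 as transduction driven by a formal grammar. The sketch below swaps in a simpler technique: greedy longest-match rewriting. Its rule table is hypothetical and covers only this one word. The correspondences were read off by aligning the IPA code points with the ASJPcode integers listed above. It is not the project's grammar, but it reproduces those integers.

```python
# Hypothetical rules, IPA sequence -> ASJPcode, covering only this example.
RULES = {
    "\u02a7\u02bc\u02b2\u02b7": 'C"yw~',  # ʧʼʲʷ
    "\u0295\u02b7": "Xw~",                # ʕʷ
    "\u0268": "3",                        # ɨ
    "s": "s",
}
MAX_LEN = max(len(k) for k in RULES)


def to_asjp(ipa: str) -> str:
    """Rewrite IPA to ASJPcode, always trying the longest matching rule first."""
    out, i = [], 0
    while i < len(ipa):
        for n in range(min(MAX_LEN, len(ipa) - i), 0, -1):
            chunk = ipa[i:i + n]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += n
                break
        else:  # no rule matched at position i
            raise ValueError(f"no rule for {ipa[i]!r} at position {i}")
    return "".join(out)


ipa = "\u0295\u02b7\u0268\u02a7\u02bc\u02b2\u02b7\u0295\u02b7\u0268s"
asjp = to_asjp(ipa)
print(asjp)                    # Xw~3C"yw~Xw~3s
print([ord(c) for c in asjp])  # 88 119 126 51 67 34 121 119 126 88 119 126 51 115
```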
Lexical items: transcription
Second phase of the project (2008): why not run the comparisons on full IPA?
- correlation between IPA-based and ASJPcode-based results > 0.9
- but ASJPcode gives a better fit with existing classifications → IPA is too specific
IPA ????'?????s Decimal 661 695 616 679 700
690 695 661 695 616 115 ASJPcode (
any unicode string )
optimal level of abstraction for
historical phonological reconstruction?
A ? n661, n695, n616, P Q ? A B C Z ? P Q Z
2. Comparing languages
Comparing words
LG       I      YOU    WE
ABAZA    sErE   bErE   SwErE
ABKHAZ   s3     w3     Sw3
AGUL     zun    wun    cwun
ABAZA vs. ABKHAZ: LD(I) = 3, LD(YOU) = 4, LD(WE) = 3; mean LD over all shared items = 3.73
Comparing words
ABAZA vs. AGUL: LD(I) = 4, LD(YOU) = 4, LD(WE) = 4; mean LD over all shared items = 4.37
[Diagram: mean LD between ABAZA and ABKHAZ = 3.73; between ABAZA and AGUL = 4.37]
Comparing words
Levenshtein distance (LD)
a. Between 2 words: the minimum number of edits (substitutions, insertions, deletions) needed to transform one form into the other
b. Between 2 languages: e.g. the mean LD over the overlapping set of items (< 40)
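A minimal sketch of word-level LD, using the standard dynamic-programming recurrence, applied to the ABAZA/ABKHAZ/AGUL forms above. The helper names are hypothetical. The three-word means it prints (3.33 and 4.0) are not the slides' 3.73 and 4.37. Those figures presumably average over all shared items, not just I, YOU and WE.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def mean_ld(lang1: dict[str, str], lang2: dict[str, str]) -> float:
    """Mean LD over the meanings present in both word lists."""
    shared = [m for m in lang1 if m in lang2]
    return sum(levenshtein(lang1[m], lang2[m]) for m in shared) / len(shared)


abaza = {"I": "sErE", "YOU": "bErE", "WE": "SwErE"}
abkhaz = {"I": "s3", "YOU": "w3", "WE": "Sw3"}
agul = {"I": "zun", "YOU": "wun", "WE": "cwun"}

for name, other in [("ABKHAZ", abkhaz), ("AGUL", agul)]:
    per_word = {m: levenshtein(abaza[m], other[m]) for m in abaza}
    print(name, per_word, round(mean_ld(abaza, other), 2))
# ABKHAZ {'I': 3, 'YOU': 4, 'WE': 3} 3.33
# AGUL {'I': 4, 'YOU': 4, 'WE': 4} 4.0
```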
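Comparing words
Levenshtein distance: two problems with simple LD
1. The value depends on the length of the longer word
   → normalize: LDN = LD / Lmax (× 100)
2. Languages differ in their phonological overlap (chance similarity)
   → eliminate noise: LDND = LDN / LDN_different (× 100), where LDN_different is the mean LDN between words with different meanings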
Comparing languages
- Levenshtein distance for a language pair: mean of the LDNDs of all words in common
- Synonyms (12):
  - take the minimum pair
  - take the mean (experimental option)
Comparing languages
AVAR (AVA: NAKH-DAGHESTANIAN > AVAR-ANDIC-TSEZIC) / AGUL (AGL: NAKH-DAGHESTANIAN > LEZGIC)
Meaning   AVA      AGL      LDND
I         dun      zun      36.6
YOU       mun      wun      36.6
HORN      tLar     k"arC    66.0
FIRE      c"a      c"a       0.0
FULL      c"ura    ac"uf    66.0   (alt. AGL ac"ar)
NEW       c"iya    c"EyEr   55.0   (alt. AGL c"ayif)
Common items (LDND < 70), AGL - AVA: 6 (15.8% of 38)
Pair distances: LD 4.01 / LDN 81.76 / LDND 89.87
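A sketch of LDN and LDND applied to the Avar/Agul forms above. It follows the definitions in this section: LDN = LD / Lmax; LDND = LDN divided by the mean LDN of word pairs with different meanings; the minimum pair is taken for synonyms. Two things are assumptions. The noise denominator is estimated from these six meanings only, using every synonym combination, so the printed values will generally differ from the slide's 36.6, 66.0, etc., which rest on the full lists. The function names are also hypothetical.

```python
from itertools import product
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """LD normalized by the length of the longer word (0 = identical)."""
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd_by_meaning(lang1, lang2):
    """Per-meaning LDND (x 100) for two word lists {meaning: [forms]}."""
    shared = [m for m in lang1 if m in lang2]
    # Noise level: mean LDN over all word pairs with DIFFERENT meanings
    # (assumption: every synonym combination is included).
    ldn_different = mean(
        ldn(w1, w2)
        for m1, m2 in product(shared, shared) if m1 != m2
        for w1 in lang1[m1] for w2 in lang2[m2]
    )
    # Synonyms: keep the minimum pair.
    return {m: 100 * min(ldn(w1, w2) for w1 in lang1[m] for w2 in lang2[m]) / ldn_different
            for m in shared}


avar = {"I": ["dun"], "YOU": ["mun"], "HORN": ["tLar"], "FIRE": ['c"a'],
        "FULL": ['c"ura'], "NEW": ['c"iya']}
agul = {"I": ["zun"], "YOU": ["wun"], "HORN": ['k"arC'], "FIRE": ['c"a'],
        "FULL": ['ac"uf', 'ac"ar'], "NEW": ['c"EyEr', 'c"ayif']}

scores = ldnd_by_meaning(avar, agul)
for meaning, value in scores.items():
    print(f"{meaning:5} {value:6.1f}")
print("pair distance (mean LDND):", round(mean(scores.values()), 2))
```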
3. Some results: genetic and areal proximity
Distance matrix (N(N-1)/2 cells)
       FRE     DUT     GAL     PRT     ENG
FRE
DUT    90.93
GAL    71.62   90.00
PRT    74.38   94.61   51.87
ENG    91.17   63.19   91.30   95.18
(Excel file)
Tools for trees
- Prepare the input file for your preferred phylogenetic software using an editor such as TextPad (www.textpad.com)
- Run the data through phylogenetic software such as SplitsTree (www.splitstree.org)
- Choose the most appropriate algorithm (Neighbour-Joining for distance data)
- Prepare the tree for presentation using a tool such as the Tree Explorer of MEGA
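The input file can also be generated programmatically. The sketch below assumes a PHYLIP-style square distance matrix: the number of taxa on the first line, then one row per language with the name padded to 10 characters. Many phylogenetic programs can import this plain-text layout; check the import options of your software. The code expands the lower-triangular FRE/DUT/GAL/PRT/ENG matrix from the distance-matrix slide, whose N(N-1)/2 = 10 cells are all that need computing. Function names are hypothetical.

```python
names = ["FRE", "DUT", "GAL", "PRT", "ENG"]
# Lower triangle from the slide: row i holds distances to languages 0..i-1.
lower = [
    [],
    [90.93],
    [71.62, 90.00],
    [74.38, 94.61, 51.87],
    [91.17, 63.19, 91.30, 95.18],
]


def full_matrix(lower):
    """Mirror a lower-triangular matrix into a full symmetric one (zero diagonal)."""
    n = len(lower)
    d = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            d[i][j] = d[j][i] = value
    return d


def to_phylip(names, d):
    """Square PHYLIP-style distance matrix as text."""
    lines = [str(len(names))]
    for name, row in zip(names, d):
        lines.append(f"{name:<10}" + " ".join(f"{x:.2f}" for x in row))
    return "\n".join(lines) + "\n"


D = full_matrix(lower)
print(to_phylip(names, D))
```

Paste the printed text into the editor and save it as the input file.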
[Figure: Salishan languages (n = 30), Neighbor-Joining tree]
[Figure: UPGMA vs. Neighbor-Joining trees]
Neighbor-Joining
- specifically meant for phylogenetic trees
- takes distances as its point of departure
- does NOT assume an equal rate of change (unlike UPGMA)
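To show what Neighbor-Joining does with a distance matrix, here is a compact sketch of the textbook algorithm (Saitou and Nei's Q-criterion). It returns an unrooted tree in Newick notation. It is for illustration only; in practice SplitsTree or MEGA would be used. The five-language matrix is the one from the distance-matrix slide.

```python
NAMES = ["FRE", "DUT", "GAL", "PRT", "ENG"]
DIST = [
    [0.00, 90.93, 71.62, 74.38, 91.17],
    [90.93, 0.00, 90.00, 94.61, 63.19],
    [71.62, 90.00, 0.00, 51.87, 91.30],
    [74.38, 94.61, 51.87, 0.00, 95.18],
    [91.17, 63.19, 91.30, 95.18, 0.00],
]


def neighbor_joining(names, dist):
    """Unrooted Neighbor-Joining tree (Newick) from a full symmetric matrix."""
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        _, i, j = min(((n - 2) * d[a][b] - r[a] - r[b], a, b)
                      for a in range(n) for b in range(a + 1, n))
        # Branch lengths from the new internal node to i and j.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.2f},{nodes[j]}:{lj:.2f})"
        # Distances from the new node to every remaining node.
        keep = [k for k in range(n) if k not in (i, j)]
        new = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new[x]] for x, a in enumerate(keep)]
        d.append(new + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes left: connect them to one central node.
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    return f"({a}:{la:.2f},{b}:{d[0][1] - la:.2f},{c}:{d[0][2] - la:.2f});"


print(neighbor_joining(NAMES, DIST))
```

Unlike UPGMA, no branch lengths are forced to be equal, which is how Neighbor-Joining avoids assuming an equal rate of change.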
[Figure: Mayan languages (n = 38)]
Calibration of method
Calibration: best options, parameters, factors
A. For pure classification
- existing classifications (Ethnologue, WALS; mainly for the well-documented areas)
- expert knowledge of specific areas
→ diversion 12 → niche!
Calibration of method
Calibration: best options, parameters, factors
B. For dating
- linguistically crucial historical events
Linguistically crucial events
Date        Historical event                        Linguistic event
c. 250      Goths conquer Dacia                     split of E-W Romance
4th c.      Irish invade Scotland                   split of Irish-Scottish Gaelic
5th c.      Germanic kingdoms in W Roman Empire     breakup of W Romance
5th c.      Germanic peoples invade Britain         split of English-Frisian
5th-6th c.  Britons flee to Brittany                split of Welsh-Breton
400-600     Hieroglyphic evidence                   Ch'olan begins to split
768-814     Name of Charlemagne attested            Proto-Slavic
Calibration of method
B. For dating: linguistically crucial historical events
→ Standard formula (Swadesh):
TimeDepth = log(Similarity) / (2 log Retention)
Calibration of method
B. For dating: linguistically crucial historical events
→ Applied to LDND, with Similarity = 1 - LDND (LDND as a fraction):
TimeDepth = log(1 - LDND) / (2 log Retention)
Linguistically crucial events
Time (millennia)   Linguistic event                  LDND     Retention
1.75               split of E-W Romance              0.6753   0.73
1.65               split of Irish-Scottish Gaelic    0.6687   0.72
1.55               breakup of W Romance              0.6411   0.72
1.55               split of English-Frisian          0.6574   0.71
1.50               split of Welsh-Breton             0.5705   0.75
1.40               Ch'olan begins to split           0.5369   0.76
1.21               Proto-Slavic                      0.5877   0.69
MEAN                                                          0.73
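The retention column can be recomputed by solving the dating formula for r. This assumes, as the table's figures imply, that the similarity term is 1 - LDND and that time depth is in millennia. The sketch below is illustrative and the function name is hypothetical. Every row reproduces its listed rate to two decimals. The unrounded mean comes out at about 0.725; the slide's 0.73 matches averaging the rounded per-event values.

```python
import math
from statistics import mean

# (time depth in millennia, LDND as a fraction), copied from the table above
CALIBRATION = {
    "split of E-W Romance": (1.75, 0.6753),
    "split of Irish-Scottish Gaelic": (1.65, 0.6687),
    "breakup of W Romance": (1.55, 0.6411),
    "split of English-Frisian": (1.55, 0.6574),
    "split of Welsh-Breton": (1.50, 0.5705),
    "Ch'olan begins to split": (1.40, 0.5369),
    "Proto-Slavic": (1.21, 0.5877),
}


def retention(time_depth: float, ldnd: float) -> float:
    """Solve t = log(1 - LDND) / (2 log r) for the retention rate r."""
    return math.exp(math.log(1 - ldnd) / (2 * time_depth))


rates = {event: retention(t, x) for event, (t, x) in CALIBRATION.items()}
for event, r in rates.items():
    print(f"{event:32} {r:.2f}")
print(f"mean retention: {mean(rates.values()):.3f}")  # 0.725
```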
Calibration of method
B. For dating: linguistically crucial historical events
→ Standard formula with the calibrated retention rate:
TimeDepth = log(1 - LDND) / (2 log 0.73), with 0.73 < 0.75
Deeper!
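With the rate calibrated, dating a language pair is a one-line computation. Below is a minimal sketch under the same reading of the formula (similarity = 1 - LDND, time in millennia); the function name is hypothetical. Fed the E-W Romance LDND, it returns about 1.79 millennia, close to the 1.75 used for calibration.

```python
import math


def time_depth(ldnd: float, retention: float = 0.73) -> float:
    """Time depth in millennia, t = log(1 - LDND) / (2 log r), LDND as a fraction."""
    if not 0 <= ldnd < 1:
        raise ValueError("LDND must be in [0, 1) for the similarity 1 - LDND to be positive")
    return math.log(1 - ldnd) / (2 * math.log(retention))


# Check against a calibration point: E-W Romance, LDND 0.6753, dated to 1.75.
print(round(time_depth(0.6753), 2))  # 1.79
```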
Glottochronology only?
- Calibration of the method: glottochronology is based entirely on lexical distance
- Add other linguistic domains: the WALS typological database
- Best result: 75% lexical distance (40 items) + 25% typological distance (40 phonological/morphological/syntactic features)
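One possible reading of the "best result" line is a weighted sum of two distances: 75% lexical and 25% typological. Below is a minimal sketch under that assumption. The typological distance is taken as the share of differing values among WALS features coded for both languages, which is an illustrative choice. The function names and feature values are hypothetical.

```python
def typological_distance(f1: dict[str, str], f2: dict[str, str]) -> float:
    """Share of features, among those coded for both languages, whose values differ."""
    shared = [k for k in f1 if k in f2]
    if not shared:
        raise ValueError("no features coded for both languages")
    return sum(f1[k] != f2[k] for k in shared) / len(shared)


def combined_distance(lexical: float, typological: float, w_lexical: float = 0.75) -> float:
    """Weighted mix of a lexical and a typological distance (both on a 0-1 scale)."""
    return w_lexical * lexical + (1 - w_lexical) * typological


# Hypothetical feature values for two languages (WALS-style feature IDs).
lang_a = {"81A": "SOV", "87A": "NAdj", "26A": "suffixing"}
lang_b = {"81A": "SVO", "87A": "NAdj", "26A": "suffixing", "1A": "average"}

typ = typological_distance(lang_a, lang_b)  # 1 of 3 shared features differs
print(round(combined_distance(0.90, typ), 3))
```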
4. On inheritance vs. borrowing
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
I       dun / zun        LDND 36.6
YOU     mun / wun        LDND 36.6
HORN    tLar / k"arC     LDND 66.0
FIRE    c"a / c"a        LDND 0.0
FULL    c"ura / ac"uf    LDND 66.0
NEW     c"iya / c"EyEr   LDND 55.0
→ 6 items < 70.0
→ Genetically related!!
150Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
151Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA) ONE
unounu LDND36.9 TWO
dosdos LDND 0.0 PERSON
personapetsona LDND55.3 STAR
estreyaestrecas LDND61.2 NIGHT
noCenoces LDND68.2 NEW
nuevonueba LDND44.2
152Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA) ONE
unounu LDND36.9 TWO
dosdos LDND 0.0 PERSON
personapetsona LDND55.3 STAR
estreyaestrecas LDND61.2 NIGHT
noCenoces LDND68.2 NEW
nuevonueba LDND44.2 ? 6 items lt 70.0
→ 6 items with LDND < 70.0
→ RELATED???
NO!!!
INDO-EUROPEAN <> AUSTRONESIAN
CHANCE? → 5% (i.e. 1 or 2 items)
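The chance argument can be made concrete with a binomial model. Assuming a 40-item list (an assumption here) and that each meaning pair independently falls below the LDND 70 cut-off by chance with probability 0.05, the expected number of chance look-alikes is 40 × 0.05 = 2, and the probability of six or more arising purely by chance comes out under 2%. Real vocabularies are not independent draws, so this sketch is illustrative only.

```python
from math import comb


def binomial_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    n_items = 40      # assumed list length
    p_chance = 0.05   # assumed chance rate of a look-alike below the cut-off
    print("expected by chance:", n_items * p_chance)
    print("P(at least 6 by chance):", round(binomial_tail(n_items, p_chance, 6), 4))
```

An excess of similar items over chance is what rules chance out; it does not by itself distinguish inheritance from borrowing.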
BORROWING through LANGUAGE CONTACT
Borrowed!
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE
CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
ONE: uno / unu, LDND 36.9
SPA > CHA
family/genus: 0.24/0.82 > 0.03/0.00
phonological pattern fit: 12.00 > 0.67
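The "Borrowed!" slide reads the direction of borrowing off two kinds of evidence: how widespread a similar form is in each language's family and genus, and how well the word fits each language's phonological patterns. The sketch below is a hypothetical decision rule in that spirit, not the ASJP procedure: it reads "fam/gen" as the share of a family or genus with a similar form, proposes as donor the side that wins on all three measures, and leaves mixed evidence undecided. The class and field names are invented; the numbers are those given for ONE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Side:
    """Evidence for one language in a suspected loan pair (hypothetical fields)."""
    code: str
    family_share: float  # share of the language's family with a similar form
    genus_share: float   # share of the language's genus with a similar form
    pattern_fit: float   # fit of the word to the language's phonological patterns


def compare(x: float, y: float) -> int:
    """Return 1 if x > y, -1 if x < y, 0 if equal."""
    return (x > y) - (x < y)


def propose_donor(a: Side, b: Side) -> Optional[str]:
    """Return the code of the likely donor, or None when the evidence is mixed."""
    votes = [
        compare(a.family_share, b.family_share),
        compare(a.genus_share, b.genus_share),
        compare(a.pattern_fit, b.pattern_fit),
    ]
    if all(v > 0 for v in votes):
        return a.code
    if all(v < 0 for v in votes):
        return b.code
    return None


if __name__ == "__main__":
    spa = Side("SPA", family_share=0.24, genus_share=0.82, pattern_fit=12.00)
    cha = Side("CHA", family_share=0.03, genus_share=0.00, pattern_fit=0.67)
    print(propose_donor(spa, cha))  # SPA
```

Requiring agreement on every measure is deliberately conservative; a weighted score would decide more cases but would need calibration against known loans.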
Borrowing: TWO (SPA > CHA)
dos / dos, LDND 0.0
family/genus: 0.62/1.00 > 0.12/0.00; swF: 100.00 > 0.22
Borrowing: PERSON (SPA > CHA)
persona / petsona, LDND 55.3
family/genus: 0.20/0.64 > 0.01/0.00; swF: 32.40 > 0.13
Alternative CHA form: taotao (0.41/0.00)
Borrowing: STAR (SPA > CHA)
estreya / estrecas, LDND 61.2
family/genus: 0.17/0.82 > 0.00/0.00; swF: 100.00 > 4.44
Alternative CHA form: puti7on (0.03/0.00)
Borrowing: NIGHT (SPA > CHA)
noCe / noces, LDND 68.2
family/genus: 0.23/0.55 > 0.04/0.00; swF: 100.00 > 0.10
Alternative CHA form: pweNi (0.23/0.00)
Borrowing: NEW (SPA > CHA)
nuevo / nueba, LDND 44.2
family/genus: 0.50/0.64 > 0.04/0.00; swF: 4.27 > 0.03
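One plausible way to obtain family/genus figures like those above is to count, within a family or genus, the languages whose word for the same meaning is close to the candidate form. The sketch below assumes a similarity cut-off of LDN ≤ 0.5 (an arbitrary choice) and uses placeholder forms; neither the cut-off nor the data come from the slides, and the function name is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer (non-empty) word."""
    return levenshtein(a, b) / max(len(a), len(b))


def share_with_similar_form(form: str, group_forms: list[str], max_ldn: float = 0.5) -> float:
    """Fraction of a group's forms for one meaning within max_ldn of `form` (hypothetical measure)."""
    if not group_forms:
        return 0.0
    close = sum(1 for other in group_forms if ldn(form, other) <= max_ldn)
    return close / len(group_forms)


if __name__ == "__main__":
    # Placeholder forms standing in for one meaning across a genus; not real data.
    genus = ["kalo", "kali", "tepa"]
    print(round(share_with_similar_form("kalu", genus), 2))
```

A high share in the donor's genus and a near-zero share in the recipient's, as in the Spanish/Chamorro items, is the pattern expected when a form spread by contact rather than by inheritance.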
5. Conclusions
Conclusions
- Method for automatic reconstruction of language relationships
- Assess, discuss and correct existing classifications
- Test hypotheses about genetic distances in time
- Locate potential borrowings
- CORE: incremental lexical database (> 35)
  → One day online
  → Cooperation!!
References
- Holman et al. (forthc. 2008). Explorations in automated language classification. Folia Linguistica.
- Brown et al. (forthc. 2008). Automated classification of the world's languages: A description of the method and preliminary results. Sprachtypologie und Universalienforschung.
- Several working papers: email.eva.mpg.de./wichmann/ASJPHomePage