Advances in Automated Language Classification
ASJP Consortium
Dik Bakker, Lancaster

Overview
Project ASJP (Automated Similarity Judgment Program)
Project ASJP members
  • Sören Wichmann (Germany, Netherlands)
  • Viveka Velupillai (Germany)
  • André Müller (Germany)
  • Robert Mailhammer (Germany)
  • Hagen Jung (Germany)
  • Eric Holman (US)
  • Anthony Grant (UK)
  • Dmitry Egorov (Russia)
  • Pamela Brown (US)
  • Cecil Brown (US)
  • Dik Bakker (UK, Netherlands)
Overview
  • Overall goal: automatic reconstruction of language relationships
  • Basis: a distance matrix between individual languages, computed from linguistic features
  • Method: lexicostatistics, i.e. mass comparison of lexical items
Overview
MAIN GOAL: reconstruction of language relationships
Derived goals (among others):
  • Critical assessment and refinement of existing classifications
  • Classify newly described and unclassified languages
  • Estimate time depths between languages / genera / families
  • Search for (ir)regularities in phylogenies
  • Test hypotheses (e.g. the elbow phenomenon, Atkinson et al. 2008)
  • Experimentally find the best/optimal dating method
  • Detect borrowings
Today ...
Overview
  1. The basic list of lexical items
  2. Comparing languages
  3. Some results: genetic and areal proximity
  4. On Inheritance vs Borrowing
  5. Conclusions
1. The basic list of lexical items
Lexical items
Word list: Swadesh's 100 basic meanings
  • A word for the meaning exists in most languages
  • Available in fieldwork lexicons / grammars
  • Inherited rather than borrowed
  • Culturally independent
  • Stable over time
  • Few synonyms

1. I 21. dog 41. nose 61. die 81. smoke
2. you 22. louse 42. mouth 62. kill 82. fire
3. we 23. tree 43. tooth 63. swim 83. ash
4. this 24. seed 44. tongue 64. fly 84. burn
5. that 25. leaf 45. claw 65. walk 85. path
6. who 26. root 46. foot 66. come 86. mountain
7. what 27. bark 47. knee 67. lie 87. red
8. not 28. skin 48. hand 68. sit 88. green
9. all 29. flesh 49. belly 69. stand 89. yellow
10. many 30. blood 50. neck 70. give 90. white
11. one 31. bone 51. breasts 71. say 91. black
12. two 32. grease 52. heart 72. sun 92. night
13. big 33. egg 53. liver 73. moon 93. hot
14. long 34. horn 54. drink 74. star 94. cold
15. small 35. tail 55. eat 75. water 95. full
16. woman 36. feather 56. bite 76. rain 96. new
17. man 37. hair 57. see 77. stone 97. good
18. person 38. head 58. hear 78. sand 98. round
19. fish 39. ear 59. know 79. earth 99. dry
20. bird 40. eye 60. sleep 80. cloud 100. name
Lexical items: further reduction
Early analyses have shown that an optimal subset of 40 of the 100 items gives the same results:
  • Less work
  • Less missing data
  • Faster processing, given the combinatorial explosion (40 items: 3 × 10^7; 100 items: 2 × 10^10)


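Lexical items: stability
Most stable items: iteratively throw out the most unstable item, measured by its variation within genera (time depth 3500-4000 years; Dryer 2001, 2005), e.g. Germanic, Romance, Slavic
Formula: S = (E − U) / (100 − U)
  E = weighted average percentage of matches between languages of the same (equal) genus
  U = percentage of matches between languages of different (unequal) genera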
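The stability formula can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: the per-genus tallies (`GenusCounts`), the choice to weight each genus by its number of language pairs, and the names `stability` and `most_stable` are all hypothetical, and deciding when two words "match" is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class GenusCounts:
    """Hypothetical tallies for one Swadesh item within one genus."""
    matches: int  # language pairs in the genus whose words for the item match
    pairs: int    # language pairs in the genus with data for the item


def stability(genera: list[GenusCounts], u: float) -> float:
    """S = (E - U) / (100 - U), with E and U as percentages.

    E: percentage of matching within-genus pairs, averaged over genera with
       each genus weighted by its number of pairs (an assumption).
    U: percentage of matches between languages of different genera
       (the chance baseline), supplied by the caller.
    """
    total_pairs = sum(g.pairs for g in genera)
    e = 100.0 * sum(g.matches for g in genera) / total_pairs
    return (e - u) / (100.0 - u)


def most_stable(scores: dict[str, float], keep: int) -> list[str]:
    """Iteratively drop the least stable item until `keep` items remain."""
    remaining = dict(scores)
    while len(remaining) > keep:
        del remaining[min(remaining, key=remaining.get)]
    return sorted(remaining, key=remaining.get, reverse=True)


# Toy numbers, for illustration only.
print(stability([GenusCounts(8, 10), GenusCounts(2, 10)], u=10.0))  # 0.444...
print(most_stable({"I": 0.9, "yellow": 0.1, "name": 0.5}, keep=2))  # ['I', 'name']
```

Because each item's S is computed independently here, dropping the least stable item one at a time selects the same set as keeping the `keep` highest scores; the iteration only matters if S is recomputed after each removal.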

[Chart: Ethnologue (Goodman-Kruskal) and WALS (Pearson); axis: < Stability >]
[Figure: the 100-item list, annotated "40 Most Stable"]
[Figure: the 100-item list, annotated "Homophones"]
Lexical items: transcription
First phase of the project (2007): problems with a full IPA representation of words
  • data entry via keyboard
  • simple programming languages (Fortran, Pascal)
→ Recoding into a simplified ASJPcode (ASCII only)

Lexical items: transcription
ASJPcode
  • 7 vowels
  • 34 consonants
  • Operators for nasalization, labialization, palatalization, aspiration, glottalization
  • (Some) complex syllables simplified (VXC → VC)

Abaza (Caucasian)
Meaning   ASJPcode
PERSON    Xw3Cw"yXw3s
LEAF      bxy3
SKIN      Cwazy
HORN      Cw"3Xwa
NOSE      p3nc"a
TOOTH     p3c
Lexical items
Collected to date:
  • Over 2100 languages, dialects and proto-languages
  • Mean number of items per language: 36.2 (of 40)
Distribution (%): Americas 27, Eurasia 23, Australia/PNG 18, Austronesia 15, Africa 14, Creoles 2, Artificial 1

[Figure: map of the languages currently sampled]
Lexical items: transcription
Second phase of the project (2008): problems with a full IPA representation solved
  1. Automatic conversion of IPA to integers (Python)
  2. (Semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar

Lexical items: transcription
Abaza (Caucasian), meaning PERSON
IPA        ʕʷɨʧʼʲʷʕʷɨs
Decimal    661 695 616 679 700 690 695 661 695 616 115
ASJPcode   88 119 126 51 67 34 121 119 126 88 119 126 51 115 (Xw3Cw"yXw3s)
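Step 1 of the second phase can be illustrated with Python's built-in `ord`: every IPA symbol, including modifier letters such as ʷ, ʲ and ʼ, is a separate Unicode code point, and those code points are exactly the decimals in the row above. A minimal sketch; the names `ipa_to_ints` and `ints_to_ipa` are illustrative, not the project's.

```python
# Abaza PERSON, written with Unicode escapes so the source file stays ASCII.
PERSON_IPA = "\u0295\u02b7\u0268\u02a7\u02bc\u02b2\u02b7\u0295\u02b7\u0268s"


def ipa_to_ints(ipa: str) -> list[int]:
    """Map each Unicode character to its decimal code point."""
    return [ord(ch) for ch in ipa]


def ints_to_ipa(codes: list[int]) -> str:
    """Inverse mapping: decimal code points back to a string."""
    return "".join(chr(c) for c in codes)


codes = ipa_to_ints(PERSON_IPA)
print(" ".join(str(c) for c in codes))
# 661 695 616 679 700 690 695 661 695 616 115
assert ints_to_ipa(codes) == PERSON_IPA
```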

Lexical items: transcription
Why not run on full IPA?
  • Correlations between IPA and ASJP results > 0.9
  • But ASJP gives a better fit with existing classifications → IPA is too specific

Lexical items: transcription
IPA        ʕʷɨʧʼʲʷʕʷɨs
Decimal    661 695 616 679 700 690 695 661 695 616 115
ASJPcode   (or any Unicode string), via a formal grammar:
  A → n661, n695, n616
  P Q → A B C
  Z → P Q Z
Optimal level of abstraction for historical phonological reconstruction?
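The slide only hints at the formal grammar, so the following is a rough illustration, not the project's transducer: a left-to-right, longest-match rewriting of code-point sequences into ASJPcode. The rule table is hypothetical and was chosen only so that the output's code points equal the ASJPcode decimals shown for Abaza PERSON (88 119 126 ... 115); a real rule set would have to cover the whole IPA inventory.

```python
# Abaza PERSON as Unicode code points (the decimal row on the slide).
PERSON_CODES = [661, 695, 616, 679, 700, 690, 695, 661, 695, 616, 115]

# Hypothetical rewrite rules: code-point sequence -> ASJPcode string.
# Chosen only to reproduce this one example; NOT the project's grammar.
RULES: dict[tuple[int, ...], str] = {
    (661, 695): "Xw~",  # labialized pharyngeal, marked as one segment with ~
    (661,): "X",        # pharyngeal fricative
    (616,): "3",        # high central vowel
    (679,): "C",        # voiceless postalveolar affricate
    (700,): '"',        # ejective / glottalization marker
    (690,): "y",        # palatalization
    (695,): "w~",       # labialization (toy rule)
    (115,): "s",
}
MAX_LEN = max(len(k) for k in RULES)


def transduce(codes: list[int]) -> str:
    """Rewrite code points left to right, always trying the longest rule first."""
    out: list[str] = []
    i = 0
    while i < len(codes):
        for n in range(min(MAX_LEN, len(codes) - i), 0, -1):
            key = tuple(codes[i:i + n])
            if key in RULES:
                out.append(RULES[key])
                i += n
                break
        else:  # no rule matched at position i
            raise ValueError(f"no rule for code point {codes[i]} at position {i}")
    return "".join(out)


asjp = transduce(PERSON_CODES)
print(asjp)                      # Xw~3C"yw~Xw~3s
print([ord(ch) for ch in asjp])  # [88, 119, 126, 51, 67, 34, 121, 119, 126, 88, 119, 126, 51, 115]
```

Longest-match ordering is what lets a two-symbol rule such as ʕ + ʷ win over the single-symbol rules for ʕ and ʷ.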

2. Comparing languages
Comparing words
LG       I      YOU    WE
ABAZA    sErE   bErE   SwErE
ABKHAZ   s3     w3     Sw3
AGUL     zun    wun    cwun

ABAZA-ABKHAZ: LD(I) = 3, LD(YOU) = 4, LD(WE) = 3; LDmean (over all shared items) = 3.73
ABAZA-AGUL:   LD(I) = 4, LD(YOU) = 4, LD(WE) = 4; LDmean (over all shared items) = 4.37

Comparing words
Levenshtein Distance (LD)
  a. Between 2 words: the number of transformations (substitutions, additions) needed to get from the shorter form to the longer one
  b. Between 2 languages: e.g. the mean LD over the overlapping set of items (at most 40)
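The word-level distance is the classic dynamic-programming edit distance. The sketch below (function names are illustrative) reproduces the per-word values on the I/YOU/WE slide. Over these three items alone the Abaza-Abkhaz mean is 10/3 ≈ 3.33, so the slide's 3.73 and 4.37 presumably reflect all items the two languages share.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or keep a match
        prev = cur
    return prev[-1]


def mean_ld(lang_a: dict[str, str], lang_b: dict[str, str]) -> float:
    """Mean LD over the meanings attested in both word lists."""
    shared = [m for m in lang_a if m in lang_b]
    return sum(levenshtein(lang_a[m], lang_b[m]) for m in shared) / len(shared)


ABAZA = {"I": "sErE", "YOU": "bErE", "WE": "SwErE"}
ABKHAZ = {"I": "s3", "YOU": "w3", "WE": "Sw3"}
AGUL = {"I": "zun", "YOU": "wun", "WE": "cwun"}

print([levenshtein(ABAZA[m], ABKHAZ[m]) for m in ABAZA])  # [3, 4, 3]
print([levenshtein(ABAZA[m], AGUL[m]) for m in ABAZA])    # [4, 4, 4]
```

ASJPcode is case-sensitive (S and s are different symbols), and plain string comparison respects that.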

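Comparing words
Levenshtein Distance: two problems with simple LD
  1. Its value depends on the length of the longest word
     → Normalize: LDN = LD / Lmax
  2. Languages differ in their phonological overlap
     → Eliminate this noise: LDND = LDN / LDNdifferent (LDNdifferent: the mean LDN between words with different meanings)
Both LDN and LDND are expressed as percentages (× 100).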
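A minimal sketch of both corrections, assuming (as in the usual ASJP definition) that LDNdifferent is the mean LDN over all pairs of words with different meanings in the two lists. The function names are hypothetical, and on a three-item toy list the result is only illustrative.

```python
from itertools import product
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """LD normalized by the length of the longer word (assumes non-empty words)."""
    return levenshtein(a, b) / max(len(a), len(b))


def ldnd(lang_a: dict[str, str], lang_b: dict[str, str]) -> float:
    """100 x (mean LDN of same-meaning pairs) / (mean LDN of different-meaning pairs)."""
    shared = [m for m in lang_a if m in lang_b]
    same = mean(ldn(lang_a[m], lang_b[m]) for m in shared)
    different = mean(ldn(lang_a[m], lang_b[n])
                     for m, n in product(shared, shared) if m != n)
    return 100.0 * same / different


ABAZA = {"I": "sErE", "YOU": "bErE", "WE": "SwErE"}
ABKHAZ = {"I": "s3", "YOU": "w3", "WE": "Sw3"}
print(round(ldnd(ABAZA, ABKHAZ), 2))  # 81.03
```

Dividing by the different-meaning baseline means that two languages with similar sound inventories do not look related merely because their words share many symbols by chance.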


Comparing languages
Levenshtein Distance for a language pair
  • Mean of all LDNDs of the words the two languages have in common
  • Synonyms (12%): take the minimum pair, or take the mean

Experimental option
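The two ways of handling synonyms can be sketched as a single function with a mode switch. The name `synonym_ldn` and the toy word "tu" are hypothetical; the Avar/Agul FULL forms come from the next slide, where both Agul synonyms happen to be equally distant from the Avar word.

```python
from itertools import product
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """LD normalized by the length of the longer word."""
    return levenshtein(a, b) / max(len(a), len(b))


def synonym_ldn(forms_a: list[str], forms_b: list[str], mode: str = "min") -> float:
    """LDN for one meaning when either language lists several synonyms."""
    scores = [ldn(a, b) for a, b in product(forms_a, forms_b)]
    if mode == "min":   # closest pair of synonyms
        return min(scores)
    if mode == "mean":  # average over all synonym pairs
        return mean(scores)
    raise ValueError(f"unknown mode: {mode!r}")


print(synonym_ldn(["dos"], ["dos", "tu"], "min"))   # 0.0
print(synonym_ldn(["dos"], ["dos", "tu"], "mean"))  # 0.5
print(synonym_ldn(['c"ura'], ['ac"uf', 'ac"ar']))   # 0.6
```

Taking the minimum favors a match whenever any synonym is close, while taking the mean dilutes it; which behaves better is exactly what the experimental option is meant to test.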

Comparing languages
AVAR (AVA; NAKH-DAGHESTANIAN > AVAR-ANDIC-TSEZIC) / AGUL (AGL; NAKH-DAGHESTANIAN > LEZGIC)
Meaning   AVA     AGL      LDND
I         dun     zun      36.6
YOU       mun     wun      36.6
HORN      tLar    k"arC    66.0
FIRE      c"a     c"a       0.0
FULL      c"ura   ac"uf    66.0   (alt. AGL: ac"ar)
NEW       c"iya   c"EyEr   55.0   (alt. AGL: c"ayif)
Common (LDND < 70): AGL-AVA 6 items (15.8% of 38)
LD 4.01 / LDN 81.76 / LDND 89.87
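The "Common" line is a simple count of shared items below the 70 threshold. In this sketch (the name `shared_below` is hypothetical) only the six LDND values listed on the slide are passed in; the other 32 shared items are not shown, so they are represented only by the total of 38 and assumed to lie at or above the threshold.

```python
def shared_below(ldnd_values: list[float], n_shared: int,
                 threshold: float = 70.0) -> tuple[int, float]:
    """Count items with LDND below `threshold`, also as a percentage of n_shared."""
    hits = sum(1 for v in ldnd_values if v < threshold)
    return hits, 100.0 * hits / n_shared


AVA_AGL = [36.6, 36.6, 66.0, 0.0, 66.0, 55.0]  # the six values listed above
hits, pct = shared_below(AVA_AGL, n_shared=38)
print(hits, round(pct, 1))  # 6 15.8
```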


3. Some results: genetic and areal proximity
Distance Matrix (0.5 · N · (N − 1) pairs)
       FRE     DUT     GAL     PRT     ENG
FRE
DUT    90.93
GAL    71.62   90.00
PRT    74.38   94.61   51.87
ENG    91.17   63.19   91.30   95.18
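For N languages the matrix holds 0.5 · N · (N − 1) distances, i.e. 10 for the five languages above. Most phylogenetic programs want the full square matrix rather than one triangle. The sketch below (the function name is hypothetical) expands the lower triangle into a PHYLIP-style layout: the number of taxa on the first line, then each name padded to 10 characters followed by its row of distances. Check your program's documentation for the exact input format it accepts.

```python
def to_phylip(names: list[str], lower: list[list[float]]) -> str:
    """Write a lower-triangular matrix as a square PHYLIP-style distance matrix.

    lower[i] holds the distances from taxon i to taxa 0 .. i-1.
    Names are padded (or cut) to 10 characters, as classic PHYLIP expects.
    """
    n = len(names)
    full = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            full[i][j] = full[j][i] = lower[i][j]
    lines = [str(n)]
    for name, row in zip(names, full):
        lines.append(f"{name[:10]:<10}" + " ".join(f"{d:.2f}" for d in row))
    return "\n".join(lines)


NAMES = ["FRE", "DUT", "GAL", "PRT", "ENG"]
LOWER = [
    [],
    [90.93],
    [71.62, 90.00],
    [74.38, 94.61, 51.87],
    [91.17, 63.19, 91.30, 95.18],
]
n = len(NAMES)
print(n * (n - 1) // 2, "pairs")  # 10 pairs
print(to_phylip(NAMES, LOWER))
```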
Tools for Trees
  • Prepare the input file for your preferred phylogenetic software using an editor such as TextPad (www.textpad.com)
  • Run the data through phylogenetic software such as SplitsTree (www.splitstree.org)
  • Choose the most appropriate algorithm (Neighbour Joining for distance data)
  • Prepare the tree for presentation using a tool such as the Tree Explorer of MEGA


Salishan Languages (n = 30)
[Figures: Salishan trees produced with Neighbor Joining and with UPGMA]
Neighbor Joining
  • Specifically meant for phylogenetic trees
  • Takes distances as its point of departure
  • Does NOT assume an equal rate of change (unlike UPGMA)
[Figure: Mayan languages (n = 38)]
Calibration of Method
Calibration: best options, parameters, factors
A. For pure classification
  • Existing classifications (Ethnologue, WALS; mainly the well-documented areas)
  • Expert knowledge of specific areas
  → diversion 12 → niche!


Linguistically crucial events
Date        Historical event                            Linguistic event
c. 250      Goths conquer Dacia                         split of E-W Romance
4th c.      Irish invade Scotland                       split of Irish-Scottish Gaelic
5th c.      Germanic kingdoms in the W Roman Empire     breakup of W Romance
5th c.      Germanic peoples invade Britain             split of English-Frisian
5th-6th c.  Britons flee to Brittany                    split of Welsh-Breton
400-600     Hieroglyphic evidence                       Ch'olan begins to split
768-814     Name of Charlemagne attested                Proto-Slavic

Calibration of Method
Calibration: best options, parameters, factors
B. For dating
  • Linguistically crucial historic events
  • Standard formula (Swadesh):
    TimeDepth = log(Similarity) / (2 · log Retention)
  • With LDND (as a fraction), 1 − LDND takes the place of Similarity:
    TimeDepth = log(1 − LDND) / (2 · log Retention)
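The dating formula is a one-liner once LDND is read as a fraction with 1 − LDND as the similarity. That reading is an interpretation, but it is the one under which the formula reproduces the Ret column of the calibration table. In this sketch the function name is hypothetical, the retention rate is per 1000 years, and the result is in thousands of years.

```python
import math


def time_depth(ldnd: float, retention: float) -> float:
    """Time depth in millennia: log(1 - LDND) / (2 log retention).

    Assumes LDND is a fraction in [0, 1) and that `retention` is the
    retention rate per 1000 years.
    """
    if not 0.0 <= ldnd < 1.0:
        raise ValueError("LDND must be a fraction in [0, 1)")
    return math.log(1.0 - ldnd) / (2.0 * math.log(retention))


# E-W Romance: LDND 0.6753, with a retention rate of 0.73.
print(round(time_depth(0.6753, 0.73), 2))  # 1.79
```

The base of the logarithm cancels in the ratio, so natural logs give the same result as base-10 logs.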


Linguistically crucial events
Time (ky)  Linguistic event                  LDND     Ret
1.75       split of E-W Romance              0.6753   0.73
1.65       split of Irish-Scottish Gaelic    0.6687   0.72
1.55       breakup of W Romance              0.6411   0.72
1.55       split of English-Frisian          0.6574   0.71
1.50       split of Welsh-Breton             0.5705   0.75
1.40       Ch'olan begins to split           0.5369   0.76
1.21       Proto-Slavic                      0.5877   0.69
MEAN                                                  0.73
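Solving the dating formula for the retention rate gives r = (1 − LDND)^(1 / (2t)) for an event of known age t. Under the same 1 − LDND reading, this sketch reproduces the Ret column to two decimals. Averaging the unrounded rates gives about 0.725, while the MEAN of 0.73 in the table matches the average of the rounded values. The function name is hypothetical.

```python
from statistics import mean

# (time depth in thousands of years, linguistic event, LDND, Ret on the slide)
EVENTS = [
    (1.75, "split of E-W Romance", 0.6753, 0.73),
    (1.65, "split of Irish-Scottish Gaelic", 0.6687, 0.72),
    (1.55, "breakup of W Romance", 0.6411, 0.72),
    (1.55, "split of English-Frisian", 0.6574, 0.71),
    (1.50, "split of Welsh-Breton", 0.5705, 0.75),
    (1.40, "Ch'olan begins to split", 0.5369, 0.76),
    (1.21, "Proto-Slavic", 0.5877, 0.69),
]


def retention(ldnd: float, time_ky: float) -> float:
    """Solve TimeDepth = log(1 - LDND) / (2 log r) for the retention rate r."""
    return (1.0 - ldnd) ** (1.0 / (2.0 * time_ky))


rates = [retention(ldnd, t) for t, _, ldnd, _ in EVENTS]
for (t, event, _, slide_ret), r in zip(EVENTS, rates):
    print(f"{event:<32}{r:.3f}  (slide: {slide_ret:.2f})")
print(f"{'MEAN':<32}{mean(rates):.3f}")
```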

Calibration of Method
B. For dating
  • Linguistically crucial historic events
  • Standard formula with the calibrated retention of 73% (< 75%):
    TimeDepth = log(1 − LDND) / (2 · log 0.73)

Deeper!
Glottochronology only?
Calibration of method: glottochronology is based entirely on lexical distance
→ Add other linguistic domains: the WALS typological database
Best result: 75% from the 40 lexical items + 25% from 40 phonological/morphological/syntactic (Ph/M/S) features
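The slide does not say how the typological distance was measured or how it was combined with the lexical one. As a hedged sketch only, the code below assumes a linear 75/25 mix of a lexical distance (such as LDND) and the percentage of differing values on the WALS-style features attested for both languages. The function names, feature codes and values are all made up for illustration.

```python
def typological_distance(feats_a: dict[str, str], feats_b: dict[str, str]) -> float:
    """Percentage of shared features on which two languages have different values."""
    shared = [f for f in feats_a if f in feats_b]
    if not shared:
        raise ValueError("no features attested for both languages")
    differ = sum(1 for f in shared if feats_a[f] != feats_b[f])
    return 100.0 * differ / len(shared)


def combined_distance(lexical: float, typological: float, w_lex: float = 0.75) -> float:
    """Weighted mix of a lexical and a typological distance (both on a 0-100 scale)."""
    return w_lex * lexical + (1.0 - w_lex) * typological


# Made-up feature codes and values, for illustration only.
lang_x = {"F1": "SOV", "F2": "postpositions", "F3": "no tones"}
lang_y = {"F1": "SVO", "F2": "postpositions", "F4": "simple tones"}
typ = typological_distance(lang_x, lang_y)  # 50.0: 1 of 2 shared features differs
print(combined_distance(lexical=80.0, typological=typ))  # 72.5
```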

4. On Inheritance vs Borrowing
149
Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
I       dun / zun          LDND 36.6
YOU     mun / wun          LDND 36.6
HORN    tLar / k"arC       LDND 66.0
FIRE    c"a / c"a          LDND 0.0
FULL    c"ura / ac"uf      LDND 66.0
NEW     c"iya / c"EyEr     LDND 55.0
→ 6 items < 70.0
→ Genetically related!!
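The per-item figures can be recomputed from the ASJP transcriptions. In ASJP, LDN is the Levenshtein distance between two words divided by the length of the longer word, and LDND further divides by the average LDN between words of different meanings. That average comes from the full word lists, which the slide does not show; for this pair, dividing by about 0.91 reproduces the slide's values to within 0.1, so the sketch uses that figure as an inferred, not published, normalizer. Transcriptions are compared character by character, with `"` (glottalization) counted as its own symbol.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """Levenshtein distance normalized by the longer word's length."""
    return levenshtein(a, b) / max(len(a), len(b))


# AVAR / AGUL pairs from the slide (ASJP transcription).
PAIRS = {
    "I": ("dun", "zun"),
    "YOU": ("mun", "wun"),
    "HORN": ("tLar", 'k"arC'),
    "FIRE": ('c"a', 'c"a'),
    "FULL": ('c"ura', 'ac"uf'),
    "NEW": ('c"iya', 'c"EyEr'),
}
NORMALIZER = 0.91  # inferred from the slide's numbers; an assumption
THRESHOLD = 70.0   # cut-off used on the slide

scores = {m: 100 * ldn(a, b) / NORMALIZER for m, (a, b) in PAIRS.items()}
close = [m for m, s in scores.items() if s < THRESHOLD]
for m, s in scores.items():
    print(f"{m:5s} {s:5.1f}")
print(f"{len(close)} items < {THRESHOLD}")
```

A count of low-distance items is only suggestive on its own; the Spanish/Chamorro pair that follows passes the same test.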

153
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE     uno / unu            LDND 36.9
TWO     dos / dos            LDND 0.0
PERSON  persona / petsona    LDND 55.3
STAR    estreya / estrecas   LDND 61.2
NIGHT   noCe / noces         LDND 68.2
NEW     nuevo / nueba        LDND 44.2
→ 6 items < 70.0
RELATED ???

154
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE     uno / unu            LDND 36.9
TWO     dos / dos            LDND 0.0
PERSON  persona / petsona    LDND 55.3
STAR    estreya / estrecas   LDND 61.2
NIGHT   noCe / noces         LDND 68.2
NEW     nuevo / nueba        LDND 44.2
→ RELATED ??? NO!!!

155
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE     uno / unu            LDND 36.9
TWO     dos / dos            LDND 0.0
PERSON  persona / petsona    LDND 55.3
STAR    estreya / estrecas   LDND 61.2
NIGHT   noCe / noces         LDND 68.2
NEW     nuevo / nueba        LDND 44.2
INDO-EUROPEAN <> AUSTRONESIAN

157
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE     uno / unu            LDND 36.9
TWO     dos / dos            LDND 0.0
PERSON  persona / petsona    LDND 55.3
STAR    estreya / estrecas   LDND 61.2
NIGHT   noCe / noces         LDND 68.2
NEW     nuevo / nueba        LDND 44.2
CHANCE? → 5% (i.e. 1-2 items)
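To put a number on the slide's "chance?" question, one can swap in a simple binomial model: assume each of 40 list items (the list size mentioned earlier in the slides) independently has a 5% chance of looking similar across two unrelated languages, and ask how likely 6 or more such items would be. This is an illustration under those assumptions, not ASJP's own test; `chance_tail` is a hypothetical name.

```python
from math import comb


def chance_tail(n_items: int, k_min: int, p_chance: float) -> float:
    """P(at least k_min of n_items look similar by chance), binomial model."""
    return sum(
        comb(n_items, k) * p_chance**k * (1 - p_chance) ** (n_items - k)
        for k in range(k_min, n_items + 1)
    )


N_ITEMS = 40     # assumed list length
P_CHANCE = 0.05  # chance-similarity rate from the slide
expected = N_ITEMS * P_CHANCE
print(f"expected chance matches: {expected:.1f}")
print(f"P(>= 6 by chance): {chance_tail(N_ITEMS, 6, P_CHANCE):.3f}")
```

At roughly 1.4% under these assumptions, chance alone is an unlikely explanation for six similar items, which leaves shared ancestry or borrowing.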

158
Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
ONE     uno / unu            LDND 36.9
TWO     dos / dos            LDND 0.0
PERSON  persona / petsona    LDND 55.3
STAR    estreya / estrecas   LDND 61.2
NIGHT   noCe / noces         LDND 68.2
NEW     nuevo / nueba        LDND 44.2
BORROWING through LANGUAGE CONTACT

163
Inherited or borrowed?
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE
CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
ONE     uno / unu    LDND 36.9
SPA <> CHA   fam/gen 0.24/0.82 > 0.03/0.00
phon pattern fit 12.00 > 0.67

165
Borrowed!
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE
CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
ONE     uno / unu    LDND 36.9
SPA > CHA   fam/gen 0.24/0.82 > 0.03/0.00
phon pattern fit 12.00 > 0.67


166
Borrowing
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE
CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
TWO     dos / dos    LDND 0.0
SPA > CHA   f/g 0.62/1.00 > 0.12/0.00   swF 100.00 > 0.22


168
Borrowing
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE
CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
PERSON  persona / petsona    LDND 55.3
SPA > CHA   f/g 0.20/0.64 > 0.01/0.00   swF 32.40 > 0.13
ALT CHA taotao (0.41/0.00)

170
Borrowing
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE
CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
STAR    estreya / estrecas    LDND 61.2
SPA > CHA   f/g 0.17/0.82 > 0.00/0.00   swF 100.00 > 4.44
ALT CHA puti7on (0.03/0.00)

171
Borrowing
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE
CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
NIGHT   noCe / noces    LDND 68.2
SPA > CHA   f/g 0.23/0.55 > 0.04/0.00   swF 100.00 > 0.10
ALT CHA pweNi (0.23/0.00)

172
Borrowing
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE
CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
NEW     nuevo / nueba    LDND 44.2
SPA > CHA   f/g 0.50/0.64 > 0.04/0.00   swF 4.27 > 0.03
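The slides do not spell out how the direction of borrowing is decided. One reading of the figures is that fam/gen (f/g) gives the share of languages in each side's family and genus with a similar form, and that a form is borrowed from the side where it is widespread into the side where it is isolated. The sketch below encodes that hypothetical rule with the numbers from the slides, also requiring the phonological-fit figure (phon pattern fit / swF, whose definition the slides do not give) to point the same way. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    meaning: str
    spa_fam: float  # share in Indo-European (assumed meaning of "fam")
    spa_gen: float  # share in Romance (assumed meaning of "gen")
    cha_fam: float  # share in Austronesian
    cha_gen: float  # share in the Chamorro genus
    spa_fit: float  # "phon pattern fit" / "swF" figure for Spanish
    cha_fit: float  # same figure for Chamorro


ITEMS = [
    Evidence("ONE", 0.24, 0.82, 0.03, 0.00, 12.00, 0.67),
    Evidence("TWO", 0.62, 1.00, 0.12, 0.00, 100.00, 0.22),
    Evidence("PERSON", 0.20, 0.64, 0.01, 0.00, 32.40, 0.13),
    Evidence("STAR", 0.17, 0.82, 0.00, 0.00, 100.00, 4.44),
    Evidence("NIGHT", 0.23, 0.55, 0.04, 0.00, 100.00, 0.10),
    Evidence("NEW", 0.50, 0.64, 0.04, 0.00, 4.27, 0.03),
]


def borrowing_direction(e: Evidence) -> str:
    """Hypothetical rule: donor = side where all three figures are higher."""
    spa_wins = (e.spa_fam > e.cha_fam, e.spa_gen > e.cha_gen, e.spa_fit > e.cha_fit)
    cha_wins = (e.cha_fam > e.spa_fam, e.cha_gen > e.spa_gen, e.cha_fit > e.spa_fit)
    if all(spa_wins):
        return "SPA > CHA"
    if all(cha_wins):
        return "CHA > SPA"
    return "undecided"


for item in ITEMS:
    print(f"{item.meaning:7s} {borrowing_direction(item)}")
```

Under this rule all six items come out as SPA > CHA, matching the slides.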

173
5. Conclusions
180
Conclusions
- Method for automatic reconstruction of language relationships
- Assess, discuss and correct existing classifications
- Test hypotheses about genetic distances in time
- Locate potential borrowings
- CORE incremental lexical database (> 35)
→ One day online
→ Cooperation!!
181
Holman et al. (forthc. 2008). Explorations in automated language classification. Folia Linguistica.
Brown et al. (forthc. 2008). Automated classification of the world's languages: A description of the method and preliminary results. Sprachtypologie und Universalienforschung.
Several working papers: email.eva.mpg.de./wichmann/ASJPHomePage
182
?