Title: Advances in Automated Language Classification ASJP Consortium Dik Bakker, Lancaster
1Advances in Automated Language Classification
ASJP Consortium
Dik Bakker, Lancaster
5Overview
Project ASJP members:
Sören Wichmann (Germany / Netherlands), Viveka Velupillai (Germany), André Müller (Germany), Robert Mailhammer (Germany), Hagen Jung (Germany), Eric Holman (US), Anthony Grant (UK), Dmitry Egorov (Russia), Pamela Brown (US), Cecil Brown (US), Dik Bakker (UK / Netherlands)
9Overview
Project ASJP (Automated Similarity Judgment Program)
Overall goal: Automatic reconstruction of language relationships
Basis: Distance matrix between individual languages on the basis of linguistic features
Method: Lexicostatistics: mass comparison of basic lexical items, extended by typological data
17Overview
OVERALL GOAL: Reconstruction of Language Relationships
Derived goals:
- Critical assessment and refinement of existing classifications
- Classify newly described and unclassified languages
- Search for (ir)regularities in phylogenies
- Test hypotheses (e.g. Atkinson et al. 2008, elbow phenomenon)
- Experimentally find the best/optimal dating method
- Automatically detect borrowings
Today ...
22Overview
1. The list of basic lexical items
2. Comparing languages
3. Some results: genetic and areal proximity
4. On Inheritance vs Borrowing
5. Conclusions
231. The list of basic lexical items
30Lexical items
Word list: Swadesh 100 basic meanings
- Word coined in most languages
- Collected in field work: lexicon / grammar
- Inherited rather than borrowed
- Culturally independent
- Stable over time
- Few synonyms
41Lexical items further reduction
Early analyses have shown:
- Optimal 40/100 item subset gives same results
  → Less work
  → Less missing data
  → Faster processing: combinatorial explosion
  40 → 100 items: 3 × 10^7 → 2 × 10^10
44Lexical items stability
Determine most stable items
Iteratively throw out the most unstable item in terms of variation within genera (3500-4000 years; Dryer 2001, 2005)
E.g. Germanic, Romance, ..., Mayan, ..., Sino-T
Formula: S = (E - U)/(100 - U) (weighted average matches, Eq vs Uneq)
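The stability formula can be evaluated directly. A minimal sketch, assuming E and U are percentages (0-100) of matching word pairs for equal and unequal meanings respectively; that reading of "Eq vs Uneq" and the function name `stability` are assumptions, not taken from the slides:

```python
def stability(e_pct: float, u_pct: float) -> float:
    """S = (E - U) / (100 - U), with E and U given as percentages."""
    if not 0.0 <= u_pct < 100.0:
        raise ValueError("U must lie in [0, 100)")
    return (e_pct - u_pct) / (100.0 - u_pct)


# Hypothetical values: 60% matches for equal meanings, 20% for unequal ones.
print(stability(60.0, 20.0))  # 0.5
```

Under this reading, S is 0 when equal-meaning pairs match no more often than unequal-meaning pairs, and 1 when every equal-meaning pair matches.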
45[Figure. Labels: Ethnologue (Goodman-Kruskal), WALS (Pearson); axis: Stability]
4840 Most Stable
Home page
52Lexical items transcription
First phase of project (2007)
Problems with full IPA representation of words:
- data entry via keyboard
- simple programming language (Fortran, Pascal)
→ Recoding to simplified ASJPcode (ASCII only)
56Lexical items transcription
ASJPcode: 7 vowels, 34 consonants
Closest sound
57Lexical items transcription
ASJPcode: 7 vowels, 34 consonants
Operators for: nasalization, labialization, palatalization, aspiration, glottalization
60Abaza (Caucasian)
Meaning   IPA            ASJPcode
PERSON    ʕʷɨʧʼʲʷʕʷɨs    Xw3Cw"yXw3s
LEAF      b???           bxy3
SKIN      ??az?          Cwazy
HORN      ?'????a        Cw"3Xwa
NOSE      p?n?'a         p3nc"a
TOOTH     p??            p3c
62Lexical items
Collected to date:
- Over 2100 languages, dialects and proto-languages
- Mean number of items/language: 36.2 (/40)
63Lexical items
Areal distribution (not a sample!)
Americas        27%
Eurasia         23%
Australia/PNG   18%
Austronesia     15%
Africa          14%
Creoles          2%
Artificial       1%
64Languages currently sampled
67Lexical items transcription
Second phase of project (2008)
Problems with full IPA representation solved:
1. automatic conversion IPA to integer (Python)
2. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar
71Lexical items transcription
Abaza (Caucasian)
Meaning:  PERSON
IPA:      ʕʷɨʧʼʲʷʕʷɨs
Decimal:  661 695 616 679 700 690 695 661 695 616 115
ASJPcode: 88 119 126 51 67 34 121 119 126 88 119 126 51 115 (= Xw3Cw"yXw3s)
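The integer step for PERSON can be reproduced with plain code-point conversion. A minimal sketch, assuming the decimal values listed for this word are Unicode code points; the function names are hypothetical and this is not the project's own conversion code:

```python
def to_decimal(text: str) -> list[int]:
    """Return the Unicode code point of every character in text."""
    return [ord(ch) for ch in text]


def from_decimal(codes: list[int]) -> str:
    """Rebuild a string from a list of Unicode code points."""
    return "".join(chr(code) for code in codes)


# Decimal sequences for Abaza PERSON, copied from the slide.
PERSON_IPA_CODES = [661, 695, 616, 679, 700, 690, 695, 661, 695, 616, 115]
PERSON_ASJP_CODES = [88, 119, 126, 51, 67, 34, 121, 119, 126, 88, 119, 126, 51, 115]

ipa = from_decimal(PERSON_IPA_CODES)
print(to_decimal(ipa) == PERSON_IPA_CODES)  # True
print(from_decimal(PERSON_ASJP_CODES))      # Xw~3C"yw~Xw~3s
```

The ASJP integers all fall in the ASCII range, which is what makes the recoded form easy to type and to process with simple tools.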
72Lexical items transcription
Second phase of project (2008)
1. automatic conversion IPA to integer (Python)
2. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar
Why not run on full IPA??
74Lexical items transcription
Second phase of project (2008)
1. automatic conversion IPA to integer (Python)
2. (semi-)automatic recoding to ASJPcode: transduction on the basis of a formal grammar
- correlations IPA / ASJP > 0.9
- but ASJP better fit with classifications → IPA too specific
76Lexical items transcription
IPA:      ʕʷɨʧʼʲʷʕʷɨs
Decimal:  661 695 616 679 700 690 695 661 695 616 115
ASJPcode (any Unicode subset)
Formal grammar: A → n661, n695, n616, P Q → A B C Z → P Q Z
Optimal level of abstraction for historical phonological reconstruction?
772. Comparing languages
78Comparing words
[Figure: Levenshtein distances (LD) between paired words. First example: LDi = 3, LDj = 4, LDk = 3, LDmean = 3.73. Second example: LDi = 4, LDj = 4, LDk = 4, LDmean = 4.37.]
89Comparing words
Levenshtein Distance
a. Between 2 words: number of transformations to get from the shorter form to the longer one (changes, additions)
b. Between 2 languages: e.g. mean LD for overlapping set (≤ 40)
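The word-level distance can be sketched with the standard dynamic-programming algorithm. This is a minimal illustration that treats every ASCII character as one symbol; handling ASJPcode operators as part of a single segment is left out, and that simplification is an assumption of the sketch, not the project's implementation:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


print(levenshtein("dun", "zun"))          # 1
print(levenshtein("persona", "petsona"))  # 1
print(levenshtein("nuevo", "nueba"))      # 2
```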
94Comparing words
Levenshtein Distance
Two problems:
1. Value depends on length of longest word
   → Normalize: LDN = LD / Lmax
2. Differences between languages in phonological overlap
   → Eliminate noise: LDND = LDN / LDNdifferent
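The two corrections can be sketched as follows. The assumptions here are that LDNdifferent is the mean LDN over all word pairs with different meanings in the two lists, that scores are reported as percentages, and that one character equals one symbol; the function names and the four-word sample (taken from the Spanish/Chamorro example later in the talk) are for illustration only, so the values will not match the slides, which use the full lists:

```python
from itertools import permutations


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance, one character per symbol."""
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]
        for j, symbol_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (symbol_a != symbol_b)))
        previous = current
    return previous[-1]


def ldn(a: str, b: str) -> float:
    """LD normalized by the length of the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def ldnd_by_meaning(lang_a: dict, lang_b: dict) -> dict:
    """LDND (in percent) for every meaning attested in both word lists."""
    shared = sorted(set(lang_a) & set(lang_b))
    # Baseline: words compared across DIFFERENT meanings.
    different = [ldn(lang_a[m], lang_b[n]) for m, n in permutations(shared, 2)]
    if not different or sum(different) == 0:
        raise ValueError("need at least two shared meanings with distinct words")
    baseline = sum(different) / len(different)
    return {m: 100.0 * ldn(lang_a[m], lang_b[m]) / baseline for m in shared}


def language_distance(lang_a: dict, lang_b: dict) -> float:
    """Mean of the per-meaning LDND values for a language pair."""
    scores = ldnd_by_meaning(lang_a, lang_b)
    return sum(scores.values()) / len(scores)


SPA = {"ONE": "uno", "TWO": "dos", "PERSON": "persona", "NEW": "nuevo"}
CHA = {"ONE": "unu", "TWO": "dos", "PERSON": "petsona", "NEW": "nueba"}
scores = ldnd_by_meaning(SPA, CHA)
print(scores["TWO"])  # 0.0
```

Dividing by the different-meaning baseline means that two languages with similar sound inventories do not look closer than they are merely because unrelated words happen to share segments.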
97Comparing languages
- Levenshtein Distance for Language Pair
- Mean of all LDNDs of words in common
- Synonyms (12)
- take Minimum pair
- take Mean
Experimental option
98Comparing languages
AVAR (AVA: NAKH-DAGHESTANIAN > AVAR-ANDIC-TSEZIC) / AGUL (AGL: NAKH-DAGHESTANIAN > LEZGIC)
Meaning  AVA     AGL      LDND
I        dun     zun      36.6
YOU      mun     wun      36.6
HORN     tLar    k"arC    66.0
FIRE     c"a     c"a       0.0
FULL     c"ura   ac"uf    66.0   ALT AGL: ac"ar
NEW      c"iya   c"EyEr   55.0   ALT AGL: c"ayif
COMMON (LDND < 70) AGL - AVA: 6 (15.8% of 38)
LD 4.01 / LDN 81.76 / LDND 89.87
104Comparing languages
AVAR (AVA: NAKH-DAGHESTANIAN > AVAR-ANDIC-TSEZIC) / AGUL (AGL: NAKH-DAGHESTANIAN > LEZGIC)
Meaning  AVA     AGL      LDND
I        dun     zun      36.6
YOU      mun     wun      36.6
HORN     tLar    k"arC    66.0
FIRE     c"a     c"a       0.0
FULL     c"ura   ac"uf    66.0   ALT AGL: ac"ar
NEW      c"iya   c"ayif   55.0   ALT AGL: c"EyEr
COMMON (LDND < 70) AGL - AVA: 6 (15.8% of 38)
LD 4.01 / LDN 81.76 / LDND 89.87
107Comparing languages
1083. Some results genetic and areal proximity
109Distance Matrix (0.5 × N × (N-1))
<Excel file>
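The size of the matrix follows from the number of unordered language pairs. A minimal sketch that counts the pairs and fills a symmetric matrix from any pairwise distance function; the function names and the constant placeholder distance are hypothetical:

```python
from itertools import combinations


def n_pairs(n: int) -> int:
    """Number of unordered pairs among n languages: 0.5 * N * (N - 1)."""
    return n * (n - 1) // 2


def distance_matrix(names, distance):
    """Symmetric matrix as nested dicts; distance(a, b) is called once per pair."""
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        value = distance(a, b)
        matrix[a][b] = value
        matrix[b][a] = value
    return matrix


print(n_pairs(2100))  # 2203950

# Placeholder distance: every pair gets 1.0 (illustration only).
toy = distance_matrix(["AVA", "AGL", "SPA"], lambda a, b: 1.0)
print(toy["AVA"]["AGL"] == toy["AGL"]["AVA"])  # True
```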
112Tools for Trees
- Run data using phylogenetic software such as SplitsTree (www.splitstree.org)
- Choose the most appropriate algorithm (Neighbour Joining for distance data)
114NeighborJoining
Salishan Languages (n = 30)
Existing Classifications
117NeighborJoining
NeighborJoining
- specifically meant for phylogenetic trees
- does NOT assume equal rate of change
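A single joining step shows why no equal rate of change is assumed: the two joined taxa may receive different branch lengths. The sketch below implements only the pair selection and branch-length computation of the standard Neighbor-Joining algorithm on a small textbook-style matrix; the taxon names and distances are illustrative, not ASJP data, and this is not the SplitsTree implementation:

```python
def nj_step(names, d):
    """One Neighbor-Joining step on a symmetric distance table d[a][b].

    Returns the pair to join and the branch length from each member
    to the new internal node.
    """
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")
    totals = {a: sum(d[a][b] for b in names if b != a) for a in names}
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Q criterion: small values mark pairs close to each other
            # and far from everything else.
            q = (n - 2) * d[a][b] - totals[a] - totals[b]
            if best is None or q < best[0]:
                best = (q, a, b)
    _, a, b = best
    length_a = 0.5 * d[a][b] + (totals[a] - totals[b]) / (2 * (n - 2))
    length_b = d[a][b] - length_a
    return a, b, length_a, length_b


names = ["a", "b", "c", "d", "e"]
rows = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
d = {x: dict(zip(names, row)) for x, row in zip(names, rows)}
print(nj_step(names, d))  # ('a', 'b', 2.0, 3.0)
```

Here `a` and `b` are joined with branch lengths 2 and 3; a method assuming a constant rate would force both branches to the same length.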
122Calibration of Method
Calibration: best options, parameters, factors
A. for pure classification
- existing classifications (Ethnologue, WALS; mainly the well-documented areas)
- expert knowledge of specific areas
→ diversion 12 → if resistant: niche!
124Calibration of Method
Calibration: best options, parameters, factors
B. for dating
- linguistically crucial historic events
125Linguistically crucial events
Date Historical event
Linguistic event
128Calibration of Method
Calibration: best options, parameters, factors
B. for dating
- linguistically crucial historic events
→ Standard formula (Swadesh):
  TimeDepth = log(Similarity) / (2 log Retention)
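The standard formula is straightforward to compute. A minimal sketch, assuming similarity and retention are proportions between 0 and 1 and that the result is expressed in the time unit over which the retention rate is defined; the numbers in the example are hypothetical:

```python
import math


def time_depth(similarity: float, retention: float) -> float:
    """TimeDepth = log(Similarity) / (2 * log(Retention))."""
    if not 0.0 < similarity <= 1.0:
        raise ValueError("similarity must lie in (0, 1]")
    if not 0.0 < retention < 1.0:
        raise ValueError("retention must lie in (0, 1)")
    return math.log(similarity) / (2.0 * math.log(retention))


# Two languages that each retain 80% per time unit share 0.8 * 0.8 = 64%
# after one time unit, so the formula should return a depth of 1.
print(round(time_depth(0.64, 0.8), 6))  # 1.0
```

The factor 2 reflects that both languages change independently after the split, so the shared proportion decays with the square of the retention rate per time unit.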
130Calibration of Method
Calibration: best options, parameters, factors
B. for dating
- linguistically crucial historic events
→ Standard formula:
  TimeDepth = log(LDND) / (2 log Retention)
132Linguistically crucial events
135Calibration of Method
Calibration: best options, parameters, factors
B. for dating
- linguistically crucial historic events
- Standard formula:
  TimeDepth = log(LDND) / (2 log 73)
  73 < 75 < 85
  Deeper!
140Glottochronology only?
Calibration of method
Glottochronology: all based on lexical distance
Add other linguistic domains: WALS Typological database
Best result: 75% (40 lex) + 25% (40 Ph/M/S features)
1414. On Inheritance vs Borrowing
145Inherited or borrowed?
AVAR (AVA) / AGUL (AGL)
Meaning  AVA     AGL      LDND
I        dun     zun      36.6
YOU      mun     wun      36.6
HORN     tLar    k"arC    66.0
FIRE     c"a     c"a       0.0
FULL     c"ura   ac"uf    66.0
NEW      c"iya   c"EyEr   55.0
→ 6 items < 70.0
→ Genetically related !!
149Inherited or borrowed?
SPANISH (SPA) / CHAMORRO (CHA)
Meaning  SPA       CHA        LDND
ONE      uno       unu        36.9
TWO      dos       dos         0.0
PERSON   persona   petsona    55.3
STAR     estreya   estrecas   61.2
NIGHT    noCe      noces      68.2
NEW      nuevo     nueba      44.2
→ 6 items < 70.0: RELATED ???
NO!!! INDO-EUROPEAN <> AUSTRONESIAN
CHANCE? → < 5% (i.e. 1-2 items)
BORROWING through LANGUAGE CONTACT
160Inherited or borrowed?
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
ONE: uno / unu, LDND = 36.9
SPA <?> CHA
fam/gen: 0.24/0.82 > 0.03/0.00
phon pattern fit: 12.00 > 0.67
> >
161Borrowed!
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
ONE: uno / unu, LDND = 36.9
SPA > CHA
fam/gen: 0.24/0.82 > 0.03/0.00
phon pattern fit: 12.00 > 0.67
162Borrowing
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
TWO: dos / dos, LDND = 0.0
SPA > CHA
f/g: 0.62/1.00 > 0.12/0.00
swF: 100.00 > 0.22
163Borrowing
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
STAR: estreya / estrecas, LDND = 61.2
SPA > CHA
f/g: 0.17/0.82 > 0.00/0.00
swF: 100.00 > 4.44
164Borrowing
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
NIGHT: noCe / noces, LDND = 68.2
SPA > CHA
f/g: 0.23/0.55 > 0.04/0.00
swF: 100.00 > 0.10
165Borrowing
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
NEW: nuevo / nueba, LDND = 44.2
SPA > CHA
f/g: 0.50/0.64 > 0.04/0.00
swF: 4.27 > 0.03
167Borrowing
SPANISH (SPA): INDO-EUROPEAN (128) > ROMANCE / CHAMORRO (CHA): AUSTRONESIAN (310) > CHAMORRO
PERSON: persona / petsona, LDND = 55.3
SPA > CHA
f/g: 0.20/0.64 > 0.01/0.00
swF: 32.40 > 0.13
ALT CHA: taotao (0.41/0.00)
1695. Conclusions
176Conclusions
- Method for automatic reconstruction of language relationships, using mass comparison of lexical and typological data
- Framework to discuss and correct existing classifications
- Test hypotheses about genetic distances in time
- Locate (and eliminate) potential borrowings
- CORE: incremental lexical database (> 35)
→ One day soon: Online
→ Join and cooperate!!!
177Holman et al. (forthc. 2008). Explorations in automated language classification. Folia Linguistica.
Brown et al. (forthc. 2008). Automated classification of the world's languages: a description of the method and preliminary results. Sprachtypologie und Universalienforschung.
Several working papers: email.eva.mpg.de./wichmann/ASJPHomePage
178?