The Small World of Human Language


1
The Small World of Human Language
  • Ramon Ferrer i Cancho
  • Richard V. Sole
  • presented by Emre Erdem

2
Introduction: Zipf's Law (Zipf 1972)
  • A complete theory of language requires a
    theoretical understanding of its implicit
    statistical regularities. Zipf's Law is the best
    known of these.
  • Zipf's Law: the frequency of a word decays as a
    power function of its rank.
  • In spite of its relevance and universality, such
    a law can be obtained by various mechanisms and
    does not provide deep insight into the
    organization of language.
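
As an illustration of the rank-frequency relation stated on this slide, the following minimal sketch ranks word counts and estimates the power-law exponent as the slope of a least-squares fit in log-log space. The function names and the synthetic input are assumptions made for this example; they do not come from the presentation, and a fit on a real corpus would need more care (for example, about the low-frequency tail).

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)

def zipf_exponent(frequencies):
    """Estimate the exponent as minus the least-squares slope of
    log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx

# Synthetic frequencies following f(r) = 1000 / r exactly.
# This is illustrative data, not taken from any corpus.
synthetic = [1000 / r for r in range(1, 51)]
exponent = zipf_exponent(synthetic)  # close to 1.0 for this synthetic input
```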

3
Introduction: Lexicons
  • Lexicon: 1. a dictionary
  • 2. a list of vocabulary belonging to a specific
    field

Human brains store lexicons that are usually
formed by thousands of words. (in the range of
words)
Kernel lexicon: a common lexicon for successful
basic communication
4
Introduction
  • Co-occurrence of words in sentences relies on the
    network structure of the lexicon.
  • Human language can be described in terms of a
    graph of word interactions. This graph has some
    unexpected properties that might underlie its
    diversity and flexibility, and that raise new
    questions about its origins and organization.

5
Graph Properties of Human Language
  • Words co-occur in sentences
  • Syntactical relationships
  • Stereotyped expressions or collocations
    (New York, take it easy)

6
Graph Properties of Human Language: Links
  • Links: significant co-occurrences between words
    in the same sentence.
  • The most correlated words in a sentence are the
    closest.
  • A decision must be taken about the maximum
    distance considered for forming links.
  • If the distance is too long, the risk of capturing
    spurious co-occurrences increases.
  • If the distance is too short, certain strong
    co-occurrences can be systematically missed.

7
Graph Properties of Human Language: Links
  • A toy network constructed with four sentences:
  • John is tall
  • John drinks water
  • Mary is blonde
  • Mary drinks wine

The graph is constructed by linking words at a
distance of one or two in the same sentence.
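
The toy construction above can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization, lowercasing, and an undirected graph stored as a dictionary of neighbor sets; the function name `build_word_graph` is hypothetical and not from the presentation.

```python
def build_word_graph(sentences, max_distance=2):
    """Link every pair of words that appear at most `max_distance`
    positions apart in the same sentence (undirected, no self-loops)."""
    graph = {}
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            graph.setdefault(word, set())
            # Look ahead up to max_distance positions within the sentence.
            for j in range(i + 1, min(i + max_distance + 1, len(words))):
                other = words[j]
                if other != word:
                    graph[word].add(other)
                    graph.setdefault(other, set()).add(word)
    return graph

toy_sentences = [
    "John is tall",
    "John drinks water",
    "Mary is blonde",
    "Mary drinks wine",
]
toy_graph = build_word_graph(toy_sentences)
# "john" is linked to "is", "tall", "drinks" and "water".
```

Because every toy sentence has only three words, a maximum distance of two links all pairs of words inside each sentence; the shared words ("john", "mary", "is", "drinks") are what connect the sentences to one another.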
8
Graph Properties of Human Language: Links
  • The maximum distance is decided according to
    the minimum distance at which most of the
    co-occurrences are likely to happen.
  • Many co-occurrences take place at a distance of
    one:
  • red flowers (adjective-noun), stay here
    (verb-adverb), can see (modal-verb), getting dark
    (verb-adjective), the/this house
    (article/determiner-noun)
  • Many co-occurrences take place at a distance of
    two:
  • hit the ball (verb-object), Mary usually cries
    (subject-verb), table of wood (noun-noun through
    a prepositional phrase), live in Boston
    (verb-noun)

9
Graph Properties of Human Language: Links
  • The search for links stops at a distance of two
  • Lack of an automatic capturing technique
  • The method fails to capture the exact relationships
    but does capture almost every possible type of
    link.
  • We are not interested in all the relationships.
    Our goal is to capture as many links as possible
    through an automatic procedure.
  • A long-distance syntactic link implies the
    existence of lower-distance syntactic links. By
    contrast, a short-distance link does not imply a
    long-distance link.

10
Graph Properties of Human Language: Improving the technique
  • Choose only pairs of consecutive words, the
    mutual co-occurrence of which is larger than
    expected by chance.

Condition: p_ij > p_i p_j
p_ij: presence of correlations (co-occurrences in
the real case)
p_i p_j: expected from random ordering (theoretical
probability of co-occurrence)
This condition is applied when building the graph.
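
A minimal sketch of this filter follows: a pair of consecutive words is kept only when its observed co-occurrence probability exceeds the product of the two words' individual probabilities. How the probabilities are estimated here (relative frequencies of ordered adjacent pairs and of single words) is an assumption of this example, as is the function name `significant_pairs`; the slides do not specify the estimation procedure.

```python
from collections import Counter

def significant_pairs(sentences):
    """Return the ordered pairs of consecutive words whose observed
    co-occurrence is larger than expected from a random ordering."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        word_counts.update(words)
        pair_counts.update(zip(words, words[1:]))  # consecutive pairs only

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    kept = set()
    for (first, second), count in pair_counts.items():
        if first == second:
            continue  # skip self-loops
        observed = count / total_pairs
        # Probability expected if word order were random (assumed estimate).
        expected = (word_counts[first] / total_words) * (word_counts[second] / total_words)
        if observed > expected:
            kept.add((first, second))
    return kept
```

On a very small corpus almost every observed pair passes the test, because each pair accounts for a large share of all pairs; the filter only becomes selective when frequent words are adjacent less often than their frequencies would predict.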
11
Graph Properties of Human Language: The Graph
12
Graph Properties of Human Language: The Graph
  • Possible pattern of wiring in the graph. Black nodes
    are common words and white nodes are rare words.
    Two words are linked if they co-occur significantly.

13
Graph Properties of Human Language: The Small World Properties
The small world pattern can be detected from the
analysis of two basic statistical properties: the
clustering coefficient and the average path length.
14
Graph Properties of Human Language: The Small World Properties
15
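Graph Properties of Human Language: Clustering coefficient
For each word, define the set of its nearest neighbors.
The clustering coefficient of the word is the total
number of edges that exist among those neighbors,
divided by the possible number of such edges. (If each
existing edge is counted from both of its endpoints,
the denominator is the possible number of edges × 2.)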
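
The clustering coefficient can be sketched as follows, assuming an undirected graph stored as a dictionary mapping each word to the set of its neighbors (a representation chosen for this example). Each existing edge among the neighbors is counted from both endpoints, so the denominator k(k - 1) is twice the number of possible edges. Assigning a coefficient of 0 to words with fewer than two neighbors is a convention assumed here.

```python
def clustering_coefficient(graph, node):
    """Fraction of possible links among the neighbors of `node` that exist."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention for words with fewer than two neighbors
    # Each edge between two neighbors is seen twice, once from each endpoint.
    links_counted_twice = sum(
        1 for a in neighbors for b in graph[a] if b in neighbors
    )
    return links_counted_twice / (k * (k - 1))

def average_clustering(graph):
    """Mean clustering coefficient over all words in the graph."""
    return sum(clustering_coefficient(graph, n) for n in graph) / len(graph)

# Illustrative graph: a triangle a-b-c with a pendant word d attached to c.
example = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```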
16
Graph Properties of Human Language: Average path length
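
The average path length is the mean, over pairs of words, of the minimum number of links that must be crossed to go from one word to the other. A minimal sketch using breadth-first search follows; it assumes an undirected graph stored as a dictionary of neighbor sets and averages only over pairs that are actually connected, which is a simplification chosen for this example.

```python
from collections import deque

def shortest_path_lengths(graph, source):
    """Breadth-first search: minimum number of links from `source`
    to every word reachable from it."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

def average_path_length(graph):
    """Mean shortest-path length over all ordered pairs of distinct,
    mutually reachable words."""
    total = 0
    pairs = 0
    for source in graph:
        for target, d in shortest_path_lengths(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Illustrative chain a - b - c.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

Running one breadth-first search per word is quadratic in the number of words at best, so on a network with hundreds of thousands of vertices a sampled estimate would be a more practical choice.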
17
Scaling and Small-World Patterns
UWN (Unrestricted Word Network): the network
that results from the basic method
RWN (Restricted Word Network): the network
that results from the improved method
average connectivity
18
Scaling and Small-World Patterns
Distribution of degrees for both the UWN and the RWN,
obtained after processing three-quarters of the
words
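
A degree distribution of this kind gives, for each degree k, the fraction of words that have exactly k links. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets; the function name and the small example graph are illustrative only.

```python
from collections import Counter

def degree_distribution(graph):
    """Map each degree k to the fraction of words with exactly k neighbors."""
    degree_counts = Counter(len(neighbors) for neighbors in graph.values())
    n = len(graph)
    return {k: count / n for k, count in sorted(degree_counts.items())}

# Illustrative graph: a triangle a-b-c with a pendant word d attached to c.
example = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
distribution = degree_distribution(example)
```

For a scale-free network, plotting this distribution on log-log axes would show an approximately straight line over some range of k.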
19
Scaling and Small-World Patterns
The more frequent a word is, the more available it is
for production and comprehension. This phenomenon
is known as the frequency or recency effect. It
explains why preferential attachment shapes the
scale-free distribution in our case.
For the most frequent words,
where k is the degree and f is the frequency
20
Scaling and Small-World Patterns: Kernel Words
The network formed exclusively by the interaction of
kernel words, hereafter called the Kernel Word
Network (KWN), agrees better with the predictions
that can be made when preferential
attachment is at play.
21
Scaling and Small-World Patterns: Kernel Words
The connectivity distribution for the kernel word
network formed by the 5000 most connected vertices in
the RWN. The average connectivity in the kernel is
22
Scaling and Small-World Patterns: Kernel Words
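
A kernel network of the kind described, formed by the most connected vertices of a larger network, can be extracted as in the sketch below. It assumes an undirected graph stored as a dictionary of neighbor sets; breaking ties alphabetically is an arbitrary choice made for this example, and the function names are hypothetical.

```python
def kernel_subgraph(graph, size):
    """Keep the `size` most connected words and only the links among them."""
    # Rank by degree (highest first); ties are broken alphabetically (assumption).
    ranked = sorted(graph, key=lambda word: (-len(graph[word]), word))
    kernel = set(ranked[:size])
    return {word: graph[word] & kernel for word in kernel}

def average_connectivity(graph):
    """Mean number of links per word."""
    return sum(len(neighbors) for neighbors in graph.values()) / len(graph)

# Illustrative graph: a triangle a-b-c with a pendant word d attached to c.
example = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
kernel = kernel_subgraph(example, 3)
```

With the slides' setting, `size` would be 5000 and `graph` would be the RWN.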
23
Discussion
  • If the SW features derive from optimal navigation
    needs, then:
  • Words the main purpose of which is to speed up
    navigation must exist.
  • Brain disorders characterized by navigation
    deficits in which such words are involved must
    exist.

24
Discussion: First Prediction
  • 10 most connected words
  • and
  • the
  • of
  • in
  • a
  • to
  • 's
  • with
  • by
  • is

25
Discussion: Second Prediction
Agrammatism: a kind of aphasia in which speech is
non-fluent, laboured, halting and lacking in
function words.
Aphasia: total or partial loss of the ability to use
or understand spoken or written language. It is a
symptom of brain disease or injury.
26
Thank you