1
Social Network Inspired Models of NLP and Language Evolution
Monojit Choudhury (Microsoft Research India)
Animesh Mukherjee (IIT Kharagpur)
Niloy Ganguly (IIT Kharagpur)
2
What is a Social Network?
  • Nodes: social entities (people, organizations, etc.)
  • Edges: interactions/relationships between entities (friendship, collaboration, sex)

Courtesy: http://blogs.clickz.com
3
Social Network Inspired Computing
  • Society and the nature of human interaction form a Complex System
  • Complex Network: a generic tool to model complex systems
  • There is a growing body of work on Complex Network Theory (CNT)
  • Applied to a variety of fields: Social, Biological, Physical & Cognitive sciences, Engineering & Technology
  • Language is a complex system

4
Objective of this Tutorial
  • To show that SNIC (Soc. Net. Inspired Comp.) is an emerging and promising technique
  • Apply it to model natural languages: NLP, quantitative linguistics, language evolution, historical linguistics, language acquisition
  • Familiarize the audience with tools and techniques in SNIC
  • Compare it with other standard approaches to NLP

5
Outline of the Tutorial
  • Part I: Background
    • Introduction (25 min)
    • Network Analysis Techniques (25 min)
    • Network Synthesis Techniques (25 min)
  • Break: 3:20pm - 3:40pm
  • Part II: Case Studies
    • Self-organization of Sound Systems (20 min)
    • Modeling the Lexicon (20 min)
    • Unsupervised Labeling (Syntax & Semantics) (20 min)
  • Conclusion and Discussions (20 min)

6
Complex System
  • Non-trivial properties and patterns emerging from
    the interaction of a large number of simple
    entities
  • Self-organization: the process through which these patterns evolve without any external intervention or central control
  • Emergent property (or emergent behavior): the pattern that emerges due to self-organization

7
The best example from nature
A termite "cathedral" mound produced by a
termite colony
8
Emergence of a networked life
Atom → Molecule → Cell → Tissue → Organs → Organisms → Communities
9
Language: a complex system
  • Language: a medium for communication through an arbitrary set of symbols
  • Constantly evolving
  • An outcome of self-organization at many levels:
    • Neurons
    • Speakers and listeners
    • Phonemes, morphemes, words
  • The 80-20 rule holds at every level of structure

10
Three Views of a System
  • MICROSCOPY: may be too difficult to analyze, or to simulate the macroscopic behavior from
  • MACROSCOPY: may not give a complete picture or explanation of what goes on
  • MESOSCOPY: a useful trade-off between the two
11
Language as a physical system
  • Microscopic: a collection of utterances by individual speakers
  • Mesoscopic: an interaction between phonemes, syllables, words, phrases
  • Macroscopic: a set of grammar rules with a lexicon

12
Syntactic Network of Words
(Figure: a weighted network over the words sky, color, weight, light, blue, blood, heavy, red, with edge weights such as 1, 20 and 100)
13
Complex Network Theory
  • A handy toolbox for modeling mesoscopy
  • A marriage of graph theory and statistics
  • Complex because:
    • Non-trivial topology
    • Difficult to specify completely
    • Usually large (in terms of nodes and edges)
  • Provides insight into the nature and evolution of the system being modeled

14
Internet
15
Genetic interaction network
16
9/11 Terrorist Network: "Social Network Analysis is a mathematical methodology for connecting the dots -- using science to fight terrorism. Connecting multiple pairs of dots soon reveals an emergent network of organization."
17
CNT Examples: Road and Airline Networks
18
What Questions can be asked
  • Do these networks display some symmetry?
  • Are these networks the creation of intelligent agents, or have they emerged?
  • How have these networks emerged? What are the underlying simple rules leading to their complex formation?

19
Bi-directional Approach
  • Analysis of the real-world networks
    • Global topological properties
    • Community structure
    • Node-level properties
  • Synthesis of the network by means of some simple rules
    • Preferential attachment models
    • Small-world models, ...

20
Application of CNT in Linguistics - I
  • Quantitative linguistics
    • Invariance and typology (Zipf's law, syntactic dependencies)
  • Natural Language Processing
    • Unsupervised methods for text labeling (POS tagging, NER, WSD, etc.)
    • Textual similarity (automatic evaluation, document clustering)
    • Evolutionary models (NER, multi-document summarization)

21
Application of CNT in Linguistics - II
  • Language Evolution
    • How did sound systems evolve?
    • Development of syntax
  • Language Change
    • Innovation diffusion over social networks
    • Language as an evolving network
  • Language Acquisition
    • Phonological acquisition
    • Evolution of the mental lexicon of the child

22
Linguistic Networks
Name | Nodes | Edges | Why?
PhoNet | Phonemes | Co-occurrence likelihood in languages | Evolution of sound systems
WordNet | Words | Ontological relations | Host of NLP applications
Syntactic Network | Words | Similarity between syntactic contexts | POS tagging
Semantic Network | Words, names | Semantic relations | IR, parsing, NER, WSD
Mental Lexicon | Words | Phonetic similarity and semantic relations | Cognitive modeling, spell-checking
Tree-banks | Words | Syntactic dependency links | Evolution of syntax
Word Co-occurrence | Words | Co-occurrence | IR, WSD, LSA, ...
23
Summarizing
  • SNIC and CNT are emerging techniques for modeling complex systems at the mesoscopic level
  • Applied to Physics, Biology, Sociology, Economics, Logistics, ...
  • Language: an ideal application domain for SNIC
  • SNIC models in NLP, quantitative linguistics, language change, evolution and acquisition

24
Topological Characterization of Networks
25
Types Of Networks and Representation
Unipartite Binary/ Weighted Undirected/ Directed
Bipartite Binary/ Weighted Undirected/ Directed
  • Representation
  • Adjacency Matrix
  • Adjacency List

a b c
a 0 1 1
b 1 0 1
c 1 1 0
a b,c
b a,c
c a,b
26
Properties of Adjacency Matrix
  • A = [a_ij], where i and j are nodes and a_ij = 1 if there is an edge between i and j
  • A² = A·A: entries denote the number of paths of length 2 between any two nodes (Σ_k a_ik a_kj)
  • In general, Aⁿ denotes the number of paths of length n
  • Trace(A) = Σ_i a_ii
  • How is the trace of A³ related to the number of triangles in the n/w?
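A numpy sketch of these properties on the triangle graph a-b-c from the previous slide; the last line answers the question (trace(A³) counts each triangle 6 times, once per starting node and direction):

```python
import numpy as np

# Adjacency matrix of the triangle graph a-b-c
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])

A2 = A @ A                 # entries: number of paths of length 2
A3 = A @ A @ A             # entries: number of paths of length 3
print(A2)                  # diagonal of A^2 = degree of each node
print(np.trace(A))         # 0: no self-loops
print(np.trace(A3) // 6)   # 1 triangle: trace(A^3) = 6 x #triangles
```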
27
Characterization of Complex N/ws
  • They have a non-trivial topological structure
  • Properties:
    • Heavy tail in the degree distribution (non-negligible probability mass towards the tail, more than in the case of an exponential distribution)
    • High clustering coefficient
    • Centrality properties
    • Social roles & equivalence
    • Assortativity
    • Community structure
    • Small avg. path length (random graphs)
    • Preferential attachment
    • Small-world properties

28
Degree Distribution (DD)
  • Let pk be the fraction of vertices in the network that have degree k
  • The k versus pk plot is defined as the degree distribution of the network
  • For most real-world networks these distributions are right-skewed, with a long right tail of values far above the mean: pk varies as k^(-α)
  • Due to noisy and insufficient data, the definition is sometimes slightly modified: the cumulative degree distribution is plotted, i.e., the probability that the degree of a node is greater than or equal to k

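A minimal sketch of computing both the DD and the cumulative DD, assuming networkx is available (the BA test graph is an illustrative choice):

```python
from collections import Counter

import networkx as nx

G = nx.barabasi_albert_graph(10000, 3)     # a scale-free test graph
n = G.number_of_nodes()

counts = Counter(d for _, d in G.degree())
pk = {k: c / n for k, c in sorted(counts.items())}   # degree distribution

# Cumulative DD: P(degree >= k), more robust to noise in the tail
cum, running = {}, 1.0
for k in sorted(pk):
    cum[k] = running
    running -= pk[k]
```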
29
A Few Examples
Power law: pk ∝ k^(-α)
30
Friend of Friends
  • Consider the following scenario:
    • Sourish and Ravi are friends
    • Sourish and Shaunak are friends
    • Are Shaunak and Ravi friends?
  • If they are, the relation carries over; this property is known as transitivity

31
Measuring Transitivity: Clustering Coefficient
  • The clustering coefficient for a vertex v in a network is defined as the ratio of the total number of connections among the neighbors of v to the total number of possible connections between the neighbors
  • A high clustering coefficient means my friends know each other with high probability -- a typical property of social networks

32
Mathematically
  • The clustering coefficient of a vertex i is C_i = 2E_i / (k_i(k_i - 1)), where E_i is the number of edges among the k_i neighbors of i
  • The clustering coefficient of the whole network is the average: C = (1/n) Σ_i C_i
  • Alternatively, C = 3 × (number of triangles) / (number of connected triples)

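A direct transcription of the first definition, checked against networkx's built-in average_clustering (the karate-club graph is just a convenient test case):

```python
import networkx as nx

def local_clustering(G, v):
    """C_v = (edges among neighbors of v) / (k_v choose 2)."""
    nbrs = list(G[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if G.has_edge(nbrs[i], nbrs[j]))
    return 2.0 * links / (k * (k - 1))

G = nx.karate_club_graph()
C = sum(local_clustering(G, v) for v in G) / G.number_of_nodes()
print(C, nx.average_clustering(G))   # the two values should agree
```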
33
Centrality
  • Centrality measures are commonly described as indices of the 4 Ps -- prestige, prominence, importance, and power
  • Degree: count of immediate neighbors
  • Betweenness: high for nodes that form a bridge between two regions of the n/w:
    C_B(v) = Σ_{s≠v≠t} σ_st(v) / σ_st
    where σ_st is the total number of shortest paths between s and t, and σ_st(v) is the number of those shortest paths that pass through v
34
Eigenvector centrality: Bonacich (1972)
  • It is not just how many people know me that counts towards my popularity (or power), but how many people know people who know me -- this is recursive!
  • In the context of HIV transmission: a person x with one sex partner is less prone to the disease than a person y with multiple partners
  • But imagine what happens if the partner of x has multiple partners
  • This is the basic idea of eigenvector centrality

35
Definition
  • Eigenvector centrality is defined as the principal eigenvector of the adjacency matrix
  • An eigenvector of a symmetric matrix A = [a_ij] is any vector e such that Ae = λe, i.e., λe_i = Σ_j a_ij e_j, where λ is a constant and e_i is the centrality of node i
  • What does it imply? The centrality of a node is proportional to the centrality of the nodes it is connected to (recursively)
  • Practical example: Google PageRank

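A power-iteration sketch in numpy: repeatedly applying A to a positive vector converges to the principal eigenvector (the 4-node graph is invented for illustration):

```python
import numpy as np

def eigenvector_centrality(A, iters=200):
    """Power iteration: e converges to the principal eigenvector of A,
    so each node's score is proportional to its neighbors' scores."""
    e = np.ones(A.shape[0])
    for _ in range(iters):
        e = A @ e
        e = e / np.linalg.norm(e)
    return e

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(eigenvector_centrality(A))   # node 2, tied to everyone, scores highest
```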
36
Node Equivalence
  • Social roles: nodes (actors) in a social n/w who have similar patterns of relations (ties) with other nodes
  • Three different ways to find equivalence classes:
    • Structural equivalence
    • Automorphic equivalence
    • Regular equivalence

37
Structural Equivalence
  • Two nodes are said to be exactly structurally equivalent if they have the same relationships to all other nodes.

Computation: Let A be the adjacency matrix. Compute the Euclidean distance / Pearson correlation between a pair of rows/columns representing the neighbor profiles of two nodes (say i and j). This value shows how structurally similar i and j are.
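A numpy sketch of the correlation-based computation; in the toy matrix below, nodes 0 and 1 tie to exactly the same others, so their neighbor profiles correlate perfectly (graph invented for illustration):

```python
import numpy as np

# Edges: 0-2, 0-3, 1-2, 1-3; nodes 0 and 1 are structurally equivalent
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)

R = np.corrcoef(A)     # Pearson correlation between all pairs of rows
print(R[0, 1])         # 1.0: identical neighbor profiles
```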
38
Automorphic Equivalence
  • The idea of automorphic equivalence is that sets of actors can be equivalent by being embedded in local structures that have the same patterns of ties -- "parallel" structures.

Swap (B, D) along with all their neighbors: the distances among all the actors in the graph would remain exactly identical. Path vector of i: how many nodes are at distance 1, 2, ... from node i. Amount of equivalence: distance between path vectors.
39
Regular Equivalence
  • Two nodes are said to be regularly equivalent if they have the same profile of ties with members of other sets of actors that are also regularly equivalent.

Class I: 1 tie with Class II, no tie with Class III
Class II: 1 tie with Class I, 1-2 tie(s) with Class III
Class III: no tie with Class I, 1 tie with Class II
40
Assortativity (homophily)
  • The rich go with the rich (selective linking)
  • A famous actor (e.g., Shah Rukh Khan) would prefer to pair up with another famous actor (e.g., Rani Mukherjee) in a movie rather than a newcomer to the film industry.

(Figures: an assortative scale-free network vs. a disassortative scale-free network)
41
Measures of Assortativity
  • ANND (average nearest-neighbor degree): find the average degree of the neighbors of each node i with degree k
  • Find the Pearson correlation (r) between the degree of i and the average degree of its neighbors
  • For further reference see the supplementary material

42
Community structure
  • Community structure: a group of vertices that have a high density of edges within the group and a low density of edges between groups
  • Examples:
    • Friendship n/w of children
    • Citation n/ws: research interest
    • World Wide Web: subject matter of pages
    • Metabolic networks: functional units
    • Linguistic n/ws: similar linguistic categories

43
Some Examples
Community Structure in Political Books
Community structure in a Social n/w of Students
(American High School)
44
Community Identification Algorithms
  • Hierarchical
    • Girvan-Newman
    • Radicchi et al.
  • Chinese Whispers
  • Spectral Bisection
  • See (Newman 2004) for a comprehensive survey (you will find the ref. in the supplementary material)

45
Girvan-Newman Algorithm
  • Bisection method:
    1. Calculate the betweenness for all edges in the network.
    2. Remove the edge with the highest betweenness.
    3. Recalculate betweennesses for all edges affected by the removal.
    4. Repeat from step 2 until no edges remain.

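networkx ships an implementation of this algorithm; a minimal sketch of its use (karate-club graph as a stand-in):

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()
splits = girvan_newman(G)      # iterator over successively finer partitions
first = next(splits)           # the first split into two communities
print([sorted(c) for c in first])
```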
46
Evolution of Networks / Processes on Networks
47
Random Graphs: Small Average Path Length
Q: What do we mean by a random graph?
A: The Erdős-Rényi random graph model: for every pair of nodes, draw an edge between them with equal probability p.

Degrees of Separation in a Random Graph
  • N nodes
  • z neighbors per node, on average: z = ⟨k⟩
  • D degrees of separation: D ≈ log N / log z
  • Poisson degree distribution: P(k) = e^(-⟨k⟩) ⟨k⟩^k / k!
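A quick check of the short-path claim, assuming networkx is available (N and z are illustrative; the largest component is used in case the graph is not fully connected):

```python
import math

import networkx as nx

N, z = 1000, 10
G = nx.erdos_renyi_graph(N, z / (N - 1))          # p chosen so <k> = z

giant = G.subgraph(max(nx.connected_components(G), key=len))
print(nx.average_shortest_path_length(giant))     # small, close to ...
print(math.log(N) / math.log(z))                  # ... log N / log z = 3
```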
48
Degree Distributions
49
Degree distributions for various networks
  1. World-Wide Web
  2. Coauthorship networks: computer science, high energy physics, condensed matter physics, astrophysics
  3. Power grid of the western United States and
    Canada
  4. Social network of 43 Mormons in Utah

50
How do Power-law DDs arise?
Barabási-Albert Model of Preferential Attachment (the rich get richer)
  1. GROWTH: Starting with a small number of nodes (m0), at every timestep we add a new node with m (≤ m0) edges (connected to the nodes already present in the system).
  2. PREFERENTIAL ATTACHMENT: The probability Π that a new node will be connected to node i depends on the connectivity k_i of that node: Π(k_i) = k_i / Σ_j k_j
A.-L. Barabási, R. Albert, Science 286, 509 (1999)
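A from-scratch sketch of the two rules, drawing attachment targets with probability proportional to current degree via random.choices (the m0 clique seed is an illustrative choice):

```python
import random

def barabasi_albert(n, m, m0=5):
    """Grow a graph by preferential attachment: each new node draws m
    distinct targets with probability proportional to current degree."""
    edges, degree = [], {i: 0 for i in range(m0)}
    for i in range(m0):                 # seed: a small clique on m0 nodes
        for j in range(i + 1, m0):
            edges.append((i, j))
            degree[i] += 1
            degree[j] += 1
    for new in range(m0, n):
        nodes = list(degree)
        weights = [degree[v] for v in nodes]
        targets = set()
        while len(targets) < m:         # Pi(k_i) = k_i / sum_j k_j
            targets.add(random.choices(nodes, weights=weights)[0])
        for t in targets:
            edges.append((new, t))
            degree[t] += 1
        degree[new] = m
    return edges

edges = barabasi_albert(10000, 3)       # yields a heavy-tailed DD
```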
51
Mean Field Theory
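The derivation on this slide was an image; the standard mean-field calculation for the BA model (a textbook result, reproduced here as a plausible reconstruction of the slide's content) goes:

```latex
\frac{\partial k_i}{\partial t}
  = m\,\frac{k_i}{\sum_j k_j}
  = \frac{k_i}{2t}
\;\Rightarrow\;
k_i(t) = m\left(\frac{t}{t_i}\right)^{1/2}
\;\Rightarrow\;
P(k) = \frac{2m^2}{k^3} \;\propto\; k^{-3}
```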
52
The World is Small!
  • Registration fees for IJCNLP 2008 are being waived for all participants -- get it collected from the registration counter
  • How long do you think the above information will take to spread among yourselves?
  • Experiments say it will spread very fast: within 6 hops from the initiator it would reach everyone
  • This is the famous Milgram's "six degrees of separation"

53
The Small World Effect
  • Even in very large social networks, the average distance between nodes is usually quite short.
  • Milgram's small world experiment:
    • Target individual in Boston
    • Initial senders in Omaha, Nebraska
    • Each sender was asked to forward a packet to a friend who was closer to the target
    • Friends were asked to do the same
  • Result: an average of six degrees of separation.
  • S. Milgram, The small world problem, Psych. Today, 2 (1967), pp. 60-67.

54
Measure of Small-Worldness
  • Low average geodesic path length
  • High clustering coefficient
  • Geodesic path: the shortest path through the network from one vertex to another
  • Mean path length: ℓ = 2 Σ_{i≥j} d_ij / (n(n+1)), where d_ij is the geodesic distance from vertex i to vertex j
  • Most networks observed in the real world have ℓ ≤ 6:
    • Film actors: 3.48
    • Company directors: 4.60
    • Emails: 4.95
    • Internet: 3.33
    • Electronic circuits: 4.34

55
Clustering
C Probability that two of a nodes neighbors
are themselves connected In a random graph
Crand 1/N (if the average degree is held
constant)
56
Watts-Strogatz Small World Model
Watts and Strogatz introduced this simple model
to show how networks can have both short path
lengths and high clustering.
D. J. Watts and S. H. Strogatz, Collective dynamics of small-world networks, Nature, 393 (1998), pp. 440-442.
57
Small-world model
  • Used for modeling network transitivity
  • Many networks assume some kind of geographical proximity
  • Small-world model:
    • Start with a low-dimensional regular lattice
    • Rewire: add/remove edges to create shortcuts that join remote parts of the lattice; for each edge, with probability p, move the other end to a random vertex
  • Rewiring allows one to interpolate between the regular lattice and a random graph

58
Small-world model
  • Regular lattice (p = 0):
    • Clustering coefficient C = (3k-3)/(4k-2) ≈ 3/4
    • Mean distance ≈ L/4k
  • Almost random graph (p = 1):
    • Clustering coefficient C ≈ 2k/L
    • Mean distance ≈ log L / log k
  • No power-law degree distribution

(Figure: clustering and mean distance as a function of the rewiring probability p; degree distribution)
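A sketch of the classic experiment with networkx's generator: sweeping p shows the mean path length L collapsing while the clustering C stays high (sizes are illustrative):

```python
import networkx as nx

n, k = 1000, 10
for p in (0.0, 0.01, 0.1, 1.0):
    G = nx.watts_strogatz_graph(n, k, p)
    giant = G.subgraph(max(nx.connected_components(G), key=len))
    C = nx.average_clustering(G)
    L = nx.average_shortest_path_length(giant)
    print(f"p={p:<4}  C={C:.3f}  L={L:.2f}")
# a few shortcuts (p ~ 0.01) already slash L while C remains high
```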
59
Resilience of Networks
  • We consider the resilience of the network to the
    removal of its vertices (site percolation) or
    edges (bond percolation).
  • As vertices (or edges) are removed from the
    network, the average path length will increase.
  • Ultimately, the giant component will
    disintegrate.
  • Networks vary according to their level of
    resilience to vertex (or edge) removal.

60
Stability Metric: Percolation Threshold
Start from a single connected component and remove a fraction f of the nodes: for small f a giant component still exists, but at a critical fraction fc the entire graph breaks into smaller fragments. Therefore fc = 1 - qc becomes the percolation threshold.
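A small simulation of random node removal (site percolation), assuming networkx; the ER graph with ⟨k⟩ = 6 is an illustrative choice, and its giant component should collapse around f ≈ 0.83:

```python
import random

import networkx as nx

def giant_fraction(G):
    """Fraction of the surviving nodes that sit in the largest component."""
    return max(len(c) for c in nx.connected_components(G)) / G.number_of_nodes()

G0 = nx.erdos_renyi_graph(2000, 0.003)        # <k> = 6
n = G0.number_of_nodes()
for f in (0.1, 0.3, 0.5, 0.7, 0.9):
    G = G0.copy()
    G.remove_nodes_from(random.sample(list(G.nodes), int(f * n)))
    print(f, giant_fraction(G))               # collapses past f_c
```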
61
Ordinary Percolation on Lattices
Fill in each link (bond percolation) or site
(site percolation) with probability p and ask
questions about the sizes of connected
components.
62
Percolation in Poisson and Scale-free networks
(Figures: percolation in an exponential network vs. a scale-free network)
63
CASE STUDY I: Self-Organization of the Sound Inventories
64
Human Communication
  • Human beings and many other living organisms produce sound signals
  • Unlike other organisms, humans can concatenate these sounds to produce new messages: language
  • Language is one of the primary causes/effects of human intelligence

65
Human Speech Sounds
  • Human speech sounds are called phonemes: the smallest units of a language
  • Phonemes are characterized by certain distinctive features, like ...

66
Types of Phonemes
  • Consonants: /t/, /p/, /k/, ...
  • Vowels: /i/, /a/, /u/, ...
  • Diphthongs: /ai/, ...
67
Choice of Phonemes
  • How does a language choose a set of phonemes in order to build its sound inventory?
  • Is the process arbitrary?
  • Certainly not!
  • What are the forces affecting this choice?

68
Forces of Choice
A linguistic system: how does it look?
  • The speaker desires ease of articulation
  • The listener desires perceptual contrast / ease of learnability
  • The forces shaping the choice are opposing; hence there has to be a non-trivial solution
69
Vowels: A (Partially) Solved Mystery
  • Languages choose vowels based on maximal perceptual contrast.
  • For instance, if a language has three vowels, then in more than 95% of the cases they are /a/, /i/ and /u/.

70
Consonants: A Puzzle
  • Research: from 1929 to date
  • No single satisfactory explanation of the organization of the consonant inventories
  • The set of features that characterize consonants is much larger than that of vowels
  • No single force is sufficient to explain this organization
  • Rather, a complex interplay of forces goes on in shaping these inventories

71
Principle of Occurrence
  • PlaNet: the Phoneme-Language Network
  • A bipartite network N = (V_L, V_C, E)
    • V_L: nodes representing languages of the world
    • V_C: nodes representing consonants
    • E: the set of edges that run between V_L and V_C
  • There is an edge e ∈ E between two nodes v_l ∈ V_L and v_c ∈ V_C if the consonant c occurs in the language l.

Choudhury et al. 2006, ACL; Mukherjee et al. 2007, Int. Jnl. of Modern Physics C
(Figure: the structure of PlaNet)
72
Construction of PlaNet
  • Data source: UCLA Phonological Segment Inventory Database (UPSID)
  • Number of nodes in V_L: 317
  • Number of nodes in V_C: 541
  • Number of edges in E: 7022

73
Degree Distribution of PlaNet
  • The DD of the language nodes follows a β-distribution
  • The DD of the consonant nodes follows a power-law with an exponential cut-off: the distribution of consonants over languages follows a power-law
74
Synthesis of PlaNet
  • Non-linear preferential attachment
  • Iteratively construct the language inventories given their inventory sizes
  • Pr(C_i) = (d_i^α + ε) / Σ_{x∈V_C} (d_x^α + ε), where d_i is the current degree of consonant node i
75
Simulation Result
The parameters α and ε are 1.44 and 0.5 respectively. The results are averaged over 100 runs.
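A sketch of this synthesis kernel under the stated parameters; the attachment probability is exactly the Pr(C_i) above, while everything else (inventory sizes, function name) is an illustrative choice:

```python
import random

ALPHA, EPS = 1.44, 0.5      # values reported on this slide

def synthesize_planet(inventory_sizes, n_consonants=541):
    """Grow the language-consonant bipartite graph: each language picks
    its consonants one by one with probability proportional to
    (degree^alpha + eps), degree being the consonant's current degree."""
    degree = [0] * n_consonants
    inventories = []
    for size in inventory_sizes:
        chosen = set()
        while len(chosen) < size:
            weights = [0.0 if c in chosen else degree[c] ** ALPHA + EPS
                       for c in range(n_consonants)]
            chosen.add(random.choices(range(n_consonants), weights=weights)[0])
        for c in chosen:
            degree[c] += 1
        inventories.append(chosen)
    return inventories

# illustrative inventory sizes; in the paper these come from UPSID
inventories = synthesize_planet([10, 20, 25, 30, 40] * 60)
```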
76
Principle of Co-occurrence
  • Consonants tend to co-occur in groups or communities
  • These groups tend to be organized around a few distinctive features (based on manner of articulation, place of articulation & phonation): the principle of feature economy

If a language has ... in its inventory, then it will also tend to have ...
(Figure: an illustration of feature economy over a consonant table)
77
How to Capture these Co-occurrences?
  • PhoNet: the Phoneme-Phoneme Network
  • A weighted network N = (V_C, E)
    • V_C: nodes representing consonants
    • E: the set of edges that run between the nodes in V_C
  • There is an edge e ∈ E between two nodes v_c1, v_c2 ∈ V_C if the consonants c1 and c2 co-occur in a language. The number of languages in which c1 and c2 co-occur defines the edge-weight of e. The number of languages in which c1 occurs defines the node-weight of v_c1.

78
Construction of PhoNet
  • Data source: UPSID
  • Number of nodes in V_C: 541
  • Number of edges: 34012

(Figure: PhoNet)
79
Community Structures in PhoNet
  • Radicchi et al. algorithm (for unweighted networks): count the number of triangles that an edge is a part of. Inter-community edges will have a low count, so remove them.
  • Modification for a weighted network like PhoNet:
    • Look for triangles where the weights on the edges are comparable.
    • If they are comparable, then the group of consonants co-occurs highly; else it does not.
    • Measure a strength S for each edge (u, v) in PhoNet
    • Remove edges with S less than a threshold η
80
Community Formation
For different values of the threshold η we get different sets of communities.
81
Consonant Societies!
?0.35
?0.60
?0.72
?1.25
The fact that the communities are good can
quantitatively shown by measuring the feature
entropy
82
Problems to ponder on
  • Physical significance of PA:
    • Functional forces
    • Historical/evolutionary process
  • Labeled synthesis of PlaNet and PhoNet
  • Language diversity vs. preferential attachment

83
CASE STUDY II: Modeling the Mental Lexicon
84
Mental Lexicon (ML): Basics
  • The ML refers to the repository of the word forms that resides in the human brain
  • Two questions:
    • How are words stored in the long-term memory, i.e., what is the organization of the ML?
    • How are words retrieved from the ML (lexical access)?
  • The above questions are highly inter-related: to predict the organization, one can investigate how words are retrieved, and vice versa.

85
Different Possible Ways of Organization
  • Un-organized (a bag full of words), or
  • Organized:
    • By sound (phonological similarity)
      • Start the same: banana, bear, bean
      • End the same: look, took, book
      • By the number of phonological segments they share
    • By meaning (semantic similarity): banana, apple, pear, orange
    • By the age at which the word is acquired
    • By frequency of usage
    • By POS
    • Orthographically

86
The Hierarchical Model of ML
  • Proposed by Collins and Quillian in 1969
  • Concepts are organized in a hierarchy
  • Taxonomic and attributive relations are represented
  • Cognitive economy: put each attribute at the highest of all appropriate levels; e.g., "reproduces" applies to the whole animal kingdom

87
Hierarchical Model
  • According to the principle of cognitive economy:
    • "Animals eat" < "mammals eat" < "humans eat"
    • However, "shark is a fish" = "salmon is a fish"
  • What do < and = mean?
    • <: less time to judge
    • =: equal time to judge

88
Spreading Activation Model of ML
  • Not a hierarchical structure but a web of inter-connected nodes (first proposed by Collins and Loftus in 1975)
  • The distance between nodes is determined by the structural characteristics of the word-forms (by sound, by meaning, by age, by ...)
  • Combining the above two: a plethora of complex networks

89
Phonological Neighborhood Network
  • (Vitevitch 2004)
  • (Gruenenfelder & Pisoni, 2005)
  • (Kapatsinski 2006)
  • "Sound Similarity Relations in the Mental Lexicon: Modeling the Lexicon as a Complex Network"

90
N/W Definition
  • Nodes: words
  • Edge: an edge is drawn from node A to node B if at least 2/3 of the segments that occur in the word represented by A also occur in the word represented by B
  • I.e., if the word represented by A is 6 segments long, then one can derive all its neighbors (B) from it by two phoneme changes (insertions, deletions or substitutions).

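One reading of the 2/3-overlap rule is "within |A|/3 phoneme edits of A"; a sketch under that assumption (the toy transcriptions are invented for illustration):

```python
def edit_distance(a, b):
    """Levenshtein distance over phoneme sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[m][n]

def neighbors(lexicon):
    """Directed edge A -> B when B is within len(A)/3 phoneme edits of A."""
    return [(a, b) for a, ta in lexicon.items() for b, tb in lexicon.items()
            if a != b and edit_distance(ta, tb) <= len(ta) / 3]

lexicon = {"cat": ["k", "ae", "t"], "bat": ["b", "ae", "t"],
           "cast": ["k", "ae", "s", "t"]}
print(neighbors(lexicon))
```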
91
N/W Construction
  • Datbase
  • Hoosier Mental Lexicon (Nusbaum et al., 1984)
  • phonologically transcribed words ? n/w using the
    metric defined earlier
  • Nodes with no links (correspond to hermit words
    i.e., words that have no neighbors)
  • Random networks (E-R) for comparison
  • Directed n/w ? a long word can have a short word
    as a neighbor, not vice versa
  • Have a link only if the duration of the
    difference in the word pair lt (duration of a
    word)/3 (the factor 1/3 is experimentally derived
    see the paper for further info.)

92
Neighborhood Density
  • The node whose neighbors are searched ? base
    words
  • Neighborhood density of a base word is expressed
    as the out-degree of the node representing the
    base word
  • Is an estimate of the number of words activated
    by the base word when the base word is presented
    ? spreading activation
  • Something like semantic priming (however, in the
    phonological level)

93
Results of the N/W Analysis
  • Small-world properties:
  • High clustering but also a long average path length -- like a SW network the lexicon has densely connected neighborhoods, but the links between nodes of different neighborhoods are harder to find than in SW networks

94
Visualization: A Disconnected Graph with a Giant Component (GC)
  • The GC is elongated: some nodes have really long chains of intermediates, and hence the mean path length is long

95
Low-Degree Nodes are Important!!!
  • Removal of low-degree nodes renders the n/w almost disconnected
  • A bottleneck is formed between longer (more than 7 segments long) and shorter words
  • This bottleneck consists of the -tion final words (coalition, passion, nation, fixation/fission): they form short-cuts between the high-degree nodes (i.e., they are low-degree stars that connect mega-neighborhoods)

96
Removal of Nodes with Degree < 40
(Figure: the split n/w; panels for 2-4 segment words and 8-10 segment words)
Removal of low-degree nodes disconnects the n/w, as opposed to the removal of hubs like "pastor" (deg. 112)
97
Why low connectivity between neighborhoods?
  • Spreading activation should not inhibit neighbors of the stimulus' neighbors that are non-neighbors of the stimulus itself (and are therefore not similar to the stimulus)
  • A low mean path → complete traversal of n/ws, e.g., in general-purpose search
  • Search in the lexicon does not need to traverse links between distant nodes; rather, it involves an activation of the structured neighborhood that shares a single sub-lexical chunk that could be acoustically related during word-recognition (Marslen-Wilson, 1990).

98
Degree Distribution (DD)
  • Exponential rather than power-law

(Figure: DDs for 5-7 segment words, 8-10 segment words, and the entire lexicon)
99
Other Works (see supplementary material for references)
  • Vitevitch (2005)
    • Similar to the above work, but builds n/ws of nodes that are just one segment different
  • Choudhury et al. (2007)
    • Builds weighted n/ws in Hindi, Bengali and English based on orthographic proximity (nodes: words; edges: orthographic edit-distance): SpellNet
    • Applies thresholding to make the n/ws binary (at thresholds 1, 3, 5)
    • They also obtain exponential DDs
    • Observes that the occurrence of real-word errors in a language is proportional to the average weighted degree of the SpellNet of that language

100
Other Works
  • Sigman et al. (2002)
    • Analyzes the English WordNet
    • All semantic relationships are scale-invariant
    • Inclusion of polysemy makes the n/w SW
  • Ferrer i Cancho et al. (2000, 2001)
    • Word co-occurrence (in a sentence) based definitions of the lexicon
    • Lexicon = kernel lexicon + peripheral lexicon
    • Finds a 2-regime DD: one regime comprises words in the kernel lexicon, the other words in the peripheral lexicon
    • Finds that these n/ws are small-world

101
Some Unsolved Mysteries: You Can Give Them a Try!
  • What can be a model for the evolution of the ML?
  • How is the ML acquired by a child learner?
  • Is there a single optimal structure for the ML, or is it organized based on multiple criteria (i.e., a combination of the different n/ws)? Towards a single framework for studying the ML!

102
CASE STUDY III: Syntax - Unsupervised POS Tagging
103
Labeling of Text
  • Lexical category (POS tags)
  • Syntactic category (phrases, chunks)
  • Semantic role (agent, theme, ...)
  • Sense
  • Domain-dependent labeling (genes, proteins, ...)
  • How to define the set of labels?
  • How to (learn to) predict them automatically?

104
Nothing makes sense, unless in context
  • Distribution-based definition of:
    • Lexical category
    • Sense (meaning)
  • "The X is ..."
  • "If you X then I shall ..."
  • "looking at the star ..." (PP)

105
General Approach
  • Represent the context of a word (token)
  • Define some notion of similarity between the contexts
  • Cluster the contexts of the tokens
  • Get the labels of the tokens

(Figure: tokens w1 w2 w3 w4 in context, grouped into clusters)
106
Issues
  • How to define the context?
  • How to define similarity?
  • How to cluster?
  • How to evaluate?

107
Unsupervised Parts-of-Speech Tagging Employing
Efficient Graph Clustering
  • Chris Biemann
  • COLING-ACL 2006

108
Stages
  • Input: raw text corpus
  • Identify feature words and define a graph for high- and medium-frequency words (10,000)
  • Cluster the graph to identify the word classes
  • For low-frequency words, use context similarity
  • Lexicon of word classes → tag the same text → learn a Viterbi tagger

109
Feature Words
  • Estimate the unigram frequencies
  • Feature words: the 200 most frequent words

110
Feature Vector
  • "From the familiar to the exotic, the collection is a delight"
  • Each of the positions p-2, p-1, p+1, p+2 around the target word contributes an indicator vector over the feature words fw1 ... fw200; for the target word "familiar":

        the  to  is  from  ...
  p-2    0   0   0   1          (From)
  p-1    1   0   0   0          (the)
  p+1    0   1   0   0          (to)
  p+2    1   0   0   0          (the)
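A sketch of this feature extraction; with the four feature words shown above it reproduces the rows of the table for the target word "familiar" (in the paper the feature set is the 200 most frequent words):

```python
def context_vector(tokens, i, feature_words):
    """Concatenate indicator vectors over the feature words for the
    positions p-2, p-1, p+1, p+2 around token i."""
    vec = []
    for offset in (-2, -1, 1, 2):
        j = i + offset
        word = tokens[j].lower() if 0 <= j < len(tokens) else None
        vec.extend(1 if word == fw else 0 for fw in feature_words)
    return vec

tokens = "From the familiar to the exotic , the collection is a delight".split()
fws = ["the", "to", "is", "from"]        # really: the 200 most frequent words
print(context_vector(tokens, 2, fws))    # target "familiar": the table's rows
```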
111
Syntactic Network of Words
(Figure: the word graph over sky, color, weight, light, blue, blood, heavy, red; the weight of an edge, e.g. between red and blue, is the cosine similarity cos(red, blue) of their feature vectors)
112
The Chinese Whisper Algorithm
(Figure: the weighted word graph over sky, color, weight, light, blue, blood, heavy, red, with edge weights 0.9, 0.8, 0.7, 0.9, 0.5 and -0.5)
113
The Chinese Whisper Algorithm
(Animation: a Chinese Whispers label-propagation step on the same graph)
114
The Chinese Whisper Algorithm
(Animation: the labels converge and the word classes emerge)
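The slides show Chinese Whispers only as an animation. A compact sketch of the algorithm itself (Biemann 2006): every node starts in its own class and repeatedly adopts the class with the highest total edge weight among its neighbors (the toy graph and weights are invented for illustration):

```python
import random
from collections import defaultdict

def chinese_whispers(nodes, edges, iters=20):
    """edges: dict (u, v) -> weight. Hard label propagation: each sweep,
    every node takes the class with the highest total neighbor weight."""
    nbrs = defaultdict(list)
    for (u, v), w in edges.items():
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))
    label = {n: n for n in nodes}          # every node in its own class
    for _ in range(iters):
        order = list(nodes)
        random.shuffle(order)              # random update order per sweep
        for n in order:
            score = defaultdict(float)
            for v, w in nbrs[n]:
                score[label[v]] += w
            if score:
                label[n] = max(score, key=score.get)
    return label

nodes = ["sky", "color", "blood", "weight", "blue", "red", "light", "heavy"]
edges = {("sky", "color"): 0.9, ("color", "blood"): 0.8,
         ("blood", "weight"): 0.7, ("red", "blue"): 0.9,
         ("blue", "light"): 0.8, ("light", "heavy"): 0.7,
         ("weight", "light"): 0.1}         # a weak cross-class edge
print(chinese_whispers(nodes, edges))      # two classes emerge
```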
115
Medium and Low Frequency Words
  • Neighboring (window 4) co-occurrences, ranked by log-likelihood and thresholded
  • Two words are connected iff they share at least 4 neighbors

Language | English | Finnish | German
Nodes    | 52857   | 85627   | 137951
Edges    | 691241  | 702349  | 1493571
116
Construction of Lexicon
  • Each word is assigned a unique tag based on the word class it belongs to
    • Class 1: sky, color, blood, weight
    • Class 2: red, blue, light, heavy
  • Ambiguous words:
    • High- and medium-frequency words that formed singleton clusters
    • Possible tags of neighboring clusters

117
Training and Evaluation
  • Unsupervised training of a trigram HMM using the clusters and lexicon
  • Evaluation:
    • Tag a text for which a gold standard is available
    • Estimate the conditional entropy H(T|C) and the related perplexity 2^H(T|C)
  • Final results:
    • English: 2.05 (619/345), Finnish: 3.22 (625/466), German: 1.79 (781/440)

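A sketch of this evaluation: H(T|C) estimated from (gold tag, induced cluster) counts, with perplexity as 2^H(T|C) (the pair list is invented for illustration):

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """H(T|C) for (gold_tag, cluster) pairs: how uncertain the gold tag
    remains once the induced cluster is known."""
    joint = Counter(pairs)
    cluster = Counter(c for _, c in pairs)
    n = len(pairs)
    return -sum((cnt / n) * math.log2(cnt / cluster[c])
                for (t, c), cnt in joint.items())

pairs = [("NN", "C1"), ("NN", "C1"), ("JJ", "C2"), ("NN", "C2")]
h = conditional_entropy(pairs)
print(h, 2 ** h)     # entropy and the corresponding perplexity
```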
118
Example
Words: From  the  familiar  to    the  exotic,  the  collection  is  a   delight
Gold:  Prep  At   JJ        Prep  At   JJ       At   NN          V   At  NN
Tags:  C200  C1   C331      C5    C1   C331     C1   C221        C3  C1  C220
119
Word Sense Disambiguation
  • Véronis, J. 2004. HyperLex: lexical cartography for information retrieval. Computer Speech & Language, 18(3):223-252.
  • Let the word to be disambiguated be "light"
  • Select a subcorpus of paragraphs which have at least one occurrence of "light"
  • Construct the word co-occurrence graph

120
HyperLex
  • "A beam of white light is dispersed into its component colors by its passage through a prism."
  • "Energy efficient light fixtures including solar lights, night lights, energy star lighting, ceiling lighting, wall lighting, lamps ..."
  • "What enables us to see the light and experience such wonderful shades of colors during the course of our everyday lives?"

(Figure: the co-occurrence graph around "light": prism, beam, dispersed, white, colors, shades, energy, fixtures, efficient, lamps)
121
Hub Detection and MST
(Figure: hubs detected in the co-occurrence graph of "light" -- beam, prism, dispersed, white, colors, shades on one side; energy, fixtures, efficient, lamps on the other -- and the minimum spanning tree rooted at "light")
"White fluorescent lights consume less energy than incandescent lamps"
122
Other Related Works
  • Solan, Z., Horn, D., Ruppin, E. and Edelman, S. 2005. Unsupervised learning of natural languages. PNAS, 102(33):11629-11634.
  • Ferrer i Cancho, R. 2007. Why do syntactic links not cross? Europhysics Letters.
  • Also applied to IR, summarization, sentiment detection and categorization, script evaluation, author detection, ...

123
Discussions & Conclusions
  • What we learnt
  • Advantages of SNIC in NLP
  • Comparison to standard techniques
  • Open problems
  • Concluding remarks and Q&A

124
What we learnt
  • What is SNIC and Complex Networks
  • Analytical tools for SNIC
  • Applications to human languages
  • Three Case-studies

Area | Perspective | Technique
I: Sound systems | Language evolution and change | Synthesis models
II: Lexicon | Psycholinguistic modeling and linguistic typology | Topology and search
III: Syntax & Semantics | Applications to NLP | Clustering
125
What we saw
  • Language features complex structure at every level of organization
  • Linguistic networks have non-trivial properties: scale-free & small-world
  • Therefore, language and engineering systems involving language should be studied within the framework of complex systems, esp. CNT

126
Advantages of SNIC
  • Fully unsupervised techniques
    • No labeled data required: a good solution to resource scarcity
    • The problem of evaluation is circumvented by semi-supervised techniques
  • Ease of computation
    • Simple and scalable
    • Distributed and parallel computable
  • Holistic treatment
    • Language evolution & psycho-linguistic theories

127
Comparison to Standard Techniques
  • Rule-based vs. statistical NLP
  • Graphical models
    • Generative models in machine learning
    • HMM, CRF, Bayesian belief networks

(Figure: a graphical model over the POS tags JJ, NN, RB, VF)
128
Graphical Models vs. SNIC
  • GRAPHICAL MODEL:
    • Principled: based on Bayesian theory
    • Structure is assumed and parameters are learnt
    • Focus: decoding & parameter estimation
    • Data-driven and/or computationally intensive
    • The generative process is easy to visualize, but there is no visualization of the data
  • COMPLEX NETWORK:
    • Heuristic, but with underlying principles of linear algebra
    • Structure is discovered and studied
    • Focus: topology and evolutionary dynamics
    • Unsupervised and computationally easy
    • Easy visualization of the data

129
Language Modeling
  • A network of words as a model of language vs. n-gram models
  • Hierarchical, hyper-graph based models
  • Smoothing through holistic analysis of the network topology
  • Jedynak, B. and Karakos, D. 2007. Unigram Language Models using Diffusion Smoothing over Graphs. Proc. of TextGraphs-2.

130
Open Problems
  • Universals and variables of linguistic networks
  • Superimposition of networks: phonetic, syntactic, semantic
  • Which clustering algorithm for which topology?
  • Metrics for network comparison: important for language modeling
  • Unsupervised dependency parsing using networks
  • Mining translation equivalents

131
Resources
  • Conferences: TextGraphs, Sunbelt, EvoLang, ECCS
  • Journals: PRE, Physica A, IJMPC, EPL, PRL, PNAS, QL, ACS, Complexity, Social Networks
  • Tools: Pajek, CUNG, http://www.insna.org/INSNA/soft_inf.html
  • Online resources: bibliographies, courses on CNT

132
Contact
  • Monojit Choudhury
  • monojitc_at_microsoft.com
  • http//www.cel.iitkgp.ernet.in/monojit/
  • Animesh Mukherjee
  • animeshm_at_cse.iitkgp.ernet.in
  • http//www.cel.iitkgp.ernet.in/animesh/
  • Niloy Ganguly
  • niloy_at_cse.iitkgp.erent.in
  • http//www.facweb.iitkgp.ernet.in/niloy/

133
Thank you!!