Lexical networks

Provided by: LAD101
1
Lecture 19: Lexical networks
Slides modified from Dragomir R. Radev
2
Social data
  • Blog postings
  • News stories
  • Speeches in Congress
  • Query logs
  • Movie and book reviews
  • Scientific papers
  • Financial reports
  • Encyclopedia entries
  • Email
  • Chat room discussions
  • Social networking sites

WHAT DO ALL OF THESE HAVE IN COMMON?
3
Natural language processing
  • Part of speech tagging
  • Prepositional phrase attachment
  • Parsing
  • Word sense disambiguation
  • Document indexing
  • Text summarization
  • Machine translation
  • Question answering
  • Information retrieval
  • Social network extraction
  • Topic modeling

4
Talk outline
  • Lexical networks
  • Semantic networks
  • Lexical centrality
  • Latent networks
  • Conclusion

5
Lexical networks
6
Lexical networks
  • A special case of networks where nodes are words
    or documents and edges link semantically related
    nodes
  • Other examples:
      • Words used in dictionary definitions
      • Names of people mentioned in the same story
      • Words that translate to the same word
  • A semantic network consists of a set of nodes
    that are connected by labeled arcs:
      • The nodes represent concepts
      • The arcs represent relations between concepts

7
Semantic network
8
Free word associations
The large-scale structure of semantic networks:
statistical analyses and a model of semantic
growth. M. Steyvers, J. B. Tenenbaum (2005),
Cognitive Science, 29(1)
9
Dependency network
[Figure: dependency graph over the words bought, Meredith, yesterday, apples, green]
10
Dependency network
11
Semantic Networks
12
So again: a semantic network is
  • A semantic (or associative) network is a simple
    representation scheme which uses a graph of
    labeled nodes and labeled, directed arcs to
    encode knowledge.
  • Labeled nodes: objects/classes/concepts.
  • Labeled links: relations/associations between
    nodes.
  • Labels define the semantics of nodes and links.
  • Usually used to represent static, taxonomic
    concept dictionaries.

13
Nodes and Arcs
  • Nodes denote objects/classes
  • Arcs define binary relationships between objects.

[Figure: semantic network with nodes John, Sue, Max, 5, and 34, joined by arcs labeled mother, father, wife, husband, and age]
mother(john,sue) age(john,5) wife(sue,max) age(sue,34) ...
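The facts on this slide can be stored as labeled, directed arcs. The following is a minimal Python sketch, assuming each fact is read as relation(source, target); the names `FACTS`, `build_network`, and `query` are illustrative and do not come from the slides.

```python
from collections import defaultdict

# Facts from the slide, read as relation(source, target).
FACTS = [
    ("mother", "john", "sue"),
    ("age", "john", 5),
    ("wife", "sue", "max"),
    ("age", "sue", 34),
]

def build_network(facts):
    """Index labeled, directed arcs by their source node."""
    arcs = defaultdict(list)
    for relation, source, target in facts:
        arcs[source].append((relation, target))
    return arcs

def query(arcs, source, relation):
    """Return every target reached from `source` over arcs labeled `relation`."""
    return [target for rel, target in arcs.get(source, []) if rel == relation]

network = build_network(FACTS)
print(query(network, "john", "mother"))  # ['sue']
print(query(network, "sue", "age"))      # [34]
```

Storing arcs by source node makes "what is X's R?" lookups cheap; a second index by target would be needed to answer the inverse question efficiently.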
14
Common Semantic Relations
  • There is no standard set of relations for
    semantic networks, but the following relations
    are very common:
  • INSTANCE: X is an INSTANCE of Y if X is a
    specific example of the general concept Y.
      • Example: Elvis is an INSTANCE of Human
  • ISA: X ISA Y if X is a subset of the more general
    concept Y.
      • Example: sparrow ISA bird
  • HASPART: X HASPART Y if the concept Y is a part
    of the concept X.
      • Or this can be any other property
      • Example: sparrow HASPART tail

15
(No Transcript)
16
ISA hierarchy
  • The ISA (is a) or AKO (a kind of) relation is
    often used to link a class and its superclass,
    and sometimes an instance and its class.
  • Some links (e.g. has-part) are inherited along
    ISA paths.
  • The semantics of a semantic net can be relatively
    informal or very formal, and is
    often defined at the implementation level.
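Inheritance along ISA paths can be sketched as a walk from a node up through its more general concepts, collecting has-part links on the way. In the sketch below, "sparrow ISA bird" and "sparrow HASPART tail" come from the Common Semantic Relations slide; the nodes `tweety` and `animal`, the part `wings`, and the function name `inherited_parts` are hypothetical additions for illustration.

```python
# ISA / INSTANCE links point from a node to its more general concept.
PARENT = {
    "tweety": "sparrow",   # hypothetical instance, not from the slides
    "sparrow": "bird",     # sparrow ISA bird
    "bird": "animal",      # hypothetical
}

# HASPART links are stored on the node where they are asserted.
HASPART = {
    "sparrow": {"tail"},   # sparrow HASPART tail
    "bird": {"wings"},     # hypothetical
}

def inherited_parts(node):
    """Collect HASPART values along the ISA path from `node` upward."""
    parts = set()
    seen = set()
    while node is not None and node not in seen:
        seen.add(node)                      # guard against accidental cycles
        parts |= HASPART.get(node, set())
        node = PARENT.get(node)
    return parts

print(sorted(inherited_parts("tweety")))  # ['tail', 'wings']
print(sorted(inherited_parts("bird")))    # ['wings']
```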

17
Inference by association
  • Red (a robin) is related to Air Force One by
    association (directed paths originating from
    these two nodes join at the nodes Wings and Fly).
  • Bob and George are not related (no paths
    originating from them join in this network).
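One way to read this kind of inference is as an intersection search: two nodes are related if the sets of nodes reachable from each of them overlap. The slide's network figure is not in the transcript, so the edges below are a hypothetical reconstruction chosen only to reproduce the two statements above; `EDGES`, `reachable`, and `meeting_nodes` are illustrative names.

```python
from collections import deque

# Hypothetical directed arcs; only the node names Red, Air Force One,
# Wings, Fly, Bob, and George appear on the slide.
EDGES = {
    "Red": ["robin"],
    "robin": ["bird"],
    "bird": ["Wings", "Fly"],
    "Air Force One": ["airplane"],
    "airplane": ["Wings", "Fly"],
    "Bob": ["builder"],
    "George": ["pilot"],
}

def reachable(start):
    """All nodes reachable from `start` along directed arcs, including `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def meeting_nodes(a, b):
    """Nodes at which directed paths from `a` and from `b` join."""
    return reachable(a) & reachable(b)

print(sorted(meeting_nodes("Red", "Air Force One")))  # ['Fly', 'Wings']
print(meeting_nodes("Bob", "George"))                 # set()
```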

18
Frames: a semantic network with properties
  • A frame represents an entity as a set of slots
    (attributes) and associated values.
  • Frames act, look, etc. like objects in C++
  • a more robust/compact version of a semantic
    network
  • Each slot may have constraints that describe
    legal values that the slot can take.
  • A frame can represent a specific entity, or a
    general concept.
  • Frames are implicitly associated with one another
    because the value of a slot can be another frame.
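The slot, constraint, and frame-as-value ideas above can be sketched with a small class. This is an illustration under assumptions, not a reference frame system: the class `Frame`, its methods, the `legs` slot, and the age constraint are all hypothetical, while the John and Sue values reuse the Nodes and Arcs example.

```python
class Frame:
    """An entity described by named slots; a slot value may be another Frame."""

    def __init__(self, name, parent=None, constraints=None):
        self.name = name
        self.parent = parent                  # more general frame, if any
        self.constraints = constraints or {}  # slot name -> predicate on values
        self.slots = {}

    def constraint_for(self, slot):
        """Find a slot constraint locally, then in more general frames."""
        frame = self
        while frame is not None:
            if slot in frame.constraints:
                return frame.constraints[slot]
            frame = frame.parent
        return None

    def set(self, slot, value):
        """Store a value after checking the slot's constraint, if one exists."""
        check = self.constraint_for(slot)
        if check is not None and not check(value):
            raise ValueError(f"illegal value for {self.name}.{slot}: {value!r}")
        self.slots[slot] = value

    def get(self, slot):
        """Return a slot value, inheriting from the parent frame if absent."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)

# A general concept with a constrained slot (hypothetical constraint).
person = Frame("Person", constraints={"age": lambda v: isinstance(v, int) and v >= 0})
person.set("legs", 2)

# Specific entities whose slots can point at other frames.
sue = Frame("Sue", parent=person)
sue.set("age", 34)
john = Frame("John", parent=person)
john.set("age", 5)
john.set("mother", sue)

print(john.get("mother").get("age"))  # 34
print(john.get("legs"))               # 2, inherited from Person
```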

19
(No Transcript)
20
Semantic Networks
  • Rules are appropriate for some types of
    knowledge,
  • but do not easily map to others.
  • Semantic nets can easily represent inheritance
    and exceptions,
  • but are not well-suited for representing
    negation, disjunction, preferences, conditionals,
    and cause/effect relationships.
  • Frames allow arbitrary functions (demons) and
    typed inheritance.
  • Implementation is a bit more cumbersome.

21
Lexical Centrality
22
LexRank: Centrality in Text Graphs
Vertices: Units of text (sentences or documents)
Edges: Pairwise similarity between text units
23
LexRank: Centrality in Text Graphs
Intuition: the LexRank score is propagated through
edges. Central vertices are those that are
similar to other central vertices.
24
LexRank: Centrality in Text Graphs
Recurrence Relation
[Figure: example text graph around a vertex s with edge weights 0.3, 0.1, 0.9, 0.3, 0.5, 0.8, 0.2, 0.4, 0.2]
Can guarantee a solution by allowing jump
probability d/N.
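The recurrence itself did not survive in the transcript. A common form, given here as an assumption rather than as the slide's exact formula, is p(u) = d/N + (1 - d) * sum over neighbors v of [w(v,u) / sum over z of w(v,z)] * p(v), which matches the jump probability d/N mentioned above. The sketch below solves it by power iteration; the weight matrix `W` is hypothetical (the figure's edge layout is lost), and the code assumes every vertex has at least one positive-weight edge.

```python
def lexrank(weights, d=0.15, tol=1e-10, max_iter=1000):
    """Power iteration for p(u) = d/N + (1-d) * sum_v w(v,u)/rowsum(v) * p(v).

    `weights` is a square similarity matrix with zero diagonal.
    Assumes every row has a positive sum (no isolated vertices).
    """
    n = len(weights)
    row_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = []
        for u in range(n):
            incoming = sum(
                weights[v][u] / row_sums[v] * scores[v]
                for v in range(n)
                if row_sums[v] > 0
            )
            new.append(d / n + (1 - d) * incoming)
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores

# Hypothetical symmetric similarity matrix for four text units.
W = [
    [0.0, 0.3, 0.9, 0.0],
    [0.3, 0.0, 0.5, 0.2],
    [0.9, 0.5, 0.0, 0.8],
    [0.0, 0.2, 0.8, 0.0],
]

scores = lexrank(W)
best = max(range(len(scores)), key=scores.__getitem__)
print([round(s, 3) for s in scores], "most central vertex:", best)
```

The jump term d/N is what makes the iteration converge to a unique score vector regardless of the graph's connectivity, which is the guarantee the slide refers to.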
25
(No Transcript)
26
http://tangra.si.umich.edu/clair/lexrank/
27
NLP and network analysis
28
  • Part of speech tagging [Biemann 2006]
  • Word sense disambiguation [Mihalcea et al 2004]
  • Document indexing [Mihalcea et al 2004]
  • Subjectivity analysis [Pang and Lee 2004]
  • Semantic class induction [Widdows and Dorow 2002]
  • Passage retrieval [Otterbacher, Erkan, Radev 05]
[Figure labels: Q, relevance, inter-similarity]
29
MavenRank: Centrality in Speech Graphs
Vertices: Speech transcripts from a given topic
Edges: tf-idf cosine similarity (with threshold)
Hypothesis: Key speakers will have speeches with
high centrality.
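The edge definition above can be sketched in a few lines of Python. This is an illustration under assumptions: the idf variant (log of N over document frequency), the threshold of 0.1, the whitespace tokenization, and the three toy speeches are all hypothetical and are not taken from the MavenRank work.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a {term: tf * idf} dictionary."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_graph(docs, threshold=0.1):
    """Return {(i, j): similarity} for document pairs at or above the threshold."""
    vecs = tfidf_vectors(docs)
    edges = {}
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            sim = cosine(vecs[i], vecs[j])
            if sim >= threshold:
                edges[(i, j)] = sim
    return edges

# Hypothetical toy "speeches".
speeches = [
    "the budget bill cuts taxes".split(),
    "taxes and the budget deficit".split(),
    "farm subsidies for corn growers".split(),
]

edges = similarity_graph(speeches)
print(edges)  # a single edge between speeches 0 and 1
```

A centrality score computed over this graph, averaged per speaker, would then give the speaker scores that the hypothesis refers to.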
30
MavenRank Example
Speech scores:
  1: 0.13, 2: 0.13, 3: 0.10, 4: 0.19, 5: 0.10, 6: 0.14, 7: 0.08, 8: 0.13
Speaker scores (mean speech score):
  1: 0.12, 2: 0.15, 3: 0.12
[Figure: graph of speeches 1 to 8, grouped into Speaker 1, Speaker 2, and Speaker 3 speeches]
31
(No Transcript)
32
GIN: Gene Interaction Network
  • Motivation
      • Biomedical literature is growing rapidly.
        Manually curated databases cover a small portion of
        the available information
      • Most protein interaction information is uncovered
        in biomedical articles
  • Approach: text mining and network analysis for
      • Automatic extraction of molecule interactions
      • Automatic article summarization
      • Interaction and citation networks
      • Inferring gene-disease associations

33
Feature Extraction from Dependency Trees
The results demonstrated that KaiC interacts
rhythmically with KaiA, KaiB, and SasA.
  • Path1: KaiC - nsubj - interacts - obj - SasA
  • Path2: KaiC - nsubj - interacts - obj - SasA -
    conj_and - KaiA
  • Path3: KaiC - nsubj - interacts - obj - SasA -
    conj_and - KaiB
  • Path4: SasA - conj_and - KaiA
  • Path5: SasA - conj_and - KaiB
  • Path6: KaiA - prep_with - SasA - conj_and - KaiB
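Path features of this kind can be produced by a breadth-first search over the dependency tree, treated as an undirected labeled graph. The sketch below is an assumption-laden illustration: the four dependency edges are read off Path1 through Path5 above rather than produced by a parser, so the relation labels it prints for the KaiA to KaiB pair differ from the slide's Path6, and the names `EDGES`, `build_adjacency`, and `shortest_path` are hypothetical.

```python
from collections import deque
from itertools import combinations

# (head, relation, dependent) edges inferred from Path1-Path5; an assumption.
EDGES = [
    ("interacts", "nsubj", "KaiC"),
    ("interacts", "obj", "SasA"),
    ("SasA", "conj_and", "KaiA"),
    ("SasA", "conj_and", "KaiB"),
]

def build_adjacency(edges):
    """Undirected adjacency list that keeps the relation label on each edge."""
    adj = {}
    for head, rel, dep in edges:
        adj.setdefault(head, []).append((rel, dep))
        adj.setdefault(dep, []).append((rel, head))
    return adj

def shortest_path(adj, start, goal):
    """BFS path from `start` to `goal` as alternating words and relation labels."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for rel, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [rel, nxt])
    return None

adj = build_adjacency(EDGES)
proteins = ["KaiC", "SasA", "KaiA", "KaiB"]
for a, b in combinations(proteins, 2):
    print(" - ".join(shortest_path(adj, a, b)))
```

Four protein names give six pairs, which matches the six paths listed for the example sentence.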

34
Inferring Genes Related to Prostate Cancer
  • Hypothesis
      • Genes that interact with many genes that
        are known to be related to prostate cancer are
        likely to be related to prostate cancer
  • Approach
      • Extract the interaction network of genes (seed
        genes) that are known to be related to prostate
        cancer automatically from the literature
      • Infer new genes related to prostate cancer from
        the network topology
      • Use eigenvector centrality to rank gene-prostate
        cancer associations
  • Hypothesis restatement
      • Genes central in the constructed network are most
        probably related to prostate cancer.
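The ranking step can be sketched with a power iteration for eigenvector centrality. This is a generic textbook-style sketch, not the pipeline used in the study; the five-node network and the names `seed1`, `seed2`, `seed3`, `geneX`, and `geneY` are hypothetical placeholders, not real gene symbols or extracted interactions.

```python
import math

def eigenvector_centrality(adj, iterations=500, tol=1e-12):
    """Power iteration on (A + I); it has the same leading eigenvector as A.

    `adj` maps each node to the set of its neighbors (undirected graph).
    Adding the identity avoids oscillation on bipartite graphs.
    """
    nodes = sorted(adj)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        converged = max(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score

# Hypothetical interaction network: three seed genes and two neighbors.
network = {
    "seed1": {"seed2", "geneX"},
    "seed2": {"seed1", "geneX"},
    "seed3": {"geneX", "geneY"},
    "geneX": {"seed1", "seed2", "seed3"},
    "geneY": {"seed3"},
}

centrality = eigenvector_centrality(network)
ranking = sorted(centrality, key=centrality.get, reverse=True)
print(ranking[0], ranking[-1])  # geneX geneY
```

In this toy network the non-seed gene that interacts with all three seeds ranks first, which is the behavior the hypothesis relies on.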

35
Approach
  • Corpus
      • PMCOA (PubMed Central Open Access) full text
        articles
      • Articles in PMCOA split into sentences and
        sentences tagged with GeniaTagger
  • Compile seed list of genes known to be related to
    prostate cancer
      • 20 genes compiled from OMIM (Online Mendelian
        Inheritance in Man) Database
      • Extend seed gene list with synonyms from HGNC
        (HUGO Gene Nomenclature Committee) database.
  • Use the automatic interaction extraction pipeline
    to extract the interaction network of the seed
    genes and their neighbors (genes interacting with
    the seed genes).

36
Seed Genes
  • 20 genes that are reported in OMIM to be related
    to prostate cancer

37
Interactions of the seed genes (gene names
normalized to their HGNC symbols)
38
Sample Extracted Interaction Sentences
  • A study by Jin et al. [20] indicated that the
    association of Tax with hsMAD1, a mitotic spindle
    checkpoint (MSC) protein, led to the
    translocation of both MAD1 and MAD2 to the
    cytoplasm.
  • PTEN is transcriptionally regulated by
    transcription factors such as p53, Egr-1, NFκB
    and SMADs, while protein levels and activity are
    modulated by phosphorylation, oxidation,
    subcellular localisation, phospholipid binding
    and protein stability [29].
  • Interestingly, one of these, HPC1, is linked to
    RNASEL [10,11].
  • In response to DNA damage, the cell-cycle
    checkpoint kinase CHEK2 can be activated by ATM
    kinase to phosphorylate p53 and BRCA1, which are
    involved in cell-cycle control, apoptosis, and
    DNA repair [1,2].
  • The interactions of RAD51 with TP53, RPA and the
    BRC repeats of BRCA2 are relatively well
    understood (see Discussion).
  • The interaction of BRCA2 with HsRad51 is
    significantly more different to both RadA and
    RecA (Figure 2c).
  • Max interactor protein, MXI1 (gene L07648)
    competes for MAX thus negatively regulates MYC
    function and may play a role in insulin
    resistance.
  • Mad2 binds to Cdc20, an activator of the
    anaphase-promoting complex (APC), to inhibit APC
    activity and arrest cells in metaphase in
    response to checkpoint activation.

39
Inferred Genes (evaluation of top-20 scoring
genes)
  • 6 are seed genes; 14 genes are inferred to be
    related to prostate cancer
  • (Check the GeneGo Pathway database; if there is no
    evidence there, check the PubMed literature)
  • 9 genes: marked as being related to prostate
    cancer by the GeneGo Pathway database
  • 1 gene: evidence found in PubMed that the gene is
    related to prostate cancer
  • 4 genes: no evidence found

40
(No Transcript)
41
Other networks
  • Diabetes Type I
  • Diabetes Type II
  • Bipolar Disorder

42
Properties of lexical networks
43
Dependency network
44
Random network
45
Analyzing networks
  • Properties of networks
      • Clustering coefficient
        (Watts/Strogatz cc: triangles/triples)
      • Power law coefficient α
      • Diameter (longest shortest path)
      • Average shortest path (ASP)
  • Properties of nodes
      • Centrality: degree, closeness, betweenness,
        eigenvector
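Three of the network properties listed above can be computed with plain breadth-first search and neighbor counting. The sketch below is illustrative: it uses the per-node (Watts/Strogatz) average for the clustering coefficient, which is one of two common definitions and may differ from a global triangles-to-triples ratio; the four-node graph and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(adj):
    """Mean over nodes of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def bfs_distances(adj, source):
    """Shortest-path length from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def path_statistics(adj):
    """Return (average shortest path, diameter) over reachable node pairs."""
    lengths = [
        d
        for s in adj
        for t, d in bfs_distances(adj, s).items()
        if t != s
    ]
    return sum(lengths) / len(lengths), max(lengths)

# Hypothetical graph: a triangle a-b-c with a pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

cc = clustering_coefficient(graph)
asp, diameter = path_statistics(graph)
print(round(cc, 3), round(asp, 3), diameter)  # 0.583 1.333 2
```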

46
Types of networks
  • Regular networks
      • Uniform degree distribution
  • Random networks
      • Memoryless
      • Poisson degree distribution
      • Characteristic value
      • Low clustering coefficient
      • Large ASP
  • Small world networks
      • High transitivity
      • Presence of hubs (memory)
      • High clustering coefficient
        (e.g., 1000 times higher than random)
      • Small ASP
      • Power law degree distribution
        (typical value of α between 2 and 3)

47
Comparing the dependency graph to a random
(Poisson) graph
48
Properties of lexical networks
  • Entries in a thesaurus [Motter et al. 2002]
      • c/c0 = 260 (n = 30,000)
  • Co-occurrence networks [Dorogovtsev and Mendes
    2001, Sole and Ferrer i Cancho 2001]
      • c/c0 = 1,000 (n = 400,000)
  • Mental lexicon [Vitevitch 2005]
      • c/c0 = 278 (n = 19,340)
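Assuming c0 denotes the clustering coefficient of a random graph with the same number of nodes and edge density (the transcript does not define it), the baseline can be estimated by simulation. The sketch below builds a seeded random graph in which each pair is linked with probability p and measures its clustering coefficient, which should land close to p. The values n = 300, p = 0.05, and the observed c = 0.5 are hypothetical and unrelated to the networks listed above.

```python
import random
from itertools import combinations

def random_graph(n, p, seed=0):
    """Undirected random graph: each node pair is linked with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def clustering_coefficient(adj):
    """Mean over nodes of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

n, p = 300, 0.05                      # hypothetical size and edge density
c0 = clustering_coefficient(random_graph(n, p))
print("random baseline c0:", round(c0, 3), "edge probability p:", p)

c_observed = 0.5                      # hypothetical clustering of a real network
print("c/c0:", round(c_observed / c0, 1))
```

Because c0 shrinks roughly like the mean degree divided by n, large sparse networks with even moderate clustering produce ratios in the hundreds, which is the pattern the figures above show.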

49
(No Transcript)