Small World Clustering Algorithms - PowerPoint PPT Presentation

Title: Small World Clustering Algorithms


1
Small World Clustering Algorithms
  • Brant Chee

2
Experiments
  • 3 clustering algorithms
  • Complete Link (Cluto)
  • K means (Cluto)
  • Small World

3
Test Collections
4
Experimental Setup
  • Parameters left at package defaults
  • Clustered with n = 50, 100, 150 and 200.
  • Clusters with fewer than 4 elements or more than
    50 elements were eliminated, and the clustering
    that resulted in fewer than 40 clusters was
    chosen for evaluation.
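The selection rule above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the function names are hypothetical, and the slide does not say which clustering wins if several values of n qualify, so picking the smallest qualifying n is an assumption.

```python
from collections import Counter

def filter_clusters(labels, min_size=4, max_size=50):
    """Return the ids of clusters whose size lies in [min_size, max_size]."""
    sizes = Counter(labels)
    return {c for c, size in sizes.items() if min_size <= size <= max_size}

def choose_clustering(clusterings, max_clusters=40):
    """clusterings maps n -> list of cluster labels, one label per document.

    Returns (n, kept_cluster_ids) for the smallest n whose size-filtered
    clustering has fewer than max_clusters clusters, or None if none does.
    The "smallest n" tie-break is an assumption, not stated on the slide.
    """
    for n in sorted(clusterings):
        kept = filter_clusters(clusterings[n])
        if len(kept) < max_clusters:
            return n, kept
    return None
```

Calling `choose_clustering({50: labels_50, 100: labels_100})` with one label list per run would return the chosen n together with the cluster ids that survive the size filter.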

5
Quantitative Results
6
Quantitative Results II
7
Qualitative Evaluation
  • 2 criteria: utility and coherence
  • 3-point scale: 1 = good, 2 = poor, 3 = bad
  • Good: >60% of articles
  • Poor: 59-41%
  • Bad: <40%
  • Evaluate terms in cluster to get context.
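The rating scale can be written as a small function. This is a sketch under assumptions: the input is taken to be the fraction of a cluster's articles that a judge considers to fit, and the slide leaves the exact boundaries open, so the handling of exactly 60% and exactly 40% below is a guess. The name `rate_cluster` is hypothetical.

```python
def rate_cluster(fraction):
    """Map the fraction of fitting articles (0.0 to 1.0) to the 3-point scale."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if fraction > 0.60:
        return 1  # good
    if fraction > 0.40:
        return 2  # poor (assumption: exactly 60% counts as poor)
    return 3      # bad (assumption: exactly 40% counts as bad)
```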

8
Quantitative Results Cont
9
Sample Session
14
Other Approaches
  • Statistical Methods

15
Other Clustering Approaches
  • Can we choose other types of clustering
    algorithms which could provide better quality
    results or provide better cluster labels?
  • SOM (Self Organizing Map)
  • Slow for high numbers of dimensions and large
    numbers of objects.
  • Carrot2
  • Slow for large numbers of items.
  • Huge memory consumption.

16
Random Projection
  • Can we reduce the dimensionality of vectors (i.e.
    50,000 → 1,000) while preserving distances?
  • Speed up similarity calculations
  • Various methods
  • Random projection.
  • Latent semantic indexing.
  • Multi Dimensional Scaling

17
Very Sparse Random Projections
  • Let A ∈ R^(n×D) be our n points in D dimensions
  • A × R, where the random matrix R ∈ R^(D×k)
  • R has entries in {-1, 0, 1}, drawn with fixed probabilities
  • O(nDk + n²k)
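A minimal sketch of such a projection in plain Python follows. The slide does not give the entry probabilities, so the code assumes the usual very sparse scheme: each entry is +1 or -1 with probability 1/(2s) each and 0 otherwise, with s = sqrt(D) by default, and the result is scaled by sqrt(s/k) so that squared lengths are preserved in expectation. The function names and the sparse column representation are illustrative choices.

```python
import math
import random

def sparse_projection_matrix(D, k, s=None, seed=0):
    """Build R (D x k) column by column, storing only nonzero entries.

    Each column is a list of (row_index, sign) pairs. An entry is nonzero
    with probability 1/s and then +1 or -1 with equal chance (assumed scheme).
    """
    if s is None:
        s = math.sqrt(D)
    rng = random.Random(seed)
    columns = []
    for _ in range(k):
        column = [(i, rng.choice((-1.0, 1.0)))
                  for i in range(D) if rng.random() < 1.0 / s]
        columns.append(column)
    return columns, s

def project(A, columns, s):
    """Project each row of A (n x D) down to k dimensions."""
    k = len(columns)
    scale = math.sqrt(s / k)  # keeps squared norms unbiased
    return [[scale * sum(sign * row[i] for i, sign in column)
             for column in columns]
            for row in A]
```

Because most entries of R are zero, each projected coordinate touches only about D/s of the original coordinates, which is where the speed-up in similarity calculations comes from.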

18
Reducing Dimensionality
  • Bank Dataset 11,000 articles from 11 categories
    in Dmoz.
  • 11,000 articles reduced from 30K terms with a 1GB
    heap in 11s.
  • Increase in Purity and decrease in Entropy
    (measures of clustering quality).
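For reference, purity and entropy can be computed from cluster assignments and known category labels as sketched below. The slides do not spell out the formulas, so this assumes the common definitions: purity is the fraction of items that belong to their cluster's majority category, and entropy is the size-weighted average of per-cluster category entropy, normalized by the log of the number of categories. The function name is hypothetical.

```python
import math
from collections import Counter, defaultdict

def purity_and_entropy(cluster_labels, class_labels):
    """Return (purity, entropy) for a clustering against known classes."""
    clusters = defaultdict(Counter)
    for cluster, category in zip(cluster_labels, class_labels):
        clusters[cluster][category] += 1

    n = len(class_labels)
    q = len(set(class_labels))
    log_q = math.log(q) if q > 1 else 1.0  # avoid dividing by log(1) = 0

    purity = 0.0
    entropy = 0.0
    for counts in clusters.values():
        size = sum(counts.values())
        purity += max(counts.values()) / n
        h = -sum((m / size) * math.log(m / size) for m in counts.values())
        entropy += (size / n) * (h / log_q)
    return purity, entropy
```

Higher purity and lower entropy both indicate clusters that line up more closely with the reference categories.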

19
MI on Phrases
  • More context than single words
  • More meaningful term clusters

20
Other approaches
  • Knowledge Intensive Approaches

21
Hypernym
  • Is-a relationship
  • Shakespeare is an author.
  • Pug is a dog.
  • Implicitly hierarchical.
  • Basis of many ontologies and semantic networks.
  • WordNet
  • UMLS

22
Portion of the UMLS Semantic Network: Biologic Function
23
Hypernym Relations
  • NP such as {NP,}* {(or | and)} NP
  • Vegetables such as Beets, Carrots and Peas.
  • such NP as {NP,}* {(or | and)} NP
  • works by such authors as Herrick, Goldsmith and
    Shakespeare.
  • NP {, NP}* {,} (or | and) other NP
  • Bruises, broken bones or other injuries
  • NP {,} including {NP,}* {(or | and)} NP
  • All common-law countries, including Canada and
    England
  • NP {,} especially {NP,}* {(or | and)} NP
  • most European countries, especially France,
    England and Spain.
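A rough sketch of how three of these patterns ("such as", "including", "especially") might be matched with regular expressions is shown below. It is a simplification under stated assumptions: there is no noun-phrase chunker, so the input is assumed to be a short phrase that begins with the hypernym NP, and hyponyms that themselves contain "and" or "or" would be split incorrectly. The names `CUE`, `SEP`, and `extract_hypernyms` are illustrative.

```python
import re

# Cue phrases that separate the hypernym NP from its list of hyponyms.
CUE = re.compile(r",?\s+(?:such as|including|especially)\s+")
# Separators between hyponyms: commas and/or the conjunctions "and" / "or".
SEP = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")

def extract_hypernyms(phrase):
    """Return (hypernym, hyponym) pairs found in a short phrase."""
    parts = CUE.split(phrase.strip().rstrip("."), maxsplit=1)
    if len(parts) != 2:
        return []  # no cue phrase found
    hypernym, rest = parts
    hyponyms = [h.strip() for h in SEP.split(rest) if h.strip()]
    return [(hypernym.strip(), h) for h in hyponyms]
```

Applied to "Vegetables such as Beets, Carrots and Peas." this yields one pair per vegetable, each with "Vegetables" as the hypernym. Output of this kind still needs filtering before it is useful.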

24
Uses of Hypernym Trees
  • Search
  • Query Expansion
  • Faceted metadata
  • Clustering
  • Parent node defines a cluster
  • Keyword assignment
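Two of these uses, clustering by parent node and query expansion, can be sketched with a simple mapping from hypernym to hyponyms. This is only an illustration: it clusters terms rather than documents, treats the tree as a single level, and the function names are hypothetical. The example pairs are taken from the earlier slides.

```python
from collections import defaultdict

def cluster_by_parent(pairs):
    """Group hyponyms under their hypernym; each parent defines one cluster."""
    clusters = defaultdict(list)
    for parent, child in pairs:
        clusters[parent].append(child)
    return dict(clusters)

def expand_query(term, clusters):
    """Expand a query term with its direct hyponyms, if it has any."""
    return [term] + clusters.get(term, [])

pairs = [
    ("vegetables", "beets"),
    ("vegetables", "carrots"),
    ("vegetables", "peas"),
    ("author", "Shakespeare"),
    ("dog", "pug"),
]
clusters = cluster_by_parent(pairs)
```

Here `expand_query("vegetables", clusters)` would return the term followed by beets, carrots and peas, while a term with no children is returned unchanged.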

25
Trivial Hypernyms
  • organic compounds: d-ribose
  • organic compounds: d-arabinose
  • organic compounds: l-arabinose
  • organic compounds: sucrose
  • substances: cortisone
  • substances: vitamins a and c
  • substances: zinc
  • organs: liver
  • organs: kidney
  • sugar-containing products: honey
  • sugar-containing products: jam
  • sugar-containing products: glucose
  • sugar-containing products: fruit juice
    concentrates
  • sugar-containing products: tomato
  • largely populated countries: china
  • largely populated countries: russia
26
Bad Hypernyms
  • suicidal patients: appears
  • other agents: plasmin
  • other agents: plasminogen
  • such common sensations: illness
  • phenomena: founder effects
  • phenomena: migration
  • phenomena: gene flow
  • clinical manifestations: 80
  • chemical agents: homocystine
  • no other explanation: anencephaly
  • conditions: azure a-0.5 nahco3 solution
  • conditions: ph 8.1
  • fewer side-effects: vegetative disfunction
  • techniques: carpentier
  • techniques: 's ring

27
Good? Hypernyms
  • entirely synthetic steroids: norgestrel and
    quingestanol
  • menstrual disorders: metrorrhagia
  • menstrual disorders: oligoamenorrhea
  • menstrual disorders: amenorrhea
  • mild venous disorders: swollen veins
  • mild venous disorders: heavy limbs
  • mild venous disorders: varicosities
  • obstructive pulmonary lung diseases: alveolar
    proteinosis
  • obstructive pulmonary lung diseases: pneumonia
  • obstructive pulmonary lung diseases: asthma
  • obstructive pulmonary lung diseases:
    bronchiectasis
  • obstructive pulmonary lung diseases: cystic
    fibrosis
  • choline analogues: n,n'-dimethylethanolamine
  • choline analogues: n-monomethylethanolamine
  • choline analogues: ethanolamine
  • 3alpha-oh-containing steroids: androsterone