Graphical Representations of Knowledge and Its Distribution

Description: DLSI: Experiments with NSF-Movie Review Corpus. Vector Spaces. Dimensions. Non-stop Terms ... Movie Reviews. 239. 70,411. 3,557. All Documents. 282. 122,685. 13,514 ...

1
Graphical Representations of Knowledge and Its Distribution
Cliff Behrens
Information Analysis Applied Research
Telcordia Technologies, Inc.
973.829.5198
cliff_at_research.telcordia.com
Workshop on Statistical Inference, Computing and Visualization for Graphs
Stanford University, August 1-2, 2003
2
Knowledge, Consensus and Information Sharing
Cultural Knowledge Derived from Consensus
Consensus ? Knowledge
Individual Knowledge
Information Sharing Among Individuals in a Single Community of Interest (COI)
3
Schemer Knowledge Validation Services
  • Issues with Computer-Supported Cooperative Work (CSCW) technology
      • Focus of CSCW research is on new tools, less on motivating their use
      • Collaborative model building often lacks scientific rigor and
        quality control
  • Schemer: Web-based technology that derives knowledge from consensus
    among Subject Matter Experts (SMEs)
      • Knowledge-based collaboration reveals the distribution of domain
        expertise among panelists
      • Metrics for qualifying panelists and validating the models they
        produce
          • validates the saliency of the domain to the SMEs
          • estimates the competency of each SME
          • yields best answers based on the SMEs' responses, weighted by
            their respective competencies
      • Generic service, but first tried on SIAM influence networks

4
SIAM Influence Net Example
5
Mathematics of Consensus Analysis (Romney et al. 1986)
  • The formal model starts from a data matrix X containing the responses
    $X_{ik}$ of SMEs $i = 1, \dots, N$ on items $k = 1, \dots, M$
  • From X, a symmetric matrix M is estimated; its off-diagonal elements
    $M_{ij}$ are empirical point estimates of the proportion of matching
    responses between SMEs i and j over all items, corrected for guessing
    (if appropriate)
  • Approximate estimates of the individual SME competencies $D_i$ are
    obtained by applying Maximum Likelihood Factor Analysis to fit the
    equation below, solving for the unknown main-diagonal values
  • $M \approx D D'$, i.e., $M_{ij} \approx D_i D_j$ for $i \neq j$
  • A large ratio between the first two eigenvalues
    ($\lambda_1 > 3\lambda_2$) implies a single-factor solution
  • The $D_i$ are the SMEs' loadings on the first factor, where $v_1$ is
    the first eigenvector:
  • $D_i = v_{1i}\sqrt{\lambda_1}$
  • The estimated competencies $D_i$ and the profile of responses to item k
    ($X_{ik,l} = 1$ if SME i gave answer l to item k, else 0) are used to
    compute Bayesian a posteriori probabilities for each of the L possible
    answers. The probability of the observed responses, given that answer
    l is the best or correct one ($Z_k = l$), is:
  • $\Pr\left(\langle X_{ik} \rangle_{i=1}^{N} \mid Z_k = l\right) = \prod_{i=1}^{N} \left[D_i + \frac{1 - D_i}{L}\right]^{X_{ik,l}} \left[\frac{(1 - D_i)(L - 1)}{L}\right]^{1 - X_{ik,l}}$
  • Assuming equal prior probabilities for the L answers, normalizing these
    values over all l gives the posterior probability that l is correct
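The consensus steps above can be sketched end to end in standard-library Python. This is a minimal illustration under stated assumptions, not the Schemer implementation: the correction for guessing uses $M_{ij} = (L\,m_{ij} - 1)/(L - 1)$, which follows from the model's match probability $m_{ij} = D_i D_j + (1 - D_i D_j)/L$; competencies are estimated by iterated principal-axis factoring (power iteration, with the diagonal re-estimated as $D_i^2$), a simpler stand-in for the maximum likelihood factor analysis named on the slide; the eigenvalue-ratio check is omitted; and posteriors assume equal priors. All function names and the toy responses are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def corrected_match_matrix(responses: Sequence[str], num_options: int) -> List[List[float]]:
    """Off-diagonal M_ij = (L * m_ij - 1) / (L - 1), where m_ij is the raw
    proportion of items on which SMEs i and j gave the same answer and L is
    the number of answer options. The diagonal is unknown and left at 0."""
    n, items, L = len(responses), len(responses[0]), num_options
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                raw = sum(a == b for a, b in zip(responses[i], responses[j])) / items
                m[i][j] = (L * raw - 1) / (L - 1)
    return m


def top_eigenpair(a: List[List[float]], iters: int = 500):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    if sum(v) < 0:  # eigenvector sign is arbitrary; make loadings positive
        v = [-x for x in v]
    lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def competencies(mstar: List[List[float]], rounds: int = 50) -> List[float]:
    """D_i = v_1i * sqrt(lambda_1), re-estimating the unknown diagonal as D_i^2
    each round (iterated principal-axis factoring, not ML factor analysis)."""
    n = len(mstar)
    a = [row[:] for row in mstar]
    for i in range(n):  # starting diagonal: largest off-diagonal value in the row
        a[i][i] = max(a[i][j] for j in range(n) if j != i)
    d = [0.0] * n
    for _ in range(rounds):
        lam, v = top_eigenpair(a)
        d = [vi * math.sqrt(max(lam, 0.0)) for vi in v]
        for i in range(n):
            a[i][i] = d[i] ** 2
    return d


def answer_posteriors(responses: Sequence[str], d: List[float], k: int,
                      options: str) -> Dict[str, float]:
    """Pr(Z_k = option | responses) for every option, assuming equal priors."""
    L = len(options)
    likelihood = {}
    for option in options:
        p = 1.0
        for i, row in enumerate(responses):
            di = min(max(d[i], 0.0), 0.999)  # keep D_i in [0, 0.999] so no factor is 0
            if row[k] == option:  # X_ik,l = 1
                p *= di + (1 - di) / L
            else:  # X_ik,l = 0
                p *= (1 - di) * (L - 1) / L
        likelihood[option] = p
    total = sum(likelihood.values())
    return {option: p / total for option, p in likelihood.items()}


# Hypothetical data: 4 SMEs answering 8 items, each with options a-d.
responses = ["abcdabcd", "abcdabca", "abcdbacd", "dbbaddca"]
d = competencies(corrected_match_matrix(responses, num_options=4))
best = []
for k in range(8):
    post = answer_posteriors(responses, d, k, "abcd")
    best.append(max(post, key=post.get))
```

Here the fourth SME agrees with the others only at chance level, so its estimated competency stays near zero and its answers barely move the posteriors; on item 7, where the SMEs split two against two, the answer backed by the more competent pair wins.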

6
Schemer Knowledge Validation Services
7
Knowledge-Based Communications Interface
8
Latent Semantic Indexing (LSI): What Is It?
9
LSI: How Does It Work?
  • Analyze a training collection of documents
      • throw out stop words and markup
      • count the frequency of each word in each document
  • Compute a term × document matrix
      • store word counts as entries in the matrix
      • apply an appropriate weighting, e.g., log-entropy, to the entries
  • Compute the LSI vector space
      • reduce the term × document matrix with a truncated Singular Value
        Decomposition (SVD), keeping only the largest singular values
  • Fold new documents into the LSI vector space
      • a document vector is computed as a weighted sum of its term vectors
  • Compute a vector for the query (a pseudo-document)
      • the query vector is computed as a weighted sum of its term vectors
  • Search the vector space for semantically close term/document vectors
      • compute the cosine of the angle between the query vector and the
        other vectors
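The pipeline above can be sketched in standard-library Python. This is a toy illustration under assumptions, not the Telcordia system: the four-document corpus and the stop list are hypothetical; weights follow the common log-entropy scheme $a_{ij} = g_i \log(1 + tf_{ij})$ with $g_i = 1 + \sum_j p_{ij}\log p_{ij} / \log n$; the rank-$k$ SVD is found by power iteration with deflation on $AA^{\top}$ (a real collection would call for a sparse solver such as a Lanczos method); and folding in uses the usual $\hat{q} = q^{\top} U_k \Sigma_k^{-1}$, with documents compared through the rows of $V_k$.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical stop list for this toy example; real systems use a much longer one.
STOP_WORDS = {"a", "about", "has", "of", "the", "to"}


def tokenize(text: str) -> List[str]:
    """Lower-case, strip punctuation, and throw out stop words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def log_entropy_matrix(docs: Sequence[str]) -> Tuple[List[str], List[float], List[List[float]]]:
    """Term x document matrix a_ij = g_i * log(1 + tf_ij) with global weight
    g_i = 1 + sum_j p_ij * log(p_ij) / log(n), p_ij = tf_ij / gf_i (needs n >= 2)."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    terms = sorted(set().union(*counts))
    n = len(docs)
    weights, matrix = [], []
    for t in terms:
        tf = [c[t] for c in counts]
        gf = sum(tf)
        g = 1.0 + sum((f / gf) * math.log(f / gf) for f in tf if f) / math.log(n)
        weights.append(g)
        matrix.append([g * math.log(1 + f) for f in tf])
    return terms, weights, matrix


def truncated_svd(a: List[List[float]], k: int, iters: int = 300):
    """Top-k singular triplets of A via power iteration on A A^T with deflation."""
    m, n = len(a), len(a[0])
    aat = [[sum(a[i][c] * a[j][c] for c in range(n)) for j in range(m)] for i in range(m)]
    us, sigmas, vs = [], [], []
    for _ in range(k):
        u = [1.0 / (i + 1) for i in range(m)]
        for _ in range(iters):
            w = [sum(aat[i][j] * u[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            u = [x / norm for x in w]
        lam = sum(u[i] * sum(aat[i][j] * u[j] for j in range(m)) for i in range(m))
        if lam < 1e-12:  # the matrix has rank below k
            break
        sigma = math.sqrt(lam)
        us.append(u)
        sigmas.append(sigma)
        vs.append([sum(a[i][c] * u[i] for i in range(m)) / sigma for c in range(n)])
        aat = [[aat[i][j] - lam * u[i] * u[j] for j in range(m)] for i in range(m)]
    return us, sigmas, vs


def fold_in(text: str, terms: List[str], weights: List[float], us, sigmas) -> List[float]:
    """Pseudo-document q^T U_k Sigma_k^{-1}: a weighted sum of its term vectors."""
    counts = Counter(tokenize(text))
    q = [g * math.log(1 + counts[t]) for t, g in zip(terms, weights)]
    return [sum(qi * ui for qi, ui in zip(q, u)) / s for u, s in zip(us, sigmas)]


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx, ny = math.sqrt(sum(a * a for a in x)), math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


docs = [
    "The earth has a molten center.",
    "Rock layers of the earth crust.",
    "A film about travel to the center of the earth.",
    "A critic reviews a travel film.",
]
terms, weights, a = log_entropy_matrix(docs)
us, sigmas, vs = truncated_svd(a, k=2)
doc_vectors = [[v[j] for v in vs] for j in range(len(docs))]
query = fold_in("travel center earth", terms, weights, us, sigmas)
ranking = sorted(range(len(docs)), key=lambda j: cosine(query, doc_vectors[j]), reverse=True)
```

Folding a training document back in reproduces its own row of $V_k$, which is a quick sanity check on the fold-in rule; the query echoes the "travel, center, earth" example in the DLSI figures.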

10
Scalability: Large Document Collections and Polysemy
11
LSI: Ongoing Work
  • Distributed LSI
      • Needed for LSI to scale to massive document collections
      • Adopts a divide-and-conquer approach
      • Sort documents by conceptual domain
          • recognizes documents created for different COIs
          • creates more semantically homogeneous subcollections
          • applies cluster analysis, e.g., bisecting K-means
      • Compute independent LSI vector spaces for each subcollection
          • more parsimonious representations of concept domains or
            contexts
      • Compute similarity measures between spaces
          • construct graphs from the terms shared by two vector spaces
          • compute the similarity between these two graphs
      • Discover the appropriate vector spaces to search for a query
          • cosine calculations (as before)
          • relevance feedback (as before)
          • query expansion
      • Visualizations to explore the semantic context of a query in
        different LSI vector spaces
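Bisecting K-means, named above for sorting documents into subcollections, repeatedly splits one cluster in two with ordinary 2-means. The Python sketch below is an illustration under assumptions, not the DLSI code: it always bisects the largest cluster (another common choice is the cluster with the largest sum of squared errors), keeps the best of a few seeded trials, and clusters made-up 2-D points standing in for document vectors.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _dist2(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def _mean(points: List[Vector]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def _sse(points: List[Vector]) -> float:
    """Sum of squared distances from each point to the cluster mean."""
    c = _mean(points)
    return sum(_dist2(p, c) for p in points)


def two_means(points: List[Vector], rng: random.Random,
              iters: int = 20) -> Tuple[List[int], List[int]]:
    """Ordinary K-means with k = 2; returns the indices of the two halves."""
    c0, c1 = rng.sample(points, 2)
    left: List[int] = []
    right: List[int] = []
    for _ in range(iters):
        nearer0 = [_dist2(p, c0) <= _dist2(p, c1) for p in points]
        left = [i for i, flag in enumerate(nearer0) if flag]
        right = [i for i, flag in enumerate(nearer0) if not flag]
        if not left or not right:  # degenerate split; the caller retries
            break
        c0 = _mean([points[i] for i in left])
        c1 = _mean([points[i] for i in right])
    return left, right


def bisecting_kmeans(points: List[Vector], k: int, trials: int = 5,
                     seed: int = 0) -> List[List[int]]:
    """Split the largest cluster in two until there are k clusters."""
    rng = random.Random(seed)
    clusters = [list(range(len(points)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        target = clusters.pop()  # the largest cluster
        best = None
        for _ in range(trials if len(target) > 1 else 0):
            left, right = two_means([points[i] for i in target], rng)
            if not left or not right:
                continue
            halves = ([target[i] for i in left], [target[i] for i in right])
            cost = sum(_sse([points[i] for i in h]) for h in halves)
            if best is None or cost < best[0]:
                best = (cost, halves)
        if best is None:  # this cluster cannot be split; stop early
            clusters.append(target)
            break
        clusters.extend(best[1])
    return clusters


# Toy 2-D points standing in for document vectors: two well-separated groups.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
clusters = bisecting_kmeans(points, k=2)
```

Each resulting cluster of document indices would then get its own LSI vector space.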

12
DLSI: Experiments with NSF-Movie Review Corpus

13
DLSI: The Context of Term Meaning
Graph of semantic relationships between the top five terms retrieved for the
query "travel, center, earth" from the vector space containing only NSF
geology abstracts.
Graph of semantic relationships between the top five terms retrieved for the
query "travel, center, earth" from the vector space containing only Ebert
movie reviews.
Graph of semantic relationships between the top five terms retrieved for the
query "travel, center, earth" from the vector space containing all documents.
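The slides do not say how these term graphs are built or compared, so the Python sketch below makes explicit, hypothetical choices: it ranks terms by cosine similarity to a query vector (top five, as in the figures), links two terms when the cosine between their vectors in a given space is at least 0.5, and scores two spaces by the Jaccard similarity of the edge sets built over the terms they share. The 2-D term vectors are made-up stand-ins for a geology space and a movie-review space.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set

Vector = Sequence[float]


def cosine(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx, ny = math.sqrt(sum(a * a for a in x)), math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def top_terms(query: Vector, term_vectors: Dict[str, Vector], n: int = 5) -> List[str]:
    """The n terms whose vectors have the highest cosine with the query vector."""
    return sorted(term_vectors, key=lambda t: cosine(query, term_vectors[t]), reverse=True)[:n]


def term_graph(terms: List[str], term_vectors: Dict[str, Vector],
               threshold: float = 0.5) -> Set[FrozenSet[str]]:
    """Undirected edges between pairs of terms whose cosine is >= threshold."""
    return {frozenset((s, t)) for s, t in combinations(terms, 2)
            if cosine(term_vectors[s], term_vectors[t]) >= threshold}


def graph_similarity(space_a: Dict[str, Vector], space_b: Dict[str, Vector],
                     threshold: float = 0.5) -> float:
    """Jaccard similarity of the two edge sets built over the shared terms."""
    shared = sorted(set(space_a) & set(space_b))
    edges_a = term_graph(shared, space_a, threshold)
    edges_b = term_graph(shared, space_b, threshold)
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


# Made-up 2-D term vectors standing in for two separately computed LSI spaces.
geology = {"earth": [1.0, 0.0], "center": [0.9, 0.1], "crust": [0.95, 0.05], "travel": [0.0, 1.0]}
movies = {"earth": [0.1, 1.0], "center": [0.2, 0.9], "travel": [0.0, 1.0], "film": [1.0, 0.0]}
query_graph = term_graph(top_terms([1.0, 0.0], geology), geology)
```

In this toy pair, "travel" is linked to "earth" and "center" only in the movie space, so the two graphs share one of three edges and score 1/3, which mirrors the idea that the same query terms sit in different semantic contexts in different spaces.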