Graphical Representations of Knowledge and Its Distribution

About This Presentation

Description:

DLSI: Experiments with NSF-Movie Review Corpus. Vector Spaces. Dimensions. Non-stop Terms ... Movie Reviews. 239. 70,411. 3,557. All Documents. 282. 122,685. 13,514 ... – PowerPoint PPT presentation

Slides: 14

Provided by: cliffb8

Transcript and Presenter's Notes

1
Graphical Representations of Knowledge and Its Distribution
Cliff Behrens
Information Analysis Applied Research
Telcordia Technologies, Inc.
973.829.5198
cliff_at_research.telcordia.com
Workshop on Statistical Inference, Computing and Visualization for Graphs
Stanford University, August 1 - 2, 2003
2
Knowledge, Consensus and Information Sharing
Cultural Knowledge Derived from Consensus
Consensus → Knowledge
Individual Knowledge
Information Sharing Among Individuals in a Single COI (community of interest)
3
Schemer: Knowledge Validation Services

Issues with CSCW technology
Focus of CSCW research is on new tools, less on motivating their use
Collaborative model building often lacks scientific rigor and quality control
Schemer: Web-based technology that derives knowledge from consensus among Subject Matter Experts (SMEs)
Knowledge-based collaboration reveals the distribution of domain expertise among panelists
Metrics for qualifying panelists and validating the models they produce
  validates saliency of domain to SMEs
  estimates competency of SMEs
  yields best answers based on responses of SMEs, weighted by their respective competencies
Generic service, but first tried on SIAM influence networks

4
SIAM Influence Net Example
5
Mathematics of Consensus Analysis (Romney et al. 1986)

Formal model consists of a data matrix $X$ containing the responses $X_{ik}$ of SMEs $1..i..N$ on items $1..k..M$
  from this matrix a symmetric matrix $M$ is estimated; its off-diagonal elements hold the empirical point estimates $M_{ij}$, the proportion of matching responses on all items between SMEs $i$ and $j$, corrected for guessing (if appropriate)
Obtain an approximate solution yielding estimates of the individual SME competencies (the $D_i$) by applying Maximum Likelihood Factor Analysis to fit the equation below and solve for the main diagonal values
  $$M = DD'$$
  relative magnitude of the eigenvalues ($\lambda_1 > 3 \times \lambda_2$) implies a single-factor solution
  the $D_i$ are the loadings for SMEs on the first factor
  $$D_i = v_{1i} \sqrt{\lambda_1}$$
Estimated competency values ($D_i$) and the profile of responses for item $k$ ($X_{ik,l}$) are used to compute Bayesian a posteriori probabilities for each possible answer. Here $L$ is the number of possible answers, $X_{ik,l} = 1$ if SME $i$ gave answer $l$ on item $k$ (0 otherwise), and $Z_{kl}$ denotes that $l$ is the correct answer to item $k$. The formula for the probability that an answer is the best or correct one follows
  $$\Pr\left(\langle X_{ik} \rangle_{i=1}^{N} \mid Z_{kl}\right) = \prod_{i=1}^{N} \left[D_i + \frac{1-D_i}{L}\right]^{X_{ik,l}} \left[\frac{(1-D_i)(L-1)}{L}\right]^{1-X_{ik,l}}$$
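
The consensus-analysis steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the Schemer implementation: answers are assumed to be coded 0..L-1, the prior over answers is assumed uniform, and iterated principal factoring is swapped in for Maximum Likelihood Factor Analysis to keep the example short. The function names (`match_matrix`, `competencies`, `answer_posteriors`) and the toy response matrix are hypothetical.

```python
import numpy as np


def match_matrix(X, L):
    """Guessing-corrected proportion of matching responses for each SME pair.

    X is an N x K integer array (SMEs by items) with answers coded 0..L-1.
    """
    X = np.asarray(X)
    raw = (X[:, None, :] == X[None, :, :]).mean(axis=2)  # N x N raw match rates
    return (L * raw - 1.0) / (L - 1.0)                   # remove chance agreement


def competencies(M, iters=100):
    """Estimate D so that M is approximately D D' off the diagonal.

    Assumption: iterated principal factoring stands in for ML factor
    analysis; the unknown diagonal is re-estimated as D_i**2 on every pass.
    """
    M = np.array(M, dtype=float)
    np.fill_diagonal(M, 0.5)                  # arbitrary starting diagonal
    for _ in range(iters):
        eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
        v1 = eigvecs[:, -1]
        if v1.sum() < 0:                      # eigenvector sign is arbitrary
            v1 = -v1
        D = v1 * np.sqrt(eigvals[-1])         # D_i = v_1i * sqrt(lambda_1)
        np.fill_diagonal(M, D ** 2)
    return D, eigvals[::-1]                   # eigenvalues, largest first


def answer_posteriors(X, D, L):
    """Posterior probability of each answer per item (uniform prior assumed)."""
    X = np.asarray(X)
    D = np.clip(D, 1e-6, 1 - 1e-6)            # keep the logarithms finite
    log_hit = np.log(D + (1 - D) / L)         # SME i gave answer l
    log_miss = np.log((1 - D) * (L - 1) / L)  # SME i gave some other answer
    loglik = np.empty((X.shape[1], L))
    for l in range(L):
        chose_l = X == l                      # X_ik,l as an N x K boolean array
        loglik[:, l] = np.where(chose_l, log_hit[:, None], log_miss[:, None]).sum(axis=0)
    post = np.exp(loglik - loglik.max(axis=1, keepdims=True))
    return post / post.sum(axis=1, keepdims=True)


# Toy data: 4 SMEs answer 8 two-choice items.
X = np.array([
    [0, 1, 1, 0, 1, 0, 0, 1],   # SME 0
    [0, 1, 1, 0, 1, 0, 0, 1],   # SME 1
    [1, 1, 1, 0, 1, 0, 0, 1],   # SME 2 differs on item 0
    [0, 0, 0, 1, 1, 0, 0, 1],   # SME 3 differs on items 1-3
])
M = match_matrix(X, L=2)
D, eigvals = competencies(M)
best = answer_posteriors(X, D, L=2).argmax(axis=1)
```

The returned eigenvalues can be compared against the $\lambda_1 > 3 \times \lambda_2$ rule before the competencies are trusted; the sketch leaves that check to the caller.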

6
Schemer: Knowledge Validation Services
7
Knowledge-Based Communications Interface
8
Latent Semantic Indexing (LSI): What Is It?
9
LSI: How Does It Work?

Analyze training collection of documents
  throw out stop words and markup
  count frequencies of words in each document
Compute term × document matrix
  store word counts as entries in a matrix
  apply appropriate weighting, e.g., log-entropy, to entries
Compute LSI vector space
  reduce term × document matrix with Singular Value Decomposition
Fold new documents into LSI vector space
  document vector computed from weighted sum of its term vectors
Compute vector for query (pseudo-document)
  query vector computed from weighted sum of its term vectors
Search vector space for semantically close term/document vectors
  compute cosine of angle between query and other vectors
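
The LSI pipeline outlined on this slide can be sketched as follows. It is a toy illustration rather than the presenter's system: the stop list, the four-sentence corpus, the whitespace tokenizer, the choice of k = 2 dimensions, and every function name are assumptions made for the example. The weighting uses the common log-entropy form (local weight log(1 + count), global weight 1 + Σ p log p / log n), which may differ in detail from the one used in the talk.

```python
import numpy as np

STOP_WORDS = {"the", "of", "a", "to", "and", "in", "is"}  # toy stop list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def term_document_matrix(docs):
    """Raw word counts: one row per term, one column per document."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    row = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in tokenize(d):
            counts[row[w], j] += 1
    return vocab, counts


def entropy_weights(counts):
    """Global weight per term: 1 + sum_j p_ij * log(p_ij) / log(n_docs)."""
    n_docs = counts.shape[1]                     # needs at least two documents
    p = counts / counts.sum(axis=1, keepdims=True)
    plogp = np.zeros_like(p)
    plogp[p > 0] = p[p > 0] * np.log(p[p > 0])
    return 1.0 + plogp.sum(axis=1) / np.log(n_docs)


def build_space(counts, k):
    """Weight the matrix, then keep the top-k singular triplets."""
    g = entropy_weights(counts)
    W = np.log1p(counts) * g[:, None]            # log-entropy weighting
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    term_vecs = U[:, :k]                         # one row per term
    doc_vecs = Vt[:k].T * s[:k]                  # one row per document
    return g, term_vecs, doc_vecs


def fold_in(text, vocab, g, term_vecs):
    """Vector for a new document or query: weighted sum of its term vectors."""
    row = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros(len(vocab))
    for w in tokenize(text):
        if w in row:                             # unseen words are ignored
            tf[row[w]] += 1
    return (np.log1p(tf) * g) @ term_vecs


def cosines(q, vecs):
    """Cosine of the angle between q and every row of vecs."""
    return vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q) + 1e-12)


docs = [
    "seismic waves travel through the core of the earth",
    "the mantle and core lie near the center of the earth",
    "the film crew travel to the center of the earth",
    "a review of the film and its actors",
]
vocab, counts = term_document_matrix(docs)
g, term_vecs, doc_vecs = build_space(counts, k=2)
query_vec = fold_in("travel center earth", vocab, g, term_vecs)
doc_scores = cosines(query_vec, doc_vecs)        # similarity to each document
term_scores = cosines(query_vec, term_vecs)      # similarity to each term
```

With term vectors taken as the rows of the truncated left singular matrix, folding a training document back in reproduces its own document vector, so queries, new documents, and training documents all live in the same coordinates.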

10
Scalability: Large Document Collections and Polysemy
11
LSI: Ongoing Work

Distributed LSI
Needed for LSI to scale to massive document collections
Adopts divide-and-conquer approach
Sort documents by conceptual domain
  recognizes documents created for different COIs
  create more semantically homogeneous subcollections
  apply cluster analysis, e.g., bisecting K-means
Compute independent LSI vector spaces for each subcollection
  more parsimonious representations of concept domains or contexts
Compute similarity measures between spaces
  construct graphs from terms shared by two vector spaces
  compute similarity between these two graphs
Discover appropriate search vector spaces for a query
  cosine calculations (as before)
  relevance feedback (as before)
  query expansion
Visualizations to explore semantic context for a query in different LSI vector spaces
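
One way to read the "similarity between spaces" step is sketched below. The slide does not say how the graphs are built or compared, so both choices here are assumptions: each graph is a complete weighted graph over the shared terms with cosine similarity as the edge weight, and two graphs are compared by the Pearson correlation of their edge weights. The function names and the toy term vectors are hypothetical.

```python
import numpy as np


def term_graph(term_vecs, terms):
    """Adjacency matrix of pairwise cosines between the given terms.

    term_vecs maps each term to its vector in one LSI space.
    """
    V = np.array([term_vecs[t] for t in terms], dtype=float)
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    return V @ V.T


def space_similarity(space_a, space_b):
    """Correlate the edge weights of the two shared-term graphs."""
    shared = sorted(set(space_a) & set(space_b))
    if len(shared) < 3:
        raise ValueError("need at least three shared terms to compare graphs")
    upper = np.triu_indices(len(shared), k=1)     # each edge counted once
    edges_a = term_graph(space_a, shared)[upper]
    edges_b = term_graph(space_b, shared)[upper]
    return shared, float(np.corrcoef(edges_a, edges_b)[0, 1])


# Toy 2-D term vectors for two hypothetical subcollection spaces.
geology = {"travel": [1.0, 0.0], "center": [0.0, 1.0], "earth": [1.0, 1.0], "mantle": [0.2, 0.9]}
movies = {"travel": [1.0, 0.0], "center": [1.0, 0.1], "earth": [0.0, 1.0], "film": [0.7, 0.7]}
shared_terms, similarity = space_similarity(geology, movies)
```

Because only angles between term vectors enter the comparison, the measure does not depend on how each space's axes happen to be oriented, which matters when the spaces come from independent decompositions.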

12
DLSI: Experiments with NSF-Movie Review Corpus

13
DLSI: The Context of Term Meaning

Graph of semantic relationships between the top five terms retrieved for the query "travel, center, earth" from the vector space containing only NSF geology abstracts.
Graph of semantic relationships between the top five terms retrieved for the query "travel, center, earth" from the vector space containing only Ebert movie reviews.
Graph of semantic relationships between the top five terms retrieved for the query "travel, center, earth" from the vector space containing all documents.
