Title: Graphical Representations of Knowledge and Its Distribution
1Graphical Representations of Knowledge and Its Distribution
Cliff Behrens
Information Analysis, Applied Research
Telcordia Technologies, Inc.
973.829.5198, cliff_at_research.telcordia.com
Workshop on Statistical Inference, Computing and Visualization for Graphs
Stanford University, August 1-2, 2003
2Knowledge, Consensus and Information Sharing
Cultural Knowledge Derived from Consensus
Consensus → Knowledge
Individual Knowledge
Information Sharing Among Individuals in a Single COI
3Schemer Knowledge Validation Services
- Issues with CSCW technology
- Focus of CSCW research on new tools, less on motivating their use
- Collaborative model building often lacks scientific rigor and quality control
- Schemer: Web-based technology that derives knowledge from consensus among Subject Matter Experts (SMEs)
- Knowledge-based collaboration reveals distribution of domain expertise among panelists
- Metrics for qualifying panelists and validating the models they produce
- validates saliency of domain to SMEs
- estimates competency of SMEs
- yields best answers based on responses of SMEs weighted by their respective competencies
- Generic service, but first tried on SIAM influence networks
4SIAM Influence Net Example
5Mathematics of Consensus Analysis (Romney et al. 1986)
- Formal model consists of a data matrix $X$ containing the responses $X_{ik}$ of SMEs $1..i..N$ on items $1..k..M$
- from this matrix a symmetrical matrix $M$ is estimated and holds the empirical point estimates $M_{ij}$, the proportion of matching responses on all items between SMEs $i$ and $j$, corrected for guessing (if appropriate), on off-diagonal elements.
- Obtain approximate solution yielding estimates of the individual SME competencies (the $D_i$) by applying Maximum Likelihood Factor Analysis to fit equation below and solve for the main diagonal values
- $M = DD'$
- relative magnitude of eigenvalues ($\lambda_1 > 3 \times \lambda_2$) implies single factor solution
- $D_i$ are the loadings for SMEs on the first factor
- $D_i = v_{1i} \sqrt{\lambda_1}$
- Estimated competency values ($D_i$) and the profile of responses for item $k$ ($X_{ik,l}$) used to compute Bayesian a posteriori probabilities for each possible answer. The formula for the probability that an answer is the best or correct one follows
- $\Pr\left(\langle X_{ik} \rangle_{i=1}^{N} \mid Z_{kl}\right) = \prod_{i=1}^{N} \left[ D_i + (1-D_i)/L \right]^{X_{ik,l}} \left[ (1-D_i)(L-1)/L \right]^{1-X_{ik,l}}$
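The consensus computation on this slide can be sketched roughly as follows. This is a minimal illustration and not the Schemer implementation: it swaps in iterated principal-axis factoring for the Maximum Likelihood Factor Analysis named on the slide, assumes answers are coded 0..L-1, and assumes a uniform prior over the L possible answers. All function names are hypothetical.

```python
import numpy as np

def match_matrix(X, L):
    """Proportion of matching responses between SMEs, corrected for guessing."""
    X = np.asarray(X)
    raw = (X[:, None, :] == X[None, :, :]).mean(axis=2)
    return (L * raw - 1.0) / (L - 1.0)

def competencies(M, n_iter=100):
    """Fit M ~ D D' on the off-diagonal elements.

    Assumption: iterated principal-axis factoring stands in for
    maximum likelihood factor analysis.
    """
    M = np.array(M, dtype=float)
    diag = np.full(M.shape[0], 0.5)          # starting guess for the diagonal
    for _ in range(n_iter):
        np.fill_diagonal(M, diag)
        vals, vecs = np.linalg.eigh(M)       # eigenvalues in ascending order
        lam1, v1 = vals[-1], vecs[:, -1]
        if v1.sum() < 0:                     # eigenvector sign is arbitrary
            v1 = -v1
        D = v1 * np.sqrt(max(lam1, 0.0))     # D_i = v_1i * sqrt(lambda_1)
        diag = D ** 2                        # re-estimate the main diagonal
    return D, vals[::-1]                     # eigenvalues in descending order

def answer_posteriors(X, D, L):
    """Posterior over answers per item, assuming a uniform prior."""
    X = np.asarray(X)
    D = np.clip(D, 1e-6, 1 - 1e-6)           # keep the logarithms finite
    log_hit = np.log(D + (1 - D) / L)
    log_miss = np.log((1 - D) * (L - 1) / L)
    loglik = np.zeros((X.shape[1], L))
    for l in range(L):
        chose = (X == l)                     # indicator X_{ik,l}
        loglik[:, l] = np.where(chose, log_hit[:, None], log_miss[:, None]).sum(axis=0)
    loglik -= loglik.max(axis=1, keepdims=True)
    post = np.exp(loglik)
    return post / post.sum(axis=1, keepdims=True)

# Toy panel: three SMEs agree with a made-up key, one never does.
key = np.array([0, 1, 2, 3, 0, 1, 2, 3])
X = np.array([key, key, key, (key + 1) % 4])
D, eigvals = competencies(match_matrix(X, L=4))
best = answer_posteriors(X, D, L=4).argmax(axis=1)
```

In the toy panel, the single-factor check from the slide corresponds to comparing `eigvals[0]` with `3 * eigvals[1]`.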
6Schemer Knowledge Validation Services
7Knowledge-Based Communications Interface
8Latent Semantic Indexing (LSI): What Is It?
9LSI: How Does It Work?
- Analyze training collection of documents
- throw out stop words and mark-up
- count frequencies of words in each document
- Compute term × document matrix
- store word counts as entries in a matrix
- apply appropriate weighting, e.g., log-entropy, to entries
- Compute LSI vector space
- reduce term × document matrix with Singular Value Decomposition
- Fold new documents into LSI vector space
- document vector computed from weighted sum of its term vectors
- Compute vector for query (pseudo-document)
- query vector computed from weighted sum of its term vectors
- Search vector space for semantically-close term/document vectors
- compute cosine of angle between query and other vectors
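The pipeline on this slide can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions, not the author's system: the stop-word list and documents are made up, the tokenizer only lowercases and splits on whitespace (no mark-up stripping), and documents and queries are both placed in the space as weighted sums of term vectors (rows of the truncated left singular matrix). All function names are hypothetical.

```python
import numpy as np

STOP_WORDS = {"the", "a", "of", "in", "and", "to", "is", "with"}  # toy list (assumption)

def tokenize(text):
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def term_document_matrix(docs):
    """Raw word counts: one row per term, one column per document."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in tokenize(d):
            A[index[w], j] += 1
    return A, index

def log_entropy(A):
    """Local weight log(1 + tf) times global weight 1 + sum(p log p) / log(n_docs)."""
    p = A / A.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    g = 1.0 + plogp.sum(axis=1) / np.log(A.shape[1])
    return np.log1p(A) * g[:, None], g

def build_lsi(docs, k):
    A, index = term_document_matrix(docs)
    W, g = log_entropy(A)
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_k = U[:, :k]                 # term vectors in the reduced space
    doc_vecs = W.T @ U_k           # each document: weighted sum of its term vectors
    return U_k, doc_vecs, index, g

def fold_in(text, U_k, index, g):
    """Vector for a new document or query (pseudo-document)."""
    counts = np.zeros(U_k.shape[0])
    for w in tokenize(text):
        if w in index:             # terms unseen in training are ignored
            counts[index[w]] += 1
    return (np.log1p(counts) * g) @ U_k

def cosine(q, vecs):
    norms = np.linalg.norm(vecs, axis=1) * np.linalg.norm(q)
    return (vecs @ q) / np.where(norms == 0, 1.0, norms)

docs = [
    "seismic waves travel through the earth mantle and core",
    "the earth core and mantle studied with seismic data",
    "the movie plot follows a hero and a villain",
    "a funny movie with a hero and great actors",
]
U_k, doc_vecs, index, g = build_lsi(docs, k=2)
sims = cosine(fold_in("seismic earth", U_k, index, g), doc_vecs)
```

Here `sims` holds one cosine per training document, so sorting it in descending order gives the retrieval ranking for the query.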
10Scalability: Large Document Collections and Polysemy
11LSI: Ongoing Work
- Distributed LSI
- Needed for LSI to scale to massive document collections
- Adopts divide-and-conquer approach
- Sort documents by conceptual domain
- recognizes documents created for different COIs
- create more semantically homogeneous subcollections
- apply cluster analysis, e.g., bisecting K-means
- Compute independent LSI vector spaces for each subcollection
- more parsimonious representations of concept domains or contexts
- Compute similarity measures between spaces
- construct graphs from terms shared by two vector spaces
- compute similarity between these two graphs
- Discover appropriate search vector spaces for a query
- cosine calculations (as before)
- relevance feedback (as before)
- query expansion
- Visualizations to explore semantic context for a query in different LSI vector spaces
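The clustering step mentioned above (bisecting K-means) could look roughly like the sketch below. It assumes documents are already represented as numeric vectors, splits the cluster with the largest within-cluster sum of squared errors at each step, and uses plain Euclidean 2-means for the split. These choices and the function names are assumptions, since the slides do not specify them.

```python
import numpy as np

def two_means(X, n_iter=20, seed=0):
    """Split the rows of X into two groups with basic Lloyd iterations."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)]  # fancy indexing copies
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in (0, 1):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def sse(X):
    """Within-cluster sum of squared distances to the centroid."""
    return float(((X - X.mean(axis=0)) ** 2).sum())

def bisecting_kmeans(X, k, seed=0):
    """Return a list of index arrays, one per subcollection."""
    X = np.asarray(X, dtype=float)
    clusters = [np.arange(len(X))]
    while len(clusters) < k:
        # Pick the least homogeneous cluster and bisect it.
        worst = max(range(len(clusters)), key=lambda c: sse(X[clusters[c]]))
        idx = clusters.pop(worst)
        if len(idx) < 2:                      # nothing left to split
            clusters.append(idx)
            break
        labels = two_means(X[idx], seed=seed)
        left, right = idx[labels == 0], idx[labels == 1]
        if len(left) == 0 or len(right) == 0:  # degenerate split, stop
            clusters.append(idx)
            break
        clusters += [left, right]
    return clusters

# Toy document vectors forming two well-separated groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
subcollections = bisecting_kmeans(points, k=2)
```

Each returned index array would then define one subcollection for which an independent LSI vector space is computed.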
12DLSI: Experiments with NSF-Movie Review Corpus
13DLSI: The Context of Term Meaning
Graph of semantic relationships between top five terms retrieved for the query "travel, center, earth" from the vector space containing only NSF geology abstracts.
Graph of semantic relationships between top five terms retrieved for the query "travel, center, earth" from the vector space containing only Ebert movie reviews.
Graph of semantic relationships between top five terms retrieved for the query "travel, center, earth" from the vector space containing all documents.
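One way such term graphs might be built and compared across vector spaces is sketched below. The slides do not say how edges are defined or how two graphs are compared, so the cosine threshold for edges and the Jaccard overlap of edge sets are assumptions, as are the toy term vectors and function names.

```python
import numpy as np

def unit_rows(V):
    """Scale each row to unit length (zero rows are left as zeros)."""
    V = np.asarray(V, dtype=float)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V / np.where(norms == 0, 1.0, norms)

def top_terms(query_vec, term_vecs, terms, n=5):
    """Terms whose vectors have the largest cosine with the query vector."""
    q = unit_rows(np.asarray(query_vec, dtype=float)[None, :])[0]
    sims = unit_rows(term_vecs) @ q
    return [terms[i] for i in np.argsort(-sims)[:n]]

def term_graph(term_vecs, terms, keep, threshold=0.5):
    """Edges between kept terms whose pairwise cosine reaches the threshold."""
    rows = [terms.index(t) for t in keep]
    S = unit_rows(np.asarray(term_vecs, dtype=float)[rows])
    C = S @ S.T
    return {
        frozenset((keep[a], keep[b]))
        for a in range(len(keep))
        for b in range(a + 1, len(keep))
        if C[a, b] >= threshold
    }

def graph_similarity(edges_a, edges_b):
    """Jaccard overlap of two edge sets (an assumed similarity measure)."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0

# Toy term vectors for the same four terms in two hypothetical spaces.
terms = ["earth", "core", "travel", "movie"]
space_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 1.0]]
space_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]

graph_a = term_graph(space_a, terms, keep=terms)
graph_b = term_graph(space_b, terms, keep=terms)
similarity = graph_similarity(graph_a, graph_b)
nearest = top_terms([1.0, 0.0], space_a, terms, n=2)
```

In the toy data, "earth" is linked to "core" in one space and to "travel" in the other, which is the kind of context-dependent term meaning the three graphs illustrate.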