Title: To trust or not, is hardly the question!
1Wikipedia
- To trust or not, is hardly the question!
2We're never so vulnerable than when we trust
someone but paradoxically, if we cannot trust,
neither can we find love or joy. - Walter Anderson
Trust
Quality
Popularity
Reach
How much we can trust is the right question
3Agenda
- Review two articles
- Briefly summarize other publications
5Content quality
- What are the hallmarks of consistently good information?
- Objectivity: unbiased information
- Completeness: self-explanatory
- Pluralism: not restricted to a particular viewpoint
- Define propositions of trust
6Propositions of trust
7UML Model for Wikipedia
8Macro-areas of analysis
- Six macro-areas: quality of user, user distribution and leadership, stability, controllability, quality of editing, and importance of an article.
- Using the ten propositions, 50 sources of trust evidence are identified.
9Logic conditions
- Necessary to control the meaning of each trust factor in relationship to the others
- IF stability is high AND (length is short OR edit is low OR importance is low) THEN warning
- IF leadership is high AND dictatorship is high THEN warning
- IF length is high AND importance is low THEN warning
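The three rules above can be written directly as boolean checks. The following is a minimal sketch, assuming each factor has already been reduced to a high/low (or short/long) flag; the class name, field names, and warning labels are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrustFactors:
    """Boolean flags for one article (how each flag is derived is an assumption)."""
    stability_high: bool
    length_short: bool
    length_high: bool
    edit_low: bool
    importance_low: bool
    leadership_high: bool
    dictatorship_high: bool


def fired_warnings(f: TrustFactors) -> List[str]:
    """Return a label for every rule from the slide that fires (labels are hypothetical)."""
    fired = []
    # Rule 1: stability is high AND (length is short OR edit is low OR importance is low)
    if f.stability_high and (f.length_short or f.edit_low or f.importance_low):
        fired.append("stable-but-thin")
    # Rule 2: leadership is high AND dictatorship is high
    if f.leadership_high and f.dictatorship_high:
        fired.append("leader-dominated")
    # Rule 3: length is high AND importance is low
    if f.length_high and f.importance_low:
        fired.append("long-but-unimportant")
    return fired
```

An article for which no rule fires returns an empty list, so the caller can treat any non-empty result as a warning.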
10Calculation of Trust
11Evaluation
- Featured articles vs. Standard articles
12Cluster Analysis
14Models
- Basic
  - The better the authors, the better the article quality
- PeerReview
  - Assumption: a contributor reviews the content before modifying it, thereby approving the content that he/she does not edit
15Models
- ProbReview
  - Improved assumption: a contributor may not review the entire article before modifying it
  - The farther a word is from another that the author has written, the lower the probability that he/she has read it
  - In conflicts, the higher probability is considered
  - Probability is modeled as a monotonically decaying function of the distance between the words
- Naïve
  - The longer the article is, the better its quality
  - Used as a baseline for comparison
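The ProbReview idea can be illustrated with a short sketch. The decay function below, 1 / (1 + distance), is only a placeholder assumption chosen because it decreases monotonically; the slides do not give the actual decay schemes, and the function and parameter names are hypothetical.

```python
from typing import Callable, Iterable


def decay(distance: int) -> float:
    """Hypothetical decay: 1.0 at distance 0, falling monotonically toward 0."""
    return 1.0 / (1.0 + distance)


def review_probability(
    word_pos: int,
    authored_positions: Iterable[int],
    decay_fn: Callable[[int], float] = decay,
) -> float:
    """Estimate the probability that an author has read the word at word_pos.

    Every word the author wrote yields its own estimate based on distance.
    When the estimates conflict, the highest one is kept, so the nearest
    authored word decides the result.
    """
    positions = list(authored_positions)
    if not positions:
        return 0.0  # assumption: an author with no words in the article reviewed nothing
    return max(decay_fn(abs(word_pos - p)) for p in positions)
```

For example, a word at position 7 in an article where the author wrote the words at positions 5 and 10 gets the estimate from position 5 (distance 2), because it is higher than the one from position 10 (distance 3).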
16Iterative computation
1. Initialize all quality and authority values equally
2. For each iteration
  - Use authority values from previous iteration to compute quality
  - Use quality values to compute authority
  - Normalize all quality and authority values
3. Repeat step 2 until convergence (alternatives: repeat until the difference is very small or until the maximum number of iterations has been reached)
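The loop above can be sketched as follows for a Basic-style model. This is a sketch under stated assumptions: an article's quality is taken to be the sum of its authors' authority weighted by the words they contributed, and an author's authority is the mirror image of that. The input format, the weighting, the sum-to-one normalization, and all names are assumptions, not details from the slides.

```python
from typing import Dict, Tuple


def _normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Scale values so they sum to 1 (left unchanged if the sum is zero)."""
    total = sum(values.values())
    return {k: v / total for k, v in values.items()} if total > 0 else dict(values)


def iterate_quality_authority(
    words: Dict[Tuple[str, str], float],  # (article, author) -> words contributed
    tol: float = 1e-9,
    max_iters: int = 100,
) -> Tuple[Dict[str, float], Dict[str, float]]:
    if not words:
        return {}, {}
    articles = sorted({a for a, _ in words})
    authors = sorted({u for _, u in words})
    # Step 1: initialize all quality and authority values equally.
    quality = {a: 1.0 / len(articles) for a in articles}
    authority = {u: 1.0 / len(authors) for u in authors}
    for _ in range(max_iters):
        # Step 2a: compute quality from the previous iteration's authority.
        new_q = {a: 0.0 for a in articles}
        for (a, u), w in words.items():
            new_q[a] += w * authority[u]
        # Step 2b: compute authority from the new quality values.
        new_auth = {u: 0.0 for u in authors}
        for (a, u), w in words.items():
            new_auth[u] += w * new_q[a]
        # Step 2c: normalize both sets of values.
        new_q, new_auth = _normalize(new_q), _normalize(new_auth)
        change = max(
            max(abs(new_q[a] - quality[a]) for a in articles),
            max(abs(new_auth[u] - authority[u]) for u in authors),
        )
        quality, authority = new_q, new_auth
        # Step 3: stop once the values barely move, or when max_iters is reached.
        if change < tol:
            break
    return quality, authority
```

Both stopping alternatives from the slide are covered: `tol` implements "until the difference is very small" and `max_iters` caps the number of iterations.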
17Evaluation
- Use a set of articles on countries that have been assigned quality labels by Wikipedia's Editorial Team
- Preprocessing
  - Bot revisions were removed from the analysis.
  - Consecutive edits by the same user were collapsed; only the final edit was used.
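The two preprocessing steps can be sketched as a single pass over a revision history. The `Revision` record and its `is_bot` flag are hypothetical, and removing bots before collapsing consecutive edits is an assumption about the order of the two steps.

```python
from typing import List, NamedTuple


class Revision(NamedTuple):
    user: str
    is_bot: bool  # hypothetical flag; how bots are detected is not stated
    text: str


def preprocess(history: List[Revision]) -> List[Revision]:
    """Drop bot revisions, then keep only the last edit of each run by one user."""
    human = [r for r in history if not r.is_bot]
    kept: List[Revision] = []
    for rev in human:
        if kept and kept[-1].user == rev.user:
            kept[-1] = rev  # same user again: the later edit supersedes the earlier one
        else:
            kept.append(rev)
    return kept
```

With this order, two edits by the same user separated only by a bot revision are also treated as consecutive.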
18Evaluation metrics
- Normalized discounted cumulative gain at top k (NDCG@k)
  - Suited for ranked articles that have multiple levels of assessment
- Spearman's rank correlation
  - Relevant for comparing the agreement between two rankings of the same set of objects
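Both metrics are short enough to sketch in plain Python. The DCG variant used here, (2^rel - 1) / log2(rank + 1), is one common form and is an assumption about which variant the evaluation used; the Spearman formula shown is the standard one for rankings without ties.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG over the first k items, with gain 2^rel - 1 and a log2(rank + 1) discount."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """NDCG@k; relevances are graded labels listed in the model's ranking order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def spearman_rho(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman's rho for two rankings of the same n objects, assuming no ties."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rankings of the same length, with n >= 2")
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

A ranking that lists articles in decreasing order of their quality labels scores an NDCG of 1, and two identical rankings give a Spearman coefficient of 1, while exactly reversed rankings give -1.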
19Results
20Conclusions
- ProbReview works best with decay scheme 2 or 3.
- Article length seems to be correlated with article quality
  - Adding this to Basic and PeerReview models showed some improvement but ProbReview did not benefit
22Summary
- Revision trust model may help address
  - Article trust
  - Fragment trust
  - Author trust
- A dynamic Bayesian network is used to model the evolution of article trust over revisions
- Wikipedia featured articles, clean-up articles and normal articles are used for evaluation
23Results
25Summary
- Uses revision history as well as the reputation of the contributing authors
- Assigns trust to text
27Summary
- Propose the use of a trust tab in Wikipedia
- Link-ratio: ratio between the number of citations and the number of non-cited occurrences of the encyclopedia term
- Evaluation: compare link-ratio values for featured, normal and clean-up articles
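The link-ratio definition above amounts to a single division. The sketch below assumes the two counts are already available; how occurrences are counted, and the handling of a zero denominator, are assumptions rather than details from the slides.

```python
def link_ratio(cited: int, non_cited: int) -> float:
    """Ratio of cited (linked) occurrences of a term to its non-cited occurrences."""
    if cited < 0 or non_cited < 0:
        raise ValueError("counts must be non-negative")
    if non_cited == 0:
        # Assumption: no plain mentions means an unbounded ratio, unless there
        # are no citations either, in which case the ratio is taken to be 0.
        return float("inf") if cited > 0 else 0.0
    return cited / non_cited
```

The evaluation described on the slide would then compare these values across featured, normal and clean-up articles.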
29Summary
- Propose a content-driven reputation system for authors
- Authors gain reputation when their work is preserved by subsequent authors and lose reputation when edits are undone or quickly rolled back
- Evaluation: edits by low-reputation authors have a larger-than-average probability of being of poor quality, as judged by human observers, and of being undone by later editors
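The gain-and-loss rule can be illustrated with a toy update loop. This is only a sketch of the general idea: the event format, the fixed `gain` and `penalty` amounts, and the floor at zero are all hypothetical and are not the update rule of the reputation system summarized above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def update_reputation(
    events: Iterable[Tuple[str, str]],  # (author, outcome) pairs
    gain: float = 1.0,      # hypothetical reward when an edit is preserved
    penalty: float = 2.0,   # hypothetical cost when an edit is undone or rolled back
) -> Dict[str, float]:
    """Accumulate reputation per author from 'preserved' / 'reverted' outcomes."""
    reputation: Dict[str, float] = defaultdict(float)
    for author, outcome in events:
        if outcome == "preserved":
            reputation[author] += gain
        elif outcome == "reverted":
            # Assumption: reputation never drops below zero.
            reputation[author] = max(0.0, reputation[author] - penalty)
        else:
            raise ValueError(f"unknown outcome: {outcome!r}")
    return dict(reputation)
```

Because the outcomes come from what later editors do to the text, the score is driven by content rather than by explicit ratings from other users.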
31Summary
- A different question: what are the controversial articles?
- Uses edit and collaboration history
- Two models: Basic and Contributor Rank
- Contributor Rank model tries to differentiate between disputes due to the article and those due to the aggressiveness of the contributors, with the former being the one that is to be measured
- Evaluation: identification of labeled controversial articles
32Conclusions
- Interesting area to work on
- Different angles to consider and different questions too
- Data is easily available and has lots of relevant features
- Articles classified by the Wikipedia Editorial Team help evaluation
- Great scope for more work in this area
- I want to look at this from the health perspective
33Thank You