Title: Connecting Distributed People and Information on the Web
1Connecting Distributed People and Information on
the Web
- Jennifer Golbeck
- College of Information Studies
- Human-Computer Interaction Lab
- University of Maryland, College Park
2Information Access on the Web
- Find an mp3 of a song that was on the Billboard
Top Ten that features a cowbell.
The Cowbell Project - http://www.geekspeakweekly.com/cowbell/
3Finding Trusted Information
http://www.cowabduction.com/
4The Social Solution
- People are the sources of information
- Social relationships give us information about
people
- Use relationships to understand the information
people produce.
5Current State
- 250-ish social networks
- 850,000,000 users
- Ning claims 185,000 networks
6My Research Questions
- How do users behave and relate to one another in
web-based social networks?
- How do social connections, like trust, relate to
information?
- How can we estimate relationships (like trust)
between people who do not know each other?
- How can we use social networks to build
intelligent systems to improve information
access?
7Social Relationships and Information
- How Trust Relates to Similarity
8A Study
- People create information on the web
- An expression of their opinions and view of the
world
- Focus on quantitative information (e.g. ratings)
- People express trust in social networks
- How does trust relate to the similarity of two
people?
9The Idea
- We know trust correlates with overall similarity
(Ziegler and Golbeck, 2006)
- Does trust capture more than just overall
agreement?
- Two-Part Analysis
- Controlled study to find profile similarity
measures that relate to trust
- Verification through application in a live system
10Experimental Outline
- Phase 1: Rate Movies
- Subjects rate movies on the list
- Ratings grouped as extreme (1, 2, 9, 10) or far from
average (4 different)
- Create profiles of hypothetical users
- Profile is a list of movies and the hypothetical
user's ratings of them
- Subjects rate how much they would trust the
person represented by the profile
- Vary the profile's ratings in a controlled way
11Phase 1: Rating Movies
- Movies most subjects would have seen (100
worldwide top grossing films of all time)
- Cover a broad spectrum of genres
- Top 10 rated movies from each genre as listed in
the Internet Movie Database (IMDB): Action,
Adventure, Animation, Family, Comedy, Crime,
Documentary, Drama, Fantasy, Film-Noir, Horror,
Independent, Musical, Mystery, Romance, Science
Fiction, Thriller, War, and Western.
- Include bad movies (IMDB 100 worst rated movies
with at least 1,000 ratings)
- 283 total films
- Ratings on 1 (bad) to 10 (great) scale
12Generating Profiles
- Each profile contained exactly 10 movies, 4 from
an experimental category and 6 from its
complement
- E.g. 4 movies with extreme ratings and 6 with
non-extreme ratings
- Control for average difference, standard
deviation, etc. so we could see how differences
on specific categories of films affected trust
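As a rough illustration of the profile construction described above, the sketch below picks 4 movies from an experimental category (here, movies the subject rated as extreme: 1, 2, 9, or 10) and 6 from its complement. The function names, the `ratings` dictionary, and the use of random sampling are assumptions made for illustration. The slides do not say how the study actually selected movies or how it controlled for average difference and standard deviation, so none of that is modeled here.

```python
import random

# "Extreme" ratings as defined on the Experimental Outline slide.
EXTREME_RATINGS = {1, 2, 9, 10}


def is_extreme(rating):
    """Return True when a 1-10 rating falls in the extreme group."""
    return rating in EXTREME_RATINGS


def pick_profile_movies(ratings, in_category=is_extreme,
                        n_category=4, n_other=6, seed=None):
    """Choose the movies for one profile (hypothetical helper).

    ratings: dict mapping movie title -> the subject's 1-10 rating.
    Returns n_category movies from the experimental category followed
    by n_other movies from its complement.
    """
    rng = random.Random(seed)
    category = sorted(m for m, r in ratings.items() if in_category(r))
    complement = sorted(m for m, r in ratings.items() if not in_category(r))
    if len(category) < n_category or len(complement) < n_other:
        raise ValueError("not enough rated movies in one of the groups")
    return rng.sample(category, n_category) + rng.sample(complement, n_other)
```

Passing a different `in_category` predicate would cover the other experimental categories in the same way.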
13Example Profile
14Subjects
- 59 subjects
- Age 20 to 52
- Education
- 6 high school, 11 bachelor's, 23 master's, 11 PhD,
8 unreported
- Movie Experience
- Watch movies 1-2 times per week on average
- Read movie media (web, magazines, etc.) every week or
two
15Results
- Reconfirmed that trust strongly correlates with
overall similarity (?).
- Agreement on extremes (??)
- Largest single difference (r)
- Subjects' propensity to trust (?)
16Propensity to Trust (?)
17Validation
- Gather all pairs of FilmTrust users who have a
known trust relationship and share movies in
common
- 322 total user pairs
- Develop a formula using the experimental
parameters to estimate trust
- Compute accuracy by comparing computed trust
value with known value
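The slides do not state which accuracy measure was used. As one plausible reading of "comparing computed trust value with known value", the sketch below computes the mean absolute difference over user pairs. The function name and the choice of mean absolute error are assumptions, not the study's documented method.

```python
def mean_absolute_error(computed, known):
    """Average absolute gap between estimated and known trust values.

    computed, known: equal-length sequences, one entry per user pair.
    """
    if len(computed) != len(known) or not computed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(c - k) for c, k in zip(computed, known)) / len(computed)
```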
18In FilmTrust
- Use weights (w1, w2, w3, w4, w?) = (7, 2, 1, 8, 2)
19Experimental Conclusions
- Social trust relationships are stronger between
people who are similar in certain ways
- First observed in controlled experiments
- Verified through application in a real system
20Applications
- Using social trust for improved information access
21Social Information Access
- Use social relationships (e.g. trust) for
- Aggregating Information
- Sorting and Ranking Information
- Filtering and Assessing the Quality of
Information
- FilmTrust
22FilmTrust
- Use Trust for information access
- Recommender system
- Review ordering
- 1200 users
23Information Aggregation Using Trust
- Trust-based Recommender System
- Generates predictive movie ratings based on trust
- Weighted average of everyone's ratings of the
film, where trust is the weight
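The trust-weighted average described above can be sketched in a few lines. This is a minimal illustration, not FilmTrust's implementation: the function name and data layout are assumptions, and it takes the trust values as given rather than showing how trust in people the user does not know directly would be estimated.

```python
def trust_weighted_rating(ratings, trust):
    """Predict a film rating as the trust-weighted mean of others' ratings.

    ratings: dict mapping user -> that user's rating of the film.
    trust:   dict mapping user -> trust the target user places in them.
    Raters with no trust value (or zero trust) are ignored.
    Returns None when no rater carries any weight.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for user, rating in ratings.items():
        weight = trust.get(user, 0.0)
        if weight > 0:
            weighted_sum += weight * rating
            total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight
```

For example, with ratings of 4 and 2 from two people trusted at 9 and 3 respectively, the prediction is (9*4 + 3*2) / 12 = 3.5, closer to the more trusted rater's opinion than the plain average of 3.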
25Difference between known user rating and
recommended rating (measured in number of stars
difference)
Minimum difference between known user rating and
average rating
28Conclusions and Future Directions
29Conclusions - Social Information Access
- Use understanding and analysis of social behavior
in web-based social networks to improve
information access
- Shown a connection between social trust and
similarity
- Shown how trust can be used for aggregating,
sorting, and filtering information
30Future Directions - General
- Improved understanding of behavior in web-based
social networks
- How different types of social connections relate
to information
- How to improve information access using new
social analyses
31Future Directions - Specific
- Ad hoc information and social networks for micro
news
- E.g. I have evacuated for a natural disaster
(earthquake, hurricane, flood). I want to know
what's going on at my house.
- Distributed information (satellite photos,
ground, video, photos, blog entries, local news
reports, message board text)
- Needs
- Provenance - is this information unique, or is it
all derived from the same source?
- Trust - should I trust the source of this
information?
32Questions
- Jennifer Golbeck
- golbeck_at_cs.umd.edu
- http://trust.mindswap.org
34Generating Profiles
- Pre-defined rating differences
- Subjects rated 54 total profiles
- Six categories
- Three ? values
- Three profiles in each ?-category combination
35The Provenance Challenge
- Researchers in many areas
- Storage systems
- Databases
- Grid computing
- Data mining
- A challenge provides a standard for comparing
approaches
- Given a scientific workflow and nine challenge
queries
- Represent all data that we consider relevant
about the history of each file
- Answer as many queries as possible
36FilmTrust Results
- FilmTrust compared trust from the social network
with overall similarity (via collaborative
filtering algorithms) as a weight in recommender
systems.
- Trust outperformed overall similarity in some
cases, suggesting that trust captures something
more than overall similarity does
37Ten Largest WBSNs
- MySpace 150,000,000
- ChinaRen Xiaonei 60,000,000
- Adult Friend Finder 26,000,000
- Bebo 25,000,000
- Friendster 21,000,000
- Cyworld 21,000,000
- Tickle 20,000,000
- Black Planet 18,000,000
- Hi5 14,000,000
- LiveJournal 12,000,000
38Example Queries
- Find everything that caused a given Graphic to be
as it is.
- Find all invocations of procedure align_warp
using a twelfth order nonlinear 1365 parameter
model that ran on a Monday.
- Find all images where at least one of the input
files had an entry global maximum=4095.
- A user has annotated some images with a
key-value pair center=UChicago. Find the outputs
of align_warp where the inputs are annotated with
center=UChicago.
39Semantic Web Approach
- Ontology represents information about the
execution of services and the dependencies among
files
- Logical inferences connect objects to their
ancestors
- Role hierarchy separates direct lineage from
ancestry
- Semantics of transitive roles imply connections
among files connected through ancestral
relationships
- Additional reasoning with Semantic Web Rules
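The distinction between direct lineage and ancestry can be illustrated without an ontology reasoner. The sketch below is plain Python rather than the Semantic Web tooling the slides refer to: it stores only direct "derived from" links and computes ancestry by following them transitively, which is the inference a transitive role would supply. The function name and the file names in the example are hypothetical.

```python
def ancestors(direct_parents, item):
    """Return every ancestor of item by following direct links transitively.

    direct_parents: dict mapping a file to the set of files it was
    directly derived from (its direct lineage).
    """
    seen = set()
    stack = list(direct_parents.get(item, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(direct_parents.get(current, ()))
    return seen


# Hypothetical three-step workflow: two warped images feed one atlas
# image, which is sliced and then converted to a graphic.
lineage = {
    "atlas.gif": {"atlas_slice.pgm"},
    "atlas_slice.pgm": {"atlas.img"},
    "atlas.img": {"warp1.img", "warp2.img"},
}
```

Here `lineage["atlas.gif"]` gives the direct lineage (one file), while `ancestors(lineage, "atlas.gif")` gives the full ancestry (four files).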
40Evaluation through Query Answering
- SPARQL, a W3C standard, is used to formulate
queries
- We were easily able to answer all nine queries
for the challenge (one of only 3 teams from 15
entries)
- Have already completed the second phase of the
challenge, importing data from other systems and
applying our techniques
41Definition
- A Web-based Social Network (WBSN) must meet these
criteria:
- Accessible over the web with a web browser
- Users must explicitly state their relationship
with other people qua stating a relationship
- Must have explicit built-in support for users
making these connections.
- Relationships must be visible and browsable
42Why the Difference?
- Ranges of disconnected members
- Dogster and HAMSTERster have lowest rates
- Ecademy
- FilmTrust
- Mobango and Worldshine
- As the non-social networking purpose of the
website becomes stronger, the number of
friendless and outsiders increases
43Using Web-Based Social Networks (WBSNs)
- If we are going to use social networks for
information access we must understand
- How do users behave in social networks?
- How do social relationships relate to information?
44Daily Trends
45Implications
- The trust we have in people can inform how we
treat information provided by those people
- This and other studies suggest trust will work
well for aggregating, filtering and sorting
information
- Important when working on the web
46Outline
- Motivation
- Understanding Relationships in Web-based Social
Networks
- Behavior
- Trust
- Using Social Relationships for Information Access
- Conclusions and Future Directions
47Understanding Social Behavior
- In Web-Based Social Networks
48Behavior and Dynamics
- Social networks are not static.
- Relationships constantly change, are formed, and
are dropped.
- New people enter the network and others leave
- Do people behave the same way in social networks
on the Web?
49Questions
- How do these networks grow (and shrink)?
- How are relationships added (and removed)?
- What affects social disconnect?
- What affects centrality?
50Methodology
- 24 month study
- Automatically collected adjacency lists (everyone
and who they know), join dates, and last active
dates for all members.
- December 2004
- December 2006
- For 7 networks, I collected adjacency lists every
day for 7 weeks.
- Who joined or left
- What relationships were added or removed
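Given two daily adjacency-list snapshots, the joins, departures, and relationship changes listed above fall out of simple set differences. The sketch below assumes each snapshot is a dict from member to the set of members they list, that every member appears as a key, and that relationships are undirected. Those are assumptions about the data layout, which the slides do not describe.

```python
def edge_set(adjacency):
    """Flatten an adjacency list into a set of undirected edges."""
    return {
        frozenset((member, friend))
        for member, friends in adjacency.items()
        for friend in friends
        if member != friend
    }


def diff_snapshots(old, new):
    """Compare two snapshots (dict: member -> set of connected members)."""
    old_members, new_members = set(old), set(new)
    old_edges, new_edges = edge_set(old), edge_set(new)
    return {
        "joined": new_members - old_members,
        "left": old_members - new_members,
        "relationships_added": new_edges - old_edges,
        "relationships_removed": old_edges - new_edges,
    }


# Two made-up daily snapshots of a tiny network.
day1 = {"ann": {"bob"}, "bob": {"ann"}, "cat": set()}
day2 = {"ann": {"bob", "dan"}, "bob": {"ann"}, "dan": {"ann"}}
changes = diff_snapshots(day1, day2)
```

In this example `changes` reports that "dan" joined, "cat" left, one relationship (ann and dan) was added, and none were removed.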
51Network Growth
- People do not leave social networks
- On sites with a clear, simple process, fewer than a
dozen members leave per day
- In some networks, essentially no one has ever
left
- Lots of people join social networks
- For ten networks we knew the date that every
member joined the network
- Networks tend to show linear growth
- The slope can shift
- Usually occurs suddenly
- Explained by some event
53Relationships
- Forming relationships is the basis for social
networking
- Almost all networks are growing denser
- Relationships grow at approximately 1.7 to 2.7
times the rate of membership
- There is a strong social disincentive to remove
relationships
54FilmTrust Network
55Friendless and the Outsiders
- Friendless have no social connections
- Outsiders have social connections but are
independent from the major connected component of
the network
- Important because if we are using the social
network for information access, these people will
get little benefit.
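The two groups defined above can be identified from an adjacency list by finding connected components. The sketch below is one way to do it, assuming undirected relationships and that every member appears as a key. The labels and function name are illustrative, and "major connected component" is taken to mean the largest one.

```python
from collections import deque


def classify_members(adjacency):
    """Label each member 'friendless', 'outsider', or 'connected'.

    adjacency: dict mapping member -> set of members they are linked to.
    """
    component_of = {}
    components = []
    for start in adjacency:
        if start in component_of:
            continue
        # Breadth-first search collects everyone reachable from start.
        index = len(components)
        members = {start}
        component_of[start] = index
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for friend in adjacency.get(node, ()):
                if friend not in component_of:
                    component_of[friend] = index
                    members.add(friend)
                    queue.append(friend)
        components.append(members)

    largest = max(components, key=len) if components else set()
    labels = {}
    for member, friends in adjacency.items():
        if not friends:
            labels[member] = "friendless"
        elif member in largest:
            labels[member] = "connected"
        else:
            labels[member] = "outsider"
    return labels
```

For a network where a, b, and c are linked together, d and e are linked only to each other, and f has no links, the result marks d and e as outsiders and f as friendless.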
57Centrality
- Other than having lots of friends, what makes
people more central?
- Average shortest path length as centrality
measure
- Activity
- Consider join date, last active date, and length
of activity (last active date - join date)
- Compute rank correlation with centrality
- Medium strength correlation (0.5) between
duration and centrality
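The analysis above combines two computations: an average shortest path length per member and a rank correlation against activity duration. The sketch below shows both in plain Python. The slides do not name the rank correlation used, so Spearman's coefficient is an assumption here, as are the function names and the unweighted, undirected treatment of the network.

```python
from collections import deque


def average_shortest_path_length(adjacency, source):
    """Mean BFS distance from source to every member it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for friend in adjacency.get(node, ()):
            if friend not in dist:
                dist[friend] = dist[node] + 1
                queue.append(friend)
    reachable = len(dist) - 1
    if reachable == 0:
        return None  # isolated member: no paths to average
    return sum(dist.values()) / reachable


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank lists.

    Undefined (division by zero) if either list is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Because a shorter average path means a more central member, correlating raw path length with duration would express "longer activity, more central" as a negative coefficient. Negating the path lengths before calling `spearman` gives the sign convention in which more central members score higher.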
58Conclusions
- Networks follow a linear growth pattern, where
the slope shifts in response to events
- People rarely leave networks
- Networks grow denser, with relationships added
more frequently than members
- People will delete relationships, but orders of
magnitude less frequently than they add them
- Websites with more non-social features tend to
have more friendless and disconnected users
- Users with longer periods of activity tend to be
more central to the network
59Example Profile
- Movies m1 through m10
- User ratings r1 through r10 for m1 through m10
- r1 through r4 are extreme (1, 2, 9, or 10)
- r5 through r10 are not extreme
- Profile ratings pi ri?i