Connecting Distributed People and Information on the Web


1
Connecting Distributed People and Information on
the Web
  • Jennifer Golbeck
  • College of Information Studies
  • Human-Computer Interaction Lab
  • University of Maryland, College Park

2
Information Access on the Web
  • Find an mp3 of a song that was on the Billboard
    Top Ten that features a cowbell.

The Cowbell Project - http://www.geekspeakweekly.com/cowbell/
3
Finding Trusted Information
  • How many cows in Texas?

http://www.cowabduction.com/
4
The Social Solution
  • People are the sources of information
  • Social relationships give us information about
    people
  • Use relationships to understand the information
    people produce.

5
Current State
  • 250-ish social networks
  • 850,000,000 users
  • Ning claims 185,000 networks

6
My Research Questions
  • How do users behave and relate to one another in
    web-based social networks?
  • How do social connections, like trust, relate to
    information?
  • How can we estimate relationships (like trust)
    between people who do not know each other?
  • How can we use social networks to build
    intelligent systems to improve information
    access?

7
Social Relationships and Information
  • How Trust Relates to Similarity

8
A Study
  • People create information on the web
  • An expression of their opinions and view of the
    world
  • Focus on quantitative information (e.g. ratings)
  • People express trust in social networks
  • How does trust relate to the similarity of two
    people

9
The Idea
  • We know trust correlates with overall similarity
    (Ziegler and Golbeck, 2006)
  • Does trust capture more than just overall
    agreement?
  • Two Part Analysis
  • Controlled study to find profile similarity
    measures that relate to trust
  • Verification through application in a live system

10
Experimental Outline
  • Phase 1: Rate Movies - Subjects rate movies on
    the list
  • Ratings grouped as extreme (1,2,9,10) or far from
    average (4 different)
  • Create profiles of hypothetical users
  • Profile is a list of movies and the hypothetical
    user's ratings of them
  • Subjects rate how much they would trust the
    person represented by the profile
  • Vary the profile's ratings in a controlled way

11
Phase 1: Rating Movies
  • Movies most subjects would have seen (100
    worldwide top grossing films of all time)
  • Cover a broad spectrum of genres
  • Top 10 rated movies from each genre as listed in
    the Internet Movie Database (IMDB): Action,
    Adventure, Animation, Family, Comedy, Crime,
    Documentary, Drama, Fantasy, Film-Noir, Horror,
    Independent, Musical, Mystery, Romance, Science
    Fiction, Thriller, War, and Western.
  • Include bad movies (IMDB 100 worst rated movies
    with at least 1,000 ratings)
  • 283 total films
  • Ratings on 1 (bad) to 10 (great) scale

12
Generating Profiles
  • Each profile contained exactly 10 movies, 4 from
    an experimental category and 6 from its
    complement
  • E.g. 4 movies with extreme ratings and 6 with
    non-extreme ratings
  • Control for average difference, standard
    deviation, etc. so we could see how differences
    on specific categories of films affected trust
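
The selection step described above can be sketched roughly as follows. This is only an illustration, assuming the experimental category is "extreme ratings"; the function name, the placeholder movie titles, and the use of uniform random sampling are assumptions, and the sketch does not cover the controlled variation of the profile's ratings.

```python
import random

EXTREME = {1, 2, 9, 10}  # ratings the slides group as "extreme"


def sample_profile(subject_ratings, rng, n_experimental=4, n_complement=6):
    """Pick 10 of a subject's rated movies: 4 extreme-rated, 6 not.

    subject_ratings: dict mapping movie title -> rating on the 1-10 scale
    Returns a list of (movie, rating) pairs, or None if the subject has
    too few movies in either group.
    """
    extreme = [m for m, r in subject_ratings.items() if r in EXTREME]
    other = [m for m, r in subject_ratings.items() if r not in EXTREME]
    if len(extreme) < n_experimental or len(other) < n_complement:
        return None
    chosen = rng.sample(extreme, n_experimental) + rng.sample(other, n_complement)
    return [(m, subject_ratings[m]) for m in chosen]


# Hypothetical subject: 20 placeholder movies with ratings cycling 1..10.
subject = {f"movie_{i}": (i % 10) + 1 for i in range(20)}
profile = sample_profile(subject, random.Random(0))
```

The first four entries of `profile` come from the experimental category and the remaining six from its complement.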

13
Example Profile
14
Subjects
  • 59 subjects
  • Age 20 to 52
  • Education
  • 6 high school, 11 bachelor's, 23 master's, 11 PhD,
    8 unreported
  • Movie Experience
  • Watch 1-2 times per week on average
  • Movie media (web, magazines, etc.) every week or
    two

15
Results
  • Reconfirmed that trust strongly correlates with
    overall similarity (?).
  • Agreement on extremes (??)
  • Largest single difference (r)
  • Subjects' propensity to trust (?)

16
Propensity to Trust (?)
17
Validation
  • Gather all pairs of FilmTrust users who have a
    known trust relationship and share movies in
    common
  • 322 total user pairs
  • Develop a formula using the experimental
    parameters to estimate trust
  • Compute accuracy by comparing computed trust
    value with known value
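
A minimal sketch of the comparison step, assuming accuracy is measured as mean absolute error between known and computed trust values. The slides do not name the metric, so this choice, the function name, and the example values are all assumptions.

```python
def mean_absolute_error(known, computed):
    """Average |known - computed| over the user pairs present in both dicts."""
    shared = known.keys() & computed.keys()
    if not shared:
        raise ValueError("no user pairs in common")
    return sum(abs(known[p] - computed[p]) for p in shared) / len(shared)


# Hypothetical trust values for three user pairs (not FilmTrust data).
known = {("u1", "u2"): 8, ("u2", "u3"): 5, ("u1", "u3"): 10}
computed = {("u1", "u2"): 7.0, ("u2", "u3"): 6.5, ("u1", "u3"): 9.5}
error = mean_absolute_error(known, computed)
```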

18
In FilmTrust
  • Use weights (w1, w2, w3, w4, w?) = (7, 2, 1, 8, 2)

19
Experimental Conclusions
  • Social trust relationships are stronger between
    people who are similar in certain ways
  • First observed in controlled experiments
  • Verified through application in a real system

20
Applications
  • Using social trust for improved information access

21
Social Information Access
  • Use social relationships (e.g. trust) for
  • Aggregating Information
  • Sorting and Ranking Information
  • Filtering and Assessing the Quality of
    Information
  • FilmTrust

22
FilmTrust
  • Use Trust for information access
  • Recommender system
  • Review ordering
  • 1200 users

23
Information Aggregation Using Trust
  • Trust-based Recommender System
  • Generates predictive movie ratings based on trust
  • Weighted average of everyone's ratings of the
    film, where trust is the weight
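
The weighted average can be sketched as below. This is an illustration only, not FilmTrust's code: the function name and the example raters, ratings, and trust values are hypothetical, and the handling of raters without a trust value is an assumption.

```python
def trust_weighted_rating(ratings, trust):
    """Predict a rating as the trust-weighted mean of other users' ratings.

    ratings: dict mapping rater -> that rater's rating of the film
    trust:   dict mapping rater -> trust the target user places in the rater
    Raters with no trust value contribute nothing (assumed weight 0).
    """
    weighted_sum = 0.0
    total_trust = 0.0
    for rater, rating in ratings.items():
        weight = trust.get(rater, 0.0)
        weighted_sum += weight * rating
        total_trust += weight
    if total_trust == 0:
        return None  # no trusted rater has rated this film
    return weighted_sum / total_trust


# Hypothetical data: three raters on a star scale, with made-up trust values.
ratings = {"alice": 4.0, "bob": 2.0, "carol": 3.0}
trust = {"alice": 9, "bob": 1}
predicted = trust_weighted_rating(ratings, trust)
```

Here `predicted` is (9 × 4.0 + 1 × 2.0) / 10 = 3.8; carol's rating is ignored because the target user has no trust value for her.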

25
Difference between known user rating and
recommended rating (measured in number of stars
difference)
Minimum difference between known user rating and
average rating
28
Conclusions and Future Directions
29
Conclusions - Social Information Access
  • Use understanding and analysis of social behavior
    in web-based social networks to improve
    information access
  • Shown a connection between social trust and
    similarity
  • Shown how trust can be used for aggregating,
    sorting, and filtering information

30
Future Directions - General
  • Improved understanding of behavior in web-based
    social networks
  • How different types of social connections relate
    to information
  • How to improve information access using new
    social analyses

31
Future Directions - Specific
  • Ad hoc information and social networks for micro
    news
  • E.g. I have evacuated for a natural disaster
    (earthquake, hurricane, flood). I want to know
    what's going on at my house.
  • Distributed information (satellite photos,
    ground, video, photos, blog entries, local news
    reports, message board text)
  • Needs
  • Provenance - is this information unique, or is it
    all derived from the same source?
  • Trust - should I trust the source of this
    information?

32
Questions
  • Jennifer Golbeck
  • golbeck_at_cs.umd.edu
  • http://trust.mindswap.org

34
Generating Profiles
  • Pre-defined rating differences
  • Subjects rated 54 total profiles
  • Six categories
  • Three ? values
  • Three profiles in each ?-category combination

35
The Provenance Challenge
  • Researchers in many areas
  • Storage systems
  • Databases
  • Grid computing
  • Data mining
  • A challenge provides a standard for comparing
    approaches
  • Given a scientific workflow and nine challenge
    queries
  • Represent all data that we consider relevant
    about the history of each file
  • Answer as many queries as possible

36
FilmTrust Results
  • FilmTrust compared trust from the social network
    with overall similarity (via collaborative
    filtering algorithms) as a weight in recommender
    systems.
  • Trust outperformed overall similarity in some
    cases, suggesting that trust captures something
    more than overall similarity does
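
For comparison, a similarity weight of the kind used in collaborative filtering could look like the sketch below. Pearson correlation over co-rated movies is a common choice, but the slides do not say which measure was used, so treat the measure, the function name, and the sample ratings as assumptions.

```python
def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over the movies both users rated."""
    shared = sorted(ratings_a.keys() & ratings_b.keys())
    if len(shared) < 2:
        return None  # not enough overlap to correlate
    xs = [ratings_a[m] for m in shared]
    ys = [ratings_b[m] for m in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # one user gave identical ratings; correlation undefined
    return cov / (var_x * var_y) ** 0.5


# Hypothetical ratings; only m1, m2, and m3 are rated by both users.
user_a = {"m1": 1, "m2": 2, "m3": 3}
user_b = {"m1": 2, "m2": 4, "m3": 6, "m4": 1}
similarity = pearson_similarity(user_a, user_b)
```

Such a value could take the place of trust as the weight in a weighted-average recommender, which is the comparison this slide describes.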

37
Ten Largest WBSNs
  • MySpace 150,000,000
  • ChinaRen Xiaonei 60,000,000
  • Adult Friend Finder 26,000,000
  • Bebo 25,000,000
  • Friendster 21,000,000
  • Cyworld 21,000,000
  • Tickle 20,000,000
  • Black Planet 18,000,000
  • Hi5 14,000,000
  • LiveJournal 12,000,000

38
Example Queries
  • Find everything that caused a given Graphic to be
    as it is.
  • Find all invocations of procedure align_warp
    using a twelfth order nonlinear 1365 parameter
    model that ran on a Monday.
  • Find all images where at least one of the input
    files had an entry global maximum=4095.
  • A user has annotated some images with a
    key-value pair center=UChicago. Find the outputs
    of align_warp where the inputs are annotated with
    center=UChicago.

39
Semantic Web Approach
  • Ontology represents information about the
    execution of services and the dependencies among
    files
  • Logical inferences connect objects to their
    ancestors
  • Role hierarchy separates direct lineage from
    ancestry
  • Semantics of transitive roles imply connections
    among files connected through ancestral
    relationships
  • Additional reasoning with Semantic Web Rules
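
The effect of a transitive ancestry role can be imitated in plain Python, as sketched below. In the approach on this slide an ontology reasoner draws these inferences; the traversal here is only an analogy, and the function name and file names are hypothetical.

```python
def ancestors(derived_from, item):
    """All items reachable from `item` by following direct-lineage links.

    derived_from: dict mapping a file -> set of files it was directly
    derived from (the "direct lineage" relation).
    """
    found = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in derived_from.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Hypothetical lineage: each file lists only its direct inputs.
lineage = {
    "atlas.gif": {"atlas_slice.pgm"},
    "atlas_slice.pgm": {"atlas.img"},
    "atlas.img": {"warped1.img", "warped2.img"},
}
history = ancestors(lineage, "atlas.gif")
```

Only direct links are stored, yet `history` contains every upstream file, which is the distinction the role hierarchy draws between direct lineage and ancestry.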

40
Evaluation through Query Answering
  • SPARQL, a W3C standard, is used to formulate
    queries
  • We were easily able to answer all nine queries
    for the challenge (one of only 3 teams from 15
    entries)
  • Have already completed the second phase of the
    challenge, importing data from other systems and
    applying our techniques

41
Definition
  • A Web-based Social Network (WBSN) must meet these
    criteria
  • Accessible over the web with a web browser
  • Users must explicitly state their relationship
    with other people qua stating a relationship
  • Must have explicit built-in support for users
    making these connections.
  • Relationships must be visible and browsable

42
Why the Difference?
  • Ranges of disconnected members
  • Dogster and HAMSTERster have lowest rates
  • Ecademy
  • FilmTrust
  • Mobango and Worldshine
  • As the non-social networking purpose of the
    website becomes stronger, the number of
    friendless and outsiders increases

43
Using Web-Based Social Networks (WBSNs)
  • If we are going to use social networks for
    information access we must understand
  • How do users behave in social networks?
  • How do social relationships relate to information?

44
Daily Trends
45
Implications
  • The trust we have in people can inform how we
    treat information provided by those people
  • This and other studies suggest trust will work
    well for aggregating, filtering and sorting
    information
  • Important when working on the web

46
Outline
  • Motivation
  • Understanding Relationships in Web-based Social
    Networks
  • Behavior
  • Trust
  • Using Social Relationships for Information Access
  • Conclusions and Future Directions

47
Understanding Social Behavior
  • In Web-Based Social Networks

48
Behavior and Dynamics
  • Social networks are not static.
  • Relationships constantly change, are formed, and
    are dropped.
  • New people enter the network and others leave
  • Do people behave the same way in social networks
    on the Web?

49
Questions
  • How do these networks grow (and shrink)?
  • How are relationships added (and removed)?
  • What affects social disconnect?
  • What affects centrality?

50
Methodology
  • 24 month study
  • Automatically collected adjacency lists (everyone
    and who they know), join dates, and last active
    dates for all members.
  • December 2004
  • December 2006
  • For 7 networks, I collected adjacency lists every
    day for 7 weeks.
  • Who joined or left
  • What relationships were added or removed
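
Comparing two daily snapshots might look like the sketch below. It assumes each snapshot is a dict from member to a set of friends and that relationships are undirected; both are assumptions, as are the function names and sample members.

```python
def edge_set(adjacency):
    """Flatten an adjacency list into a set of undirected edges."""
    return {
        frozenset((u, v))
        for u, friends in adjacency.items()
        for v in friends
        if u != v
    }


def diff_snapshots(before, after):
    """Compare two adjacency-list snapshots (dict: member -> set of friends)."""
    members_before, members_after = set(before), set(after)
    edges_before, edges_after = edge_set(before), edge_set(after)
    return {
        "joined": members_after - members_before,
        "left": members_before - members_after,
        "added": edges_after - edges_before,
        "removed": edges_before - edges_after,
    }


# Hypothetical snapshots from two consecutive days.
day1 = {"ann": {"bo"}, "bo": {"ann", "cy"}, "cy": {"bo"}}
day2 = {"ann": {"bo", "dee"}, "bo": {"ann"}, "dee": {"ann"}}
changes = diff_snapshots(day1, day2)
```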

51
Network Growth
  • People do not leave social networks
  • On sites with a clear, simple process for leaving,
    fewer than a dozen members leave per day
  • In some networks, essentially no one has ever
    left
  • Lots of people join social networks
  • For ten networks we knew the date that every
    member joined the network
  • Networks tend to show linear growth
  • The slope can shift
  • Usually occurs suddenly
  • Explained by some event
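
One way to check for linear growth is to fit a straight line to member counts over time, as in the sketch below. Ordinary least squares is an assumed choice (the slides do not describe the fitting method), and the counts are made up.

```python
def fit_line(days, members):
    """Ordinary least-squares slope and intercept for members vs. days."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(members) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, members))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical membership counts on four consecutive days.
days = [0, 1, 2, 3]
members = [100, 150, 200, 250]
slope, intercept = fit_line(days, members)
```

Fitting separate lines before and after a known event would be one way to quantify a shift in slope.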

53
Relationships
  • Forming relationships is the basis for social
    networking
  • Almost all networks are growing denser
  • Relationships grow at approximately 1.7 - 2.7
    times the rate of membership
  • There is a strong social disincentive to remove
    relationships

54
FilmTrust Network
55
Friendless and the Outsiders
  • Friendless have no social connections
  • Outsiders have social connections but are
    independent from the major connected component of
    the network
  • Important because if we are using the social
    network for information access, these people will
    get little benefit.
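
These two groups can be found with a connected-components pass, sketched below. The sketch assumes symmetric relationships stored as a dict from member to a set of friends; the function name and sample network are hypothetical.

```python
from collections import deque


def classify_members(adjacency):
    """Split members into friendless, outsiders, and the main component.

    adjacency: dict mapping member -> set of friends (assumed symmetric).
    """
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for friend in adjacency.get(node, ()):
                if friend not in seen:
                    seen.add(friend)
                    component.add(friend)
                    queue.append(friend)
        components.append(component)
    main = max(components, key=len) if components else set()
    friendless = {m for m, friends in adjacency.items() if not friends}
    outsiders = {m for c in components if c is not main for m in c} - friendless
    return friendless, outsiders, main


# Hypothetical network: a-b-c form the main component, d-e a separate pair.
network = {
    "a": {"b", "c"}, "b": {"a"}, "c": {"a"},
    "d": {"e"}, "e": {"d"},
    "f": set(),
}
friendless, outsiders, main = classify_members(network)
```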

57
Centrality
  • Other than having lots of friends, what makes
    people more central?
  • Average shortest path length as centrality
    measure
  • Activity
  • Consider join date, last active date, and length
    of activity (last active date - join date)
  • Compute rank correlation with centrality
  • Medium strength correlation (0.5) between
    duration and centrality
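
The two measurements on this slide can be sketched as follows, assuming an unweighted, undirected network and Spearman's coefficient as the rank correlation (the slides do not say which rank correlation was used). All names and sample values are hypothetical.

```python
from collections import deque


def average_path_length(adjacency, source):
    """Mean BFS distance from source to every member it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for friend in adjacency[node]:
            if friend not in dist:
                dist[friend] = dist[node] + 1
                queue.append(friend)
    reachable = len(dist) - 1
    return sum(dist.values()) / reachable if reachable else float("inf")


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the result is undefined).
    """
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical chain a-b-c-d with made-up activity durations in days.
network = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
days_active = {"a": 10, "b": 300, "c": 200, "d": 5}
members = sorted(network)
path_lengths = [average_path_length(network, m) for m in members]
rho = spearman([days_active[m] for m in members], path_lengths)
```

A shorter average path means a more central member, so a negative `rho` against path length corresponds to a positive relationship between duration and centrality.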

58
Conclusions
  • Networks follow a linear growth pattern, where
    the slope shifts in response to events
  • People rarely leave networks
  • Networks grow denser, with relationships added
    more frequently than members
  • People will delete relationships, but orders of
    magnitude less frequently than they add them
  • Websites with more non-social features tend to
    have more friendless and disconnected users
  • Users with longer periods of activity tend to be
    more central to the network

59
Example Profile
  • Movies m1 through m10
  • User ratings r1 through r10 for m1 through m10
  • r1 through r4 are extreme (1,2,9, or 10)
  • r5 through r10 are not extreme
  • Profile ratings pi ri?i