Seminar Series Social Information Systems - PowerPoint PPT Presentation

Slides: 33
Provided by: Papa65

1
Seminar Series: Social Information Systems
Manos Papagelis, Department of Computer Science,
University of Toronto, papaggel@cs.toronto.edu
  • Toronto, Spring 2007

2
Presentation Outline
  • Part I: Exploiting Social Networks for Internet
    Search
  • Part II: An Experimental Study of the Coloring
    Problem on Human Subject Networks

3
Exploiting Social Networks for Internet Search
Alan Mislove, Krishna Gummadi, and Peter
Druschel, HotNets 2006
  • Part I

4
Introduction
  • Social Networking (SN)
  • A new form of publishing and locating
    information
  • Objective
  • To understand whether social links can be
    exploited by search engines to provide better
    results
  • Contributions
  • A comparison of the mechanisms for publishing and
    locating information on the Web and in online SNs
  • Publishing: mechanisms to make information
    available to users
  • Locating: mechanisms to find information
  • Results from an experiment in social
    network-based Web search
  • Challenges and opportunities in using social
    networks for Internet search

5
Web vs. SN (1/2)
  • Web
  • Publishing: by placing documents on a Web server
    (and then attracting incoming links)
  • Locating: via search engines (exploiting the link
    graph)
  • Pros
  • Very effective (incoming links are good
    indicators of importance)
  • Limitations
  • No fresh data
  • No personalized results
  • Unlinked pages are not indexed
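The Web side of this comparison relies on the link graph: a page's importance is inferred from its incoming links. A minimal PageRank-style power iteration makes the idea concrete. This is a textbook sketch under assumed parameters (damping 0.85, 50 iterations), not Google's production ranking, and the toy `links` dictionary is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by power iteration over a link graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the uniform teleportation share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank, split evenly, along its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical toy site: "orphan" links out but nothing links to it.
links = {"home": ["about", "blog"], "about": ["home"],
         "blog": ["home"], "orphan": ["home"]}
print(sorted(pagerank(links).items(), key=lambda kv: -kv[1]))
```

Because nothing links to `orphan`, it only ever receives the teleportation share, which mirrors how a purely link-based method gives unlinked pages no importance of their own.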

6
Web vs. SN (2/2)
  • Social Networks
  • Publishing: no explicit links between content
    items (photos, videos, blogs), but implicit links
    between content arise from explicit links between
    users
  • Locating
  • Navigating the social network and
    browsing users' content
  • Keyword-based search for textual or tagged
    content
  • "Top-10" lists
  • Pros
  • Helps a user find timely, relevant information by
    browsing adjacent regions of the network of users
    with similar interests
  • Content is rated rapidly (through comments and
    feedback from the community)
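In an online SN, content is reached through the people who publish it. The breadth-first search below shows how explicit user-to-user links induce implicit links to content within a few hops. The `friends` and `posts` dictionaries and the default two-hop radius are assumptions chosen for illustration.

```python
from collections import deque

def content_within_hops(friends, posts, start, max_hops=2):
    """Collect content reachable through explicit user-to-user links.

    friends: user -> set of linked users
    posts:   user -> list of content items that user published
    Returns {content item: hop distance of its publisher from start}.
    """
    seen = {start: 0}
    queue = deque([start])
    found = {}
    while queue:
        user = queue.popleft()
        hops = seen[user]
        if user != start:
            for item in posts.get(user, []):
                # BFS visits nearer users first, so keep the first distance.
                found.setdefault(item, hops)
        if hops == max_hops:
            continue  # do not expand beyond the radius
        for friend in friends.get(user, ()):
            if friend not in seen:
                seen[friend] = hops + 1
                queue.append(friend)
    return found

friends = {"ann": {"bob"}, "bob": {"ann", "cat"},
           "cat": {"bob", "dan"}, "dan": {"cat"}}
posts = {"bob": ["bob-photo"], "cat": ["cat-blog"], "dan": ["dan-video"]}
print(content_within_hops(friends, posts, "ann"))
```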

7
Integration of Web Search and SN
  • Web and SN information are disjoint
  • There is no unified search tool that locates information
    across different systems

8
PeerSpective: SN-based Web Search
  • Technology
  • Lucene text search engine and FreePastry P2P
    overlay
  • A lightweight HTTP proxy transparently indexes all
    URLs visited by the user
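The proxy's job of indexing every visited page can be illustrated with a tiny inverted index. This is a hedged stand-in: PeerSpective used Lucene, whose API is not reproduced here, and the class `LocalIndex`, its methods, and the term-frequency scoring are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")

class LocalIndex:
    """Toy per-user index of visited pages (illustrative, not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {url: term frequency}
        self.indexed = set()               # URLs already indexed

    def add_visit(self, url, text):
        """Index a page the first time the (hypothetical) proxy sees it."""
        if url in self.indexed:
            return
        self.indexed.add(url)
        for term in TOKEN.findall(text.lower()):
            self.postings[term][url] = self.postings[term].get(url, 0) + 1

    def search(self, query):
        """Return (url, score) pairs; score is the summed term frequency."""
        scores = defaultdict(int)
        for term in set(TOKEN.findall(query.lower())):
            for url, tf in self.postings.get(term, {}).items():
                scores[url] += tf
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

idx = LocalIndex()
idx.add_visit("http://a.example/pci", "PCI bus bus architecture")
idx.add_visit("http://b.example/transit", "City bus schedule")
print(idx.search("bus"))
```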

9
Searching Process
  • A user submits a query to Google
  • The proxy transparently forwards the query to
    both Google and the proxies of the other users in the
    network
  • Each proxy executes the query on its local index
  • The results are then collated and presented alongside
    Google's results
  • PeerSpective ranking
  • Combines Lucene and PageRank scores from users who
    previously viewed the result
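The fan-out and collation steps can be sketched as follows. Each peer is modeled as a plain callable standing in for a remote proxy (the real system communicated over a FreePastry overlay), the function name `peerspective_search` is hypothetical, and merging by summed peer score is an assumption, since the slide does not spell out how the scores are combined.

```python
from concurrent.futures import ThreadPoolExecutor

def peerspective_search(query, web_results, peer_indexes):
    """Fan a query out to peers and collate results next to web results.

    web_results:  URLs from the web search engine, in rank order
    peer_indexes: callables, one per peer proxy, each mapping a query to
                  a list of (url, score) pairs from that peer's local index
    """
    with ThreadPoolExecutor() as pool:
        # Each peer executes the query on its own local index, in parallel.
        per_peer = list(pool.map(lambda search: search(query), peer_indexes))

    combined, viewers = {}, {}
    for results in per_peer:
        for url, score in results:
            combined[url] = combined.get(url, 0.0) + score
            viewers[url] = viewers.get(url, 0) + 1
    # Assumption: order by summed peer score, then by number of peers.
    social = sorted(combined, key=lambda u: (-combined[u], -viewers[u], u))
    return {"web": list(web_results), "social": social}

peers = [lambda q: [("u1", 1.0), ("u2", 0.5)], lambda q: [("u2", 1.0)]]
print(peerspective_search("bus", ["g1", "g2"], peers))
```

Returning the two lists separately keeps the "presented alongside Google results" layout explicit instead of interleaving them.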

10
Search Results Example
11
Experiments
  • 10 graduate students shared the Web content they
    downloaded or viewed
  • One-month-long experiment
  • 200,000 distinct URLs
  • 25% were of type text/html or application/pdf (so
    they can be indexed)
  • Reports on
  • Limits of hyperlink-based search
  • Benefits of SN-based search

12
Limits of hyperlink-based search
  • Fraction of visited URLs that are not
    indexed by Google, due to:
  • Pages too new to be indexed (e.g., blogs)
  • Deep Web
  • Dark Web (pages with no incoming links)
  • Results
  • About 1/3 of the requested URLs are not in
    Google's index
  • PeerSpective's indices cover 30% of the
    requested URLs
  • 13.3% of URLs were in PeerSpective's index but
    not in Google's
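The coverage figures above are set arithmetic over three URL sets. The helper below shows how such fractions could be computed; the URL sets in the usage line are made-up toy data, not the study's logs.

```python
def coverage_report(requested, google_index, peer_index):
    """Fractions of requested URLs covered by each index."""
    requested, google, peers = set(requested), set(google_index), set(peer_index)
    n = len(requested)
    return {
        "not_in_google": len(requested - google) / n,
        "in_peerspective": len(requested & peers) / n,
        # Requested URLs that only the social index could answer.
        "peerspective_only": len((requested & peers) - google) / n,
    }

print(coverage_report({"a", "b", "c", "d"}, {"a", "b", "c"}, {"c", "d"}))
```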

13
Random samples of URLs not in Google and
potential reasons
14
Benefits of SN-based Search
  • Experiments on clicks on first-page results
  • 1,730 queries (1,079 resulted in clicks)
  • Results
  • 86.5% of the clicked results were returned only
    by Google
  • 5.7% of the clicked results were returned by both
  • 7.7% of the clicked results were returned only by
    PeerSpective
  • Conclusions
  • This 7.7% (relevant results missing from Google's
    first page) is considered the gold standard
    of web search engineering
  • There is an inherent advantage to using social links
    in web search
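The click breakdown attributes each clicked result to the result lists that contained it. A sketch with hypothetical toy clicks (each entry pairs a clicked URL with the Google first-page and PeerSpective result sets shown for that query):

```python
from collections import Counter

def attribute_clicks(clicks):
    """Percentage of clicks by which result list(s) contained the URL.

    clicks: iterable of (clicked_url, google_first_page, peerspective_results)
    """
    counts = Counter()
    for url, google, peer in clicks:
        in_google, in_peer = url in google, url in peer
        if in_google and in_peer:
            counts["both"] += 1
        elif in_google:
            counts["google_only"] += 1
        elif in_peer:
            counts["peerspective_only"] += 1
        else:
            counts["neither"] += 1
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}

clicks = [("a", {"a"}, set()), ("b", {"b"}, {"b"}),
          ("c", set(), {"c"}), ("d", {"d"}, set())]
print(attribute_clicks(clicks))
```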

15
Reasons for Clicks on PeerSpective
  • Disambiguation
  • A community tends to share definitions or
    interpretations of popular terms (e.g., "bus")
  • Ranking
  • SN information can bias the ranking algorithms
    toward the interests of users (e.g., "CoolStreaming")
  • Serendipity
  • Ample opportunity to find interesting things
    without searching for them
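One simple way to bias a ranking toward a community's interests is to boost results that members have already viewed. The multiplicative boost, the weight `alpha`, and the function name are assumptions for illustration, not PeerSpective's formula.

```python
def rerank_with_community(results, community_views, alpha=0.5):
    """Bias a ranked list toward URLs the user's community has viewed.

    results:         list of (url, base_score) from a search engine
    community_views: url -> number of community members who viewed it
    alpha:           hypothetical boost per viewing member
    """
    def boosted(item):
        url, base = item
        return base * (1.0 + alpha * community_views.get(url, 0))
    # sorted() is stable, so ties keep the engine's original order.
    return [url for url, _ in sorted(results, key=boosted, reverse=True)]

results = [("a", 1.0), ("b", 0.8), ("c", 0.6)]
print(rerank_with_community(results, {"c": 2}))
```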

16
Example of URLs found in PeerSpective
17
Opportunities and Challenges
  • Privacy
  • Willingness of users to disclose information
  • Need for mechanisms to control information flow
    and anonymity
  • Membership and Clustering of SN
  • Users may participate in many networks
  • Need for search that takes the different
    clusters into account
  • Content rating and ranking
  • New approaches to ranking search results
  • System architecture: centralized or distributed?

18
An Experimental Study of the Coloring Problem on
Human Subject Networks
Michael Kearns, Siddharth Suri, and Nick Montfort, Science, vol. 313, Aug 2006
  • Part II

19
Experimental Study on Human Subject Networks
  • Theoretical work suggests that structural
    properties of naturally occurring networks are
    important in shaping behavior and dynamics
  • E.g., hubs in networks are important for routing
    information
  • Empirical structural properties established by
    many disciplines
  • Small diameter ("six degrees of separation")
  • Local clustering of connectivity
  • Heavy-tailed distribution of connectivity
    (power-law distributions)
  • Empirical studies of networks
  • Limitation: networks are fixed and given (no
    alternatives can be compared)
  • Alternative approach: controlled laboratory studies
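Two of the structural properties listed above are easy to measure on small graphs. This sketch computes the local clustering coefficient of a vertex and the degree histogram (a heavy tail signals hub-dominated networks); the adjacency-set representation is an assumption chosen for illustration.

```python
from collections import Counter

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves adjacent."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for i in range(k) for j in range(i + 1, k)
                 if nbrs[j] in adj[nbrs[i]])
    return 2.0 * linked / (k * (k - 1))

def degree_histogram(adj):
    """Map degree -> number of vertices with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())

# A triangle (0, 1, 2) plus a pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(adj, 0), degree_histogram(adj))
```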

20
Experiment
  • Experimental Scenario
  • Distributed problem-solving from local
    information
  • Experimental Setting
  • 38 human subjects (network vertices)
  • Each subject controls the color of a vertex in a
    network
  • Networks: both simple and more complex
  • Goal: select a color different from those of all
    neighbors
  • Problem: graph coloring
  • Information available: variable (low, medium,
    high)

21
Graph Coloring Problem
  • Graph coloring
  • An assignment of "colors" to elements of a graph
    (e.g., its vertices) such that no two adjacent
    elements are assigned the same color
  • Graph Coloring Problem
  • Find the minimum number of colors for an
    arbitrary graph (NP-hard)
  • Chromatic number
  • The least number of colors needed to color the
    graph
  • Example
  • Vertex coloring
  • A 3-coloring suffices for this graph, but with fewer
    colors some adjacent vertices would share the same
    color
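Checking a coloring is cheap, while finding the chromatic number exactly is hard in general. The brute-force search below tries k = 1, 2, ... colors and takes exponential time, so it is only practical for tiny graphs; it illustrates the definitions rather than an efficient algorithm.

```python
from itertools import product

def is_proper(edges, coloring):
    """True if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)

def chromatic_number(vertices, edges):
    """Smallest k that admits a proper coloring (exhaustive search)."""
    vertices = list(vertices)
    for k in range(1, len(vertices) + 1):
        # Try every assignment of k colors to the vertices.
        for colors in product(range(k), repeat=len(vertices)):
            if is_proper(edges, dict(zip(vertices, colors))):
                return k
    return 0  # graph with no vertices

triangle = [(0, 1), (1, 2), (2, 0)]
print(chromatic_number(range(3), triangle))
```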

22
Network Topologies
(Figure panels: Simple Cycle, 5-Chord Cycle, 20-Chord Cycle, Leader Cycle, Pref. Att. v2, Pref. Att. v3)
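The topologies can be approximated with standard generators. This is a sketch under assumptions: the chord cycles add uniformly random chords to a simple cycle, and "Pref. Att. v2/v3" is read as preferential attachment in which each new vertex links to v = 2 or 3 existing vertices. The seed clique and sampling details are guesses, not the authors' construction, and Leader Cycle is not reproduced.

```python
import random

def cycle_with_chords(n, num_chords, rng):
    """Simple cycle on n vertices plus num_chords distinct random chords.

    Requires n + num_chords <= n * (n - 1) / 2 so the loop can finish.
    """
    edges = {frozenset((i, (i + 1) % n)) for i in range(n)}
    while len(edges) < n + num_chords:
        u, v = rng.sample(range(n), 2)
        edges.add(frozenset((u, v)))  # duplicates are absorbed by the set
    return edges

def preferential_attachment(n, v, rng):
    """Each new vertex links to v distinct existing vertices, chosen with
    probability proportional to their current degree."""
    # Seed with a clique on v + 1 vertices so every degree is positive.
    edges = {frozenset((i, j)) for i in range(v + 1) for j in range(i)}
    endpoints = [x for e in edges for x in e]  # one entry per edge end
    for new in range(v + 1, n):
        targets = set()
        while len(targets) < v:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.add(frozenset((new, t)))
            endpoints.extend((new, t))
    return edges

rng = random.Random(42)
print(len(cycle_with_chords(38, 5, rng)), len(preferential_attachment(38, 2, rng)))
```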
23
Information View
(Figure panels: Low, the color of each neighbor; Medium, the number of links of each neighbor; All, the entire network. Each panel marks the subject's own vertex as "YOU" and includes an "Overall Progress" indicator.)
24
Graph Properties and Experimental Results
25
1. Collective Performance
  • Subjects could indeed solve the coloring problem
    across a wide range of networks
  • 31 of 38 experiments reached a solution in less than
    300 seconds
  • Mean completion time: 82 seconds
  • Collective performance is affected by network
    structure
  • Preferential attachment networks are harder than
    cycle-based networks
  • Cycle-based networks
  • Monotonic relationship between solution time and
    average network distance (smaller distance
    leading to shorter solution times)
  • Adding random chords systematically reduces
    solution time
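The link between chords and average distance can be checked directly. The sketch computes the mean shortest-path length by breadth-first search from every vertex, then compares a 38-vertex simple cycle with the same cycle plus two chords. The chords are fixed rather than random, an assumption made so the comparison is reproducible.

```python
from collections import deque

def average_distance(n, edges):
    """Mean shortest-path length over all ordered pairs (connected graph)."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0
    for src in range(n):
        # Breadth-first search gives hop distances from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
    return total / (n * (n - 1))

n = 38
cycle = [(i, (i + 1) % n) for i in range(n)]
with_chords = cycle + [(0, 19), (9, 28)]  # two fixed long-range chords
print(average_distance(n, cycle), average_distance(n, with_chords))
```

For an even simple cycle every vertex sees distances 1 to n/2 - 1 twice and n/2 once, so the first value is exactly 361/37 (about 9.76); the chords can only shorten paths.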

26
2. Human Performance vs. Artificial Distributed
Heuristics
  • Heuristic considered
  • A vertex is selected at random
  • If some colors are not used by any neighbor of
    this vertex, one of these available colors is
    selected at random
  • Otherwise, a color is selected at random
  • Comparison measure
  • Number of vertex color changes
  • Findings
  • Results exactly reversed: lower average distance
    increases the difficulty for the heuristic
  • Preferential attachment networks are easier for
    the heuristic
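The heuristic described above translates into a short simulation. The adjacency-set representation, the function name, the step cap `max_steps`, and counting a change only when the color actually differs are assumptions of this sketch.

```python
import random

def run_heuristic(adj, num_colors, rng, max_steps=100_000):
    """Simulate the randomized distributed coloring heuristic.

    adj maps each vertex to the set of its neighbors.
    Returns (coloring, color_changes, solved).
    """
    vertices = list(adj)
    coloring = {v: rng.randrange(num_colors) for v in vertices}
    changes = 0

    def is_proper():
        return all(coloring[u] != coloring[w] for u in adj for w in adj[u])

    for _ in range(max_steps):
        if is_proper():
            return coloring, changes, True
        v = rng.choice(vertices)  # a vertex is selected at random
        used = {coloring[w] for w in adj[v]}
        unused = [c for c in range(num_colors) if c not in used]
        # Pick randomly among colors no neighbor uses; if none exist,
        # pick any color at random.
        new = rng.choice(unused) if unused else rng.randrange(num_colors)
        if new != coloring[v]:
            coloring[v] = new
            changes += 1
    return coloring, changes, is_proper()

# Usage: a 10-vertex cycle with 3 available colors.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
coloring, changes, solved = run_heuristic(ring, 3, random.Random(0))
print(solved, changes)
```

The returned `changes` count is the comparison measure from the slide, so running this on different topologies gives a rough way to compare their difficulty for the heuristic.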

27
3. Effects of Varying the Locality of the Information
View
  • Information of varying locality was provided to
    subjects
  • Low: their own and their neighbors' colors are
    visible
  • Medium: their own and their neighbors' colors are
    visible, plus information on the connectivity of
    each neighbor
  • High: the global coloring state at all times
  • Findings
  • An increased amount of information
  • Reduces solution times for cycle-based networks
  • Increases solution times for preferential
    attachment networks
  • Rapid convergence to one of the two solutions in
    cycle-based networks
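The three information regimes can be modeled as different projections of the same global state. The function name and the dictionary layout returned here are a hypothetical encoding, not the experiment's interface.

```python
def information_view(adj, coloring, me, level):
    """What one subject sees under each information regime."""
    if level == "low":
        # Own color plus each neighbor's color.
        return {"me": coloring[me],
                "neighbors": {n: coloring[n] for n in adj[me]}}
    if level == "medium":
        # Adds each neighbor's number of links (degree).
        return {"me": coloring[me],
                "neighbors": {n: (coloring[n], len(adj[n])) for n in adj[me]}}
    if level == "high":
        # The global coloring state.
        return {"me": coloring[me], "all": dict(coloring)}
    raise ValueError(f"unknown level: {level}")

adj = {0: {1}, 1: {0, 2}, 2: {1}}
col = {0: "red", 1: "blue", 2: "red"}
print(information_view(adj, col, 0, "medium"))
```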

28
Information View Effect 1: Pref. Att. vs.
Cycle-based Networks
29
Information View Effect 2: Cycle-based Solution
Convergence
(Figure panels: Low information view, where the population oscillates between approaches to the two solutions; High information view, with rapid convergence to one of the two possible solutions)
30
Individual Strategies
  • Choosing colors that result in the fewest local
    conflicts
  • Attempt to avoid conflicts with highly connected
    subjects
  • Signaling behavior of subjects
  • Deliberately introducing conflicts to escape local minima
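Two of these observed strategies, minimizing local conflicts and steering clear of highly connected neighbors, can be combined into a simple decision rule. This is a hedged model of subject behavior invented for illustration (the name `choose_color` and the tie-breaking order are assumptions); it does not capture signaling or deliberately introduced conflicts.

```python
def choose_color(adj, coloring, me, num_colors):
    """Pick the color with the fewest local conflicts; break ties by
    avoiding conflicts with highly connected neighbors."""
    def cost(color):
        clashing = [n for n in adj[me] if coloring[n] == color]
        # Primary key: number of conflicts. Secondary: total degree of the
        # neighbors we would clash with. Last: color index, for determinism.
        return (len(clashing), sum(len(adj[n]) for n in clashing), color)
    return min(range(num_colors), key=cost)

# Vertex 0 has a low-degree neighbor (1) and a hub neighbor (2).
adj = {0: {1, 2}, 1: {0}, 2: {0, 3, 4}, 3: {2}, 4: {2}}
col = {0: 0, 1: 0, 2: 1, 3: 0, 4: 0}
print(choose_color(adj, col, 0, 2))
```

With only two colors every choice causes one conflict, so the rule prefers clashing with the low-degree neighbor rather than the hub.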

31
Questions?
32
Thanks!