Searching people's digital footprints: A new avenue for social sciences

1
Searching people's digital footprints: A new
avenue for social sciences
János Kertész, Budapest University of Technology
and Economics
2
Outline
  • Problems with the classical primary data
    collection in soc. (with an example)
  • Abundance of data: digital footprints, a new
    era in social sciences
  • Examples
  • Mobile phone network: topology and weights,
    global structure, spreading
  • Summary

3
Primary Methods of Data Collection
  • Interviewing people
  • Designing a questionnaire
  • Observing people
  • Content analysis
  • Designing an experiment to carry out
  • Case study
  • Focus group
4
Primary Methods of Data Collection
  • Interviewing people
  • Designing a questionnaire: this method is
    best for discovering factual information about
    people
  • Observing people
  • Content analysis
  • Designing an experiment to carry out
  • Case study
  • Focus group
Statistics about primary data collection: papers
over 10 years in American Sociological Review
  • Interpretative: 17%
  • Survey: 80%
  • Experiment: 3%
5
AddHealth
M. Gonzales et al., Physica A 379, 54 (2007)
Data based on questionnaires and medical tests
1700 publications (incl. dissertations)
We used the data from Wave I (1994-95):
75871 students in 84 high schools were asked
68 questions, including 10 friendship-related ones
Name your 5 best male and 5 best female friends. For
each friend, select from the list those that apply.
During the last 7 days you
  1. visited each other
  2. met after school
  3. spent time together during last weekend
  4. talked with him/her about a problem
  5. talked with him/her on the phone
6
Threshold analysis
Links are a priori directed, corresponding to the
nominations
Strength of ties characterized by discrete weights
Strong asymmetry may occur between the A → B and
B → A directions (figure labels: 1, 5)
Questionnaires are efficient
  • Multi-factor analysis
  • Specialized questions
  • Detailed picture
But
  • Samples of limited size
  • Subjectivity
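A minimal sketch of the threshold analysis on directed nominations, under assumptions not stated on the slide: the weight of a link is taken to be the number of activities ticked for that friend, and the toy nominations and the function names (`build_directed`, `asymmetry`, `threshold`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy nominations: (nominator, nominee, weight).
# Assumption: weight = number of ticked activities for that friend.
nominations = [
    ("A", "B", 5),
    ("B", "A", 1),
    ("A", "C", 3),
    ("C", "A", 3),
    ("B", "C", 2),
]

def build_directed(noms):
    """Directed weighted graph stored as {source: {target: weight}}."""
    graph = defaultdict(dict)
    for src, dst, w in noms:
        graph[src][dst] = w
    return graph

def asymmetry(graph, a, b):
    """Absolute weight difference between a->b and b->a (missing link = 0)."""
    return abs(graph.get(a, {}).get(b, 0) - graph.get(b, {}).get(a, 0))

def threshold(graph, w_min):
    """Keep only the links whose weight is at least w_min."""
    return {s: {t: w for t, w in nbrs.items() if w >= w_min}
            for s, nbrs in graph.items()}

g = build_directed(nominations)
strong = threshold(g, 3)
print("asymmetry A/B:", asymmetry(g, "A", "B"))
print("links with weight >= 3:", strong)
```

Raising `w_min` step by step shows which friendships survive only as one-directional or disappear entirely.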
7
Ethnic preferences in friend selection
Colors and symbols assigned to races
P: frequency of links in the friendship network
Pref: frequency in the reference network
$Z(r,r) = \frac{P(r,r) - P_{ref}(r,r)}{\sigma_{ref}(r,r)}$
  • Homophily
  • Significant deviations
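A rough sketch of how such a deviation score could be computed. The slide does not say how the reference network was built; here it is assumed to be the same link structure with race labels shuffled among the students. The toy data and the names `pair_frequencies` and `z_scores` are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev

def pair_frequencies(edges, race):
    """Fraction of directed links going from race r to race s, for each (r, s)."""
    counts = Counter((race[a], race[b]) for a, b in edges)
    return {pair: c / len(edges) for pair, c in counts.items()}

def z_scores(edges, race, n_shuffles=200, seed=0):
    """Z = (P - mean of P_ref) / std of P_ref over label-shuffled references."""
    rng = random.Random(seed)
    observed = pair_frequencies(edges, race)
    nodes = list(race)
    labels = [race[n] for n in nodes]
    samples = {pair: [] for pair in observed}
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # assumption: reference = shuffled race labels
        ref = pair_frequencies(edges, dict(zip(nodes, labels)))
        for pair in observed:
            samples[pair].append(ref.get(pair, 0.0))
    z = {}
    for pair, p in observed.items():
        sigma = pstdev(samples[pair])
        if sigma > 0:
            z[pair] = (p - mean(samples[pair])) / sigma
    return z

# Hypothetical toy school: two groups with mostly in-group nominations.
race = {1: "x", 2: "x", 3: "x", 4: "x", 5: "y", 6: "y", 7: "y", 8: "y"}
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3),
         (5, 6), (6, 7), (7, 8), (8, 5), (5, 7),
         (1, 5)]
z = z_scores(edges, race)
print(z)
```

A positive score for a same-race pair indicates more in-group links than the shuffled reference would give, which is the homophily signal the slide refers to.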
8
Other ways of finding data for scientific
research: huge datasets due to IT
  • Official data collections (open or can be made
    available)
  • Statistical institutes (e.g. data from the
    Swedish Stat. Inst.)
  • Fiscal data (income distributions etc.)
  • Medical data (e.g. Finnish diabetes data,
    mortality data etc.)
  • Labor related
  • Commercial data (e.g. trading data of
    companies): secret, property of companies
  • Financial data (e.g. stock and other markets,
    banks): partly open (free or for purchase)
  • Science related (open): Human Genome Project,
    Chemical Data Banks, archives, bibliographies
These data are produced either for analysis or we
assume that they would be used for that purpose
9
Data generated in our everyday lives: a new
avenue for social sciences
Digital footprints
10
  • This collection of data raises legal and
    ethical issues (see later)

At the same time it provides a gold mine for
research!
12
Until now, social science has struggled to obtain
tools that do more than scratch the surface of
some of its questions. These range from
identifying the driving forces behind violence,
to the factors influencing how ideas, attitudes
and prejudices spread through human populations.
The available tools have largely remained in a
time warp, consisting of analyses of national
censuses, small-scale surveys, or lone
researchers with a notebook observing
interactions within small groups. Being able to
automatically and remotely obtain massive amounts
of continuous data opens up unprecedented
opportunities for social scientists to study
organizations and entire communities or
populations.
Nature, Vol. 449, 11 October 2007
13
  • Communications leave detailed information about
    who communicates with whom, when and where:
  • phone (mobile and fixed line)
  • SMS, MMS
  • MSN
  • email
  • In a broader sense, all kinds of activities that
    leave electronic records can be used, including:
  • commercial activities (eBay, point-collecting
    cards, credit cards, etc.)
  • open collaborative environments (Wikipedia, GNU,
    etc.)
  • e-communities (Facebook, MySpace, etc.)
  • e-games (role-playing, Where is George, etc.)

14
Where is George?
Zip code
15
D. Brockmann, L. Hufnagel and T. Geisel, Nature
439, 462-465 (2006)
Scaling laws of human travel (Where is George)
16
eBay-network
goods
people
17
eBay data
I. Yang, E. Oh, B. Kahng, Phys. Rev. E 74, 016121
(2006)
Figure panels: 1) and 2)
Categories:
  • A: collectibles
  • B: clothing, sport, office
  • C: home decoration, electronics
  • D: art, hobby
  • E: books, toys
  • F: valuables (jewelry, stamps, ...)
Traditional classification scheme (2) can be
improved by hierarchical agglomeration algorithm
(1)
18
Enron Email Dataset (free: www.cs.cmu.edu/~enron/)
  • 150 users (Enron management)
  • 0.5M messages made public (including content!)
    by the Federal Energy Regulatory Commission
  • The presently available corpus does not include
    attachments, and some messages have been deleted
    (due to requests of affected employees)
Triggered much interesting work, e.g.
  • Berkeley Enron Email Analysis (testing methods)
  • J. Shetty and J. Adibi, The Enron Email Dataset:
    Database Schema and Brief Statistical Report
  • Z. Eisler, I. Bartos and J.K., Fluctuation
    scaling
Huberman et al.: HP data (not publicly available)
19
J. Shetty and J. Adibi
20
Fluctuation scaling: $\sigma \propto \langle f \rangle^{\alpha}$
Eisler et al. Advances in Physics, 57, 89 (2008)
21
Constructing social network from mobile phone
data
J.-P. Onnela et al., PNAS 104, 7332-7336 (2007)
J.-P. Onnela et al., New J. Phys. 9, 179 (2007)
  • Over 7 million private mobile phone subscriptions
  • Focus: voice calls within the home operator
  • Data aggregated from a period of 18 weeks
  • Require reciprocity (X→Y AND Y→X) for a link
  • Customers are anonymous (hash codes)
  • Data from a European mobile operator
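The construction rule can be sketched as follows. This is only an illustration: the record layout, the toy data and the name `reciprocal_network` are hypothetical, and the link weight is assumed here to be the total call duration in both directions.

```python
from collections import defaultdict

# Hypothetical call records: (caller_hash, callee_hash, duration_seconds).
calls = [
    ("a1", "b2", 120), ("b2", "a1", 60),
    ("a1", "c3", 30),                      # never returned, so no link
    ("b2", "c3", 10), ("c3", "b2", 200), ("b2", "c3", 5),
]

def reciprocal_network(records):
    """Undirected weighted links, kept only if both X->Y and Y->X calls exist."""
    directed = defaultdict(float)
    for caller, callee, duration in records:
        if caller != callee:
            directed[(caller, callee)] += duration
    links = {}
    for (x, y), dur in directed.items():
        # x < y stores each reciprocal pair exactly once.
        if x < y and (y, x) in directed:
            links[(x, y)] = dur + directed[(y, x)]
    return links

network = reciprocal_network(calls)
print(network)
```

The reciprocity filter drops one-way calls, which are more likely to be wrong numbers or commercial calls than social ties.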

22
Huge network: proxy for network at societal level
Largest connected component dominates:
3.9M / 4.6M nodes, 6.5M / 7.0M links
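With the figures on this slide, the largest connected component holds 3.9/4.6 ≈ 85% of the nodes and 6.5/7.0 ≈ 93% of the links. A small sketch of how such shares can be measured with a breadth-first search; the toy link list and the function names are hypothetical.

```python
from collections import defaultdict, deque

def connected_components(links):
    """Return the node sets of all connected components (breadth-first search)."""
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), {start}
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    comp.add(nbr)
                    queue.append(nbr)
        components.append(comp)
    return components

def lcc_share(links):
    """Fraction of nodes and of links inside the largest connected component."""
    comps = connected_components(links)
    lcc = max(comps, key=len)
    n_nodes = sum(len(c) for c in comps)
    lcc_links = sum(1 for a, _ in links if a in lcc)
    return len(lcc) / n_nodes, lcc_links / len(links)

toy_links = [(1, 2), (2, 3), (3, 1), (4, 5)]
print(lcc_share(toy_links))
```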
23
Granovetter's Weak Ties Hypothesis
  • Granovetter suggests analysis of social
    networks as a tool for
    linking micro and
    macro levels of sociological theory
  • Considers the macro level implications of tie
    (micro level) strengths
  • The strength of a tie is a (probably linear)
    combination of the amount of time, the emotional
    intensity, the intimacy (mutual confiding), and
    the reciprocal services which characterize the
    tie.
  • Formulates a hypothesis:
  • The relative overlap of two individuals'
    friendship networks varies directly with the
    strength of their tie to one another
  • Explores the impact of the hypothesis on, e.g.
    diffusion of information, stressing the cohesive
    power of weak ties
  • M. Granovetter, The Strength of Weak Ties,
    The American Journal of Sociology 78,
    1360-1380 (1973).

24
Overlap
  • Definition: relative neighborhood overlap
    (topological)
    $O_{ij} = \frac{n_{ij}}{(k_i - 1) + (k_j - 1) - n_{ij}}$
  • where the number of triangles around edge (vi,
    vj) is nij, and ki, kj are the degrees of vi, vj
  • Illustration of the concept
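A short sketch of the overlap computation, assuming the definition used in the PNAS paper cited earlier in the talk: the number of common neighbours n_ij divided by (k_i - 1) + (k_j - 1) - n_ij, with k the node degrees. The toy graph and the name `neighbourhood_overlap` are hypothetical.

```python
from collections import defaultdict

def neighbourhood_overlap(links):
    """Overlap O_ij for every link (i, j) of an undirected graph."""
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    overlap = {}
    for a, b in links:
        n_ab = len(adj[a] & adj[b])          # common neighbours = triangles on (a, b)
        denom = (len(adj[a]) - 1) + (len(adj[b]) - 1) - n_ab
        overlap[(a, b)] = n_ab / denom if denom > 0 else 0.0
    return overlap

# Toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
toy_links = [(1, 2), (1, 3), (2, 3), (3, 4)]
print(neighbourhood_overlap(toy_links))
```

Link (1, 2) sits entirely inside the triangle and gets overlap 1, while the bridge-like link (3, 4) shares no neighbours and gets overlap 0.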

25
Empirical Verification
  • Let ⟨O⟩w denote Oij averaged over a bin of
    w-values
  • Use the cumulative link weight distribution
    Pcum(w) (the fraction of links with weights
    less than w)
  • Relative neighbourhood overlap increases as a
    function of link weight
  • → Verifies Granovetter's hypothesis (95%)
  • (Exception: top 5% of weights)
  • Blue curve: empirical network
  • Red curve: weight-randomised network
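The binning step can be sketched as below, assuming equal-population bins along the cumulative weight axis; the slide does not specify the binning, and the toy values and the name `average_overlap_by_rank` are hypothetical.

```python
def average_overlap_by_rank(pairs, n_bins=4):
    """pairs holds one (weight, overlap) tuple per link.

    Returns a list of (upper cumulative fraction of the bin, mean overlap in the bin),
    with links sorted by weight and split into bins of equal population.
    """
    ordered = sorted(pairs)
    n = len(ordered)
    result = []
    for b in range(n_bins):
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        chunk = ordered[lo:hi]
        if chunk:
            mean_overlap = sum(o for _, o in chunk) / len(chunk)
            result.append((hi / n, mean_overlap))
    return result

# Hypothetical toy data in which overlap grows with weight.
pairs = [(10, 0.0), (20, 0.1), (30, 0.2), (40, 0.3),
         (50, 0.4), (60, 0.5), (70, 0.6), (80, 0.7)]
for p_cum, mean_overlap in average_overlap_by_rank(pairs):
    print(p_cum, round(mean_overlap, 3))
```

Shuffling the weights among the links before calling the function would give the weight-randomised comparison curve.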

26
High Weight Links?
  • Weak links: strength of both adjacent nodes (min
    and max) considerably higher than link weight
  • Strong links: strength of both adjacent nodes
    (min and max) about as high as the link weight
  • Indication: high-weight relationships clearly
    dominate the on-air time of both nodes; others
    are negligible
  • The ratio of time spent communicating with one
    other person converges to 1 at roughly w ≈ 10^4
  • Consequence: less time to interact with others
  • This explains the onset of the decreasing trend
    of the average overlap at high w

27
Our study revealed the structure of the network,
the interplay between weights and communities, and
the relations between local, mesoscopic and global
structure
It is possible to ask unprecedented questions (and
even find the answers to them)
28
Thresholding
  • Initial connected network (f = 0)
  • → All links are intact, i.e. the network is in
    its initial stage

29
Thresholding
  • Increasing weight thresholded network (f = 0.8)
  • → 80% of the weakest links removed, strongest
    20% remain

30
Thresholding
  • Initial connected network (f = 0)
  • → All links are intact, i.e. the network is in
    its initial stage

31
Thresholding
  • Decreasing weight thresholded network (f = 0.8)
  • → 80% of the strongest links removed, weakest
    20% remain

32
Percolation aspects
Weights
Overlap
The local relationship between weights and
topology has global consequences
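The thresholding experiment of the preceding slides can be sketched as follows: remove a fraction f of the links in order of weight and track the relative size of the largest component. This is an illustrative toy, not the analysis code behind the slides; the network and the names `giant_fraction` and `threshold_curve` are hypothetical.

```python
def giant_fraction(nodes, links):
    """Share of nodes in the largest connected component (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)
    sizes = {}
    for n in nodes:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(nodes)

def threshold_curve(weighted_links, fractions, weakest_first=True):
    """For each f, remove the fraction f of weakest (or strongest) links."""
    nodes = {n for a, b, _ in weighted_links for n in (a, b)}
    ordered = sorted(weighted_links, key=lambda link: link[2],
                     reverse=not weakest_first)
    curve = []
    for f in fractions:
        n_removed = round(f * len(ordered))
        kept = [(a, b) for a, b, _ in ordered[n_removed:]]
        curve.append((f, giant_fraction(nodes, kept)))
    return curve

# Toy network: two strongly tied triangles joined by one weak bridge.
toy = [(1, 2, 10), (2, 3, 10), (1, 3, 10),
       (4, 5, 10), (5, 6, 10), (4, 6, 10),
       (3, 4, 1)]
print(threshold_curve(toy, [0, 1 / 7], weakest_first=True))
print(threshold_curve(toy, [0, 1 / 7], weakest_first=False))
```

In this toy, removing the single weakest link already splits the network in two, whereas removing one strong link leaves it connected.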
33
Diffusion of information
  • Knowledge of information diffusion is based on
    unweighted networks
  • Use the present network to study diffusion on a
    weighted network: does the local
    relationship between topology and tie strength
    have an effect?
  • Spreading simulation: infect one node with new
    information
  • (1) Empirical: pij ∝ wij
  • (2) Reference: pij ∝ ⟨w⟩
  • Spreading is significantly faster on the
    reference (average weight) network
  • Information gets trapped in communities in the
    real network

Reference
Empirical
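A minimal sketch of such a spreading simulation, under assumptions the slide leaves open: discrete time steps, each informed node passes the information to each uninformed neighbour with probability `scale` times the link weight (empirical case) or `scale` times the average weight (reference case), capped at 1. The toy network, the parameter `scale` and the name `spread` are hypothetical.

```python
import random
from collections import defaultdict

def spread(weighted_links, seed_node, scale, use_average_weight, steps, rng):
    """Return the number of informed nodes after each step (index 0 = start)."""
    adj = defaultdict(dict)
    for a, b, w in weighted_links:
        adj[a][b] = w
        adj[b][a] = w
    mean_w = sum(w for _, _, w in weighted_links) / len(weighted_links)
    infected = {seed_node}
    history = [1]
    for _ in range(steps):
        newly = set()
        for i in infected:
            for j, w in adj[i].items():
                if j not in infected:
                    p = scale * (mean_w if use_average_weight else w)
                    if rng.random() < min(1.0, p):
                        newly.add(j)
        infected |= newly
        history.append(len(infected))
    return history

# Toy network: two strongly tied triangles joined by one weak bridge.
toy_links = [(1, 2, 10), (2, 3, 10), (1, 3, 10),
             (4, 5, 10), (5, 6, 10), (4, 6, 10),
             (3, 4, 1)]
print("empirical:", spread(toy_links, 1, 0.05, False, 20, random.Random(1)))
print("reference:", spread(toy_links, 1, 0.05, True, 20, random.Random(1)))
```

With weight-proportional probabilities the weak bridge is crossed only rarely, which is the toy analogue of information getting trapped inside a community.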
34
Summary
  • Fantastic new possibilities for computational
    social science
  • Multidisciplinary efforts needed
  • Examples: mobile phone network (topology and
    weights, global structure, spreading)
  • More open, shared data needed. Benchmarking.
    Experiments??? Artificial data?
  • Ethical and legal issues: privacy, commercial
    interest and scientific reproducibility
  • Surveys cannot be substituted!