Lada Adamic, HP Labs, Palo Alto, CA - PowerPoint PPT Presentation


Provided by: LADA2

Learn more at: http://vw.indiana.edu


Transcript and Presenter's Notes

Title: Lada Adamic, HP Labs, Palo Alto, CA

1
Information dynamics in the networked world
Lada Adamic, HP Labs, Palo Alto, CA
2
Talk outline
Information flow through blogs
Information flow through email
Search through email networks
Search within the enterprise
Search in an online community
3
Implicit Structure and Dynamics of Blogspace
Eytan Adar, Li Zhang, Lada Adamic, Rajan Lukose

Blog use
Record real-world and virtual experiences
Note and discuss things seen on the net
Blog structure: blog-to-blog linking
Use + Structure
Great for tracking memes (catchy ideas)

4
Approaches and uses of blog analysis

Patterns of information flow
How does the popularity of a topic evolve over
time?
Who is getting information from whom?
Ranking algorithms that take advantage of
transmission patterns

5
Tracking popularity over time
Popularity
Time
Blogdex, BlogPulse, etc. track the most popular
links/phrases of the day
6
Different kinds of information have different popularity profiles
[Figure: four panels showing the share of hits received on each day since first appearance (roughly the first 15 days, vertical axis 0 to 1). Panels: major-news site (editorial content), back of the paper; products, etc.; Slashdot postings; front-page news]
7
Micro example: Giant Microbes
8
Microscale Dynamics

What do we need to track specific info epidemics?
Timings
Underlying network

[Diagram: blog b1 on a time-of-infection axis running from t0 to t1]
9
Microscale Dynamics

Challenges
Root may be unknown
Multiple possible paths
Uncrawled space, alternate media (email, voice)
No links

[Diagram: blogs b1 ... bn with unknown (?) paths between them, on a time-of-infection axis running from t0 to t1]
10
Microscale Dynamics: who is getting info from whom

Explicit blog to blog links (easy)
Via links are even better
Implicit/Inferred transfer (harder)
Use ML algorithm for link inference problem
Support Vector Machine (SVM)
Logistic Regression
What we can use
Full text
Blogs in common
Links in common
History of infection

11
Visualization
http://www-idl.hpl.hp.com/blogstuff

Zoomgraph tool
Using GraphViz (by AT&T) layouts
Simple algorithm
If single, explicit link exists, draw it
Otherwise use ML algorithm
Pick the most likely explicit link
Pick the most likely possible link
Tool lets you zoom around space, control
threshold, link types, etc.
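
One possible reading of the simple algorithm on this slide, as a minimal Python sketch. The function name `choose_link`, the `(source, score)` candidate format, and the `threshold` parameter are all hypothetical; the scores stand in for whatever the ML classifier (SVM or logistic regression) would output.

```python
from typing import List, Optional, Tuple

# (source_blog, score) pairs; scores are assumed classifier outputs in [0, 1].
Candidate = Tuple[str, float]


def choose_link(explicit: List[Candidate],
                possible: List[Candidate],
                threshold: float = 0.5) -> Optional[Tuple[str, str]]:
    """Return (source, link_type) for one infected blog, or None if nothing qualifies."""
    if len(explicit) == 1:
        # A single explicit link is drawn as-is; no inference is needed.
        return explicit[0][0], "explicit"
    if explicit:
        # Several explicit links: keep the one the classifier rates most likely.
        source, _ = max(explicit, key=lambda c: c[1])
        return source, "explicit"
    # No explicit link: fall back to the most likely inferred link above the threshold.
    scored = [c for c in possible if c[1] >= threshold]
    if scored:
        source, _ = max(scored, key=lambda c: c[1])
        return source, "inferred"
    return None
```

The threshold plays the role of the control the tool exposes: raising it hides weaker inferred links.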

12
Giant Microbes epidemic visualization
via link
inferred link
blog
explicit link
13
iRank

Find early sources of good information
using inferred information paths or timing

b1
True source
b2
Popular site
b3
b4

b5
bn
14
iRank Algorithm

Draw a weighted edge for all pairs of blogs that
cite the same URL
higher weight for mentions closer together
run PageRank
control for spam
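
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the weight `1 / (1 + time gap)`, the edge direction (from the later citer to the earlier one, so that rank flows toward early sources), and the damping factor are all choices made for the example, and spam control is left out.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(citations):
    """citations: dict url -> list of (blog, time_in_hours). Returns {(later, earlier): weight}."""
    weights = defaultdict(float)
    for mentions in citations.values():
        for (b1, t1), (b2, t2) in combinations(mentions, 2):
            if b1 == b2 or t1 == t2:
                continue
            earlier, later = (b1, b2) if t1 < t2 else (b2, b1)
            # Assumed weighting: mentions closer together in time get a higher weight.
            weights[(later, earlier)] += 1.0 / (1.0 + abs(t1 - t2))
    return weights


def pagerank(weights, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on a weighted, directed edge dict."""
    nodes = sorted({n for edge in weights for n in edge})
    out_total = defaultdict(float)
    for (src, _), w in weights.items():
        out_total[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0.0)
        new = {n: (1 - damping) / len(nodes) + damping * dangling / len(nodes)
               for n in nodes}
        for (src, dst), w in weights.items():
            new[dst] += damping * rank[src] * w / out_total[src]
        rank = new
    return rank
```

With this direction convention, a blog that repeatedly cites URLs shortly before others do accumulates rank, which matches the goal of finding early sources.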

t0
Time of infection
t1
15
Do Bloggers Kill Kittens?

02:00 AM Friday Mar. 05, 2004 PST: Wired publishes "Warning: Blogs Can Be Infectious."
7:25 AM Friday Mar. 05, 2004 PST: Slashdot posts "Bloggers' Plagiarism Scientifically Proven"
9:55 AM Friday Mar. 05, 2004 PST: Metafilter announces "A good amount of bloggers are outright thieves."

16
Information flow in social groups
Fang Wu, Bernardo Huberman, Lada Adamic, Joshua Tyler
17
Spread of disease is affected by the underlying
network
co-worker
mom
college friend
co-worker
mike
co-worker
18
Spread of computer viruses is affected by the
underlying network
co-worker
mom
college friend
co-worker
mike
co-worker
19
Difference between information flow and disease/virus spread
Viruses (computer and otherwise) are shared indiscriminately (involuntarily)
Information is passed selectively from one host to another based on knowledge of the recipient's interests
20
Spread of information is affected by its
content, potential recipients, and network
topology
co-worker
mom
college friend
co-worker
mike
co-worker
21
Homophily: individuals with like interests associate with one another
personal homepages at Stanford
distance between personal homepages
22
The Model
Decay in transmission probability as a function of the distance m between the potential target and the originating node:
$T(m) = (m+1)^{-\beta}\, T$
Power-law implies slowest decay
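
A small numeric illustration of the decay, reading the slide's formula as T(m) = (m+1)^(-β) T, where the exponent appears as "b" in the transcript. The function and parameter names are made up for this sketch.

```python
def transmissibility(m: int, base: float, beta: float) -> float:
    """T(m) = (m + 1) ** (-beta) * T, with m the distance from the originating node."""
    if m < 0:
        raise ValueError("distance m must be non-negative")
    return (m + 1) ** (-beta) * base


# With beta = 1 the transmission probability falls off as 1 / (m + 1);
# with beta = 0 there is no decay and every distance keeps the base value T.
for m in range(4):
    print(m, transmissibility(m, base=0.5, beta=1.0))
```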
23
Virus, information transmission on a scale free
network
P(k)
outdegree k
Degree distribution of all senders of email
passing through the HP email server
24
Epidemics on scale-free graphs
$10^6$ nodes, epidemic if 1% ($10^4$) infected
[Figure: critical threshold (0 to 1) versus α (1 to 4), with curves labelled by k and β: k = 100 with β = 0, k = 100 with β = 1, and one further β = 0 curve]
25
Study of the spread of URLs and attachments
40 participants (30 within HPL, 10 elsewhere in HP and other orgs)
6370 URLs and 3401 attachments, cryptographically hashed
Question: How many recipients in our sample did each item reach?
Caveats: messages are deleted (still, the median number of messages 2000); non-uniform sample
26
Only forwarded messages are counted
27
Results
Average: 1.1 for attachments and 1.2 for URLs
Ads at the bottom of Hotmail and Yahoo messages
28
Simulate transmission on email log
Each message has a probability p of transmitting information from an infected individual to the recipient
02/19/2003 15:45:33 I-1 I-2
02/19/2003 15:45:33 I-1 I-3
02/19/2003 15:45:40 E-1 I-4
02/19/2003 15:45:52 I-5 E-2
02/19/2003 15:45:55 E-3 I-6
02/19/2003 15:45:58 I-7 I-8
02/19/2003 15:46:00 E-4 I-9
02/19/2003 15:46:05 I-10 I-11
02/19/2003 15:46:10 I-12 I-13
02/19/2003 15:46:10 I-12 I-14
02/19/2003 15:46:10 I-12 I-15
02/19/2003 15:46:14 I-16 E-5
...
I = internal node, E = external node
29
Simulation of information transmission on the actual HP Labs email graph
An individual is infected if they receive a particular piece of information
Individuals remain infected for 24 hours
Start by infecting one individual at random
Every time an infected individual sends an email, they have a probability p of infecting the recipient
Track epidemic over the course of a week; most run their course in 1-2 days
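
The simulation rules on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the authors' code: the log format `(timestamp, sender, recipient)`, the function name `simulate`, and the assumption that a person is infected at most once are all choices made for the example.

```python
import random
from datetime import datetime, timedelta


def simulate(log, seed_person, p, rng, infectious_for=timedelta(hours=24)):
    """log: time-ordered list of (timestamp, sender, recipient).

    Returns the set of everyone infected, including the seed.
    """
    if not log:
        return {seed_person}
    # The seed is treated as infected at the time of the first logged message.
    infected_at = {seed_person: log[0][0]}
    for when, sender, recipient in log:
        started = infected_at.get(sender)
        if started is None or when - started > infectious_for:
            continue  # sender was never infected, or is no longer infectious
        # Assumption: nobody is infected twice.
        if recipient not in infected_at and rng.random() < p:
            infected_at[recipient] = when
    return set(infected_at)
```

To follow the slide's setup, the seed would be drawn with `rng.choice(...)` over the individuals in the log, and the run repeated many times per value of p to estimate typical epidemic sizes.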
30
Introduce a decay in the transmission
probability based on the hierarchical distance
[Diagram: organizational tree with individuals A and B, branch distances 2 and 2, hierarchical distance h_AB = 5]
31
7119 potential recipients
p0
32
Conclusions on info flow in social groups
Information spread typically does not reach epidemic proportions
Information is passed on to individuals with matching properties
The likelihood that properties match decreases with distance from the source
Model gives a finite threshold
Results are consistent with observed URL and attachment frequencies in a sample
Simulations following real email patterns are also consistent
33
How to search in a small world
Milgram's experiment: given a target individual and a particular property, pass the message to a person you correspond with who is closest to the target.
34
Small world experiment at Columbia
Dodds, Muhamad, Watts, Science 301 (2003)

Email experiment conducted in 2002
18 targets in 13 different countries
24,163 message chains, 384 reached their targets
Average path length 4.0
35
Why study small world phenomena?
Curiosity: Why is the world small? How are people able to route messages?
Social Networking as a Business:
Friendster, Orkut, MySpace
LinkedIn, Spoke, VisiblePath
36
Six degrees of separation: to be expected
Pool and Kochen (1978): average person has 500-1500 acquaintances
Ignoring clustering, other redundancy: $10^3$ first neighbors, $10^6$ second neighbors, $10^9$ third neighbors
But networks are clustered: my friends' friends tend to be my friends
Watts & Strogatz (1998): a few random links in an otherwise clustered graph give an average shortest path close to that of a random graph
37
But how are people able to find short paths?
How to choose among hundreds of acquaintances?
Strategy: simple greedy algorithm; each participant chooses the correspondent who is closest to the target with respect to the given property
Models:
geography: Kleinberg (2000)
hierarchical groups: Watts, Dodds, Newman (2001), Kleinberg (2001)
high degree nodes: Adamic, Puniyani, Lukose, Huberman (2001), Newman (2003)
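
The greedy strategy described above can be written generically: the property (geography, hierarchy, and so on) only enters through a distance function. A minimal sketch, where `greedy_route`, the adjacency-dict graph format, and the choice to give up when no contact is closer are assumptions of the example:

```python
def greedy_route(graph, distance, source, target):
    """graph: dict node -> list of contacts; distance(a, b) -> number.

    Returns the path taken as a list of nodes, or None if the message gets stuck.
    """
    path = [source]
    current = source
    while current != target:
        contacts = graph.get(current, [])
        if not contacts:
            return None
        # Each participant forwards to the contact closest to the target.
        nxt = min(contacts, key=lambda n: distance(n, target))
        if distance(nxt, target) >= distance(current, target):
            return None  # stuck: no contact is closer than the current holder
        path.append(nxt)
        current = nxt
    return path
```

Because the distance to the target strictly decreases at every hop, the loop always terminates; real participants may of course use backtracking or richer strategies.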
38
Spatial search
Kleinberg (2000)
"The geographic movement of the message from Nebraska to Massachusetts is striking. There is a progressive closing in on the target area as each new person is added to the chain."
S. Milgram, "The small world problem", Psychology Today 1, 61 (1967)
Nodes are placed on a lattice and connect to nearest neighbors
Additional links are placed with $f(d) \propto d(u,v)^{-r}$
If $r = 2$, can search in polylog time
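
To make the link-placement rule concrete, here is a hedged sketch of drawing one additional (long-range) contact on an n x n lattice with probability proportional to d(u,v)^(-r). The Manhattan distance, the function names, and the brute-force enumeration of all other nodes are assumptions chosen for brevity.

```python
import random


def lattice_distance(u, v):
    """Manhattan distance between two lattice points given as (x, y) tuples."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def long_range_contact(u, n, r, rng):
    """Pick one extra contact for node u on an n x n grid, P(v) proportional to d(u, v) ** -r."""
    others = [(x, y) for x in range(n) for y in range(n) if (x, y) != u]
    weights = [lattice_distance(u, v) ** -r for v in others]
    return rng.choices(others, weights=weights, k=1)[0]
```

With r = 0 the contact is uniform over the grid; as r grows, contacts concentrate on nearby nodes.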
39
Kleinberg: searching hierarchical structures
"Small-World Phenomena and the Dynamics of Information", NIPS 14, 2001
Hierarchical network models: h is the distance between two individuals in a hierarchy with branching b
$f(h) \propto b^{-\alpha h}$
If $\alpha = 1$, can search in O(log n) steps
Group structure models: q is the size of the smallest group that two individuals belong to
$f(q) \propto q^{-\alpha}$
If $\alpha = 1$, can achieve in O(log n) steps
40
Identity and search in social networks Watts,
Dodds, Newman (2001)
individuals belong to hierarchically nested
groups
multiple independent hierarchies coexist
$p_{ij} \propto \exp(-\alpha x)$
41
Identity and search in social networks Watts,
Dodds, Newman (2001)
There is an attrition rate r
Network is searchable if a fraction q of messages reach the target
[Figure: curves for N = 102400, N = 204800, N = 409600]
42
High degree search
Adamic et al., Phys. Rev. E 64, 046135 (2001)
Mary
Bob
Who could introduce me to Richard Gere?
Jane
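
A rough sketch of a high-degree search: each holder checks whether it knows the target and otherwise passes the query to its highest-degree contact that has not yet seen it. The function name, the adjacency-dict format, and giving up when every contact has already been visited are simplifications for this example rather than details from the cited paper.

```python
def high_degree_search(graph, source, target):
    """graph: dict node -> set of neighbours (undirected, every node present as a key).

    Returns the list of nodes the query passes through, ending at the target, or None.
    """
    path = [source]
    visited = {source}
    current = source
    while current != target:
        neighbours = graph[current]
        if target in neighbours:
            path.append(target)
            return path
        # Sort first so that ties in degree are broken deterministically.
        unvisited = sorted(n for n in neighbours if n not in visited)
        if not unvisited:
            return None  # simplification: no backtracking or revisiting
        current = max(unvisited, key=lambda n: len(graph[n]))
        visited.add(current)
        path.append(current)
    return path
```

On a graph with a few very well-connected hubs, the query tends to reach a hub within a step or two, and hubs know a large share of the network.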
43
power-law graph
number of nodes found
94
6
2
44
Poisson graph
number of nodes found
93
45
Scaling of search time with size of graph
Sharp cutoff at $k \sim N^{1/\alpha}$, 2nd degree neighbors
[Figure: cover time for half the nodes versus size of graph; series: random walk (α = 0.37 fit) and degree sequence (α = 0.24 fit)]
46
Testing the models on social networks (w/ Eytan Adar)
Use a well-defined network: HP Labs email correspondence over 3.5 months
Edges are between individuals who sent at least 6 email messages each way
Node properties specified: degree, geographical location, position in organizational hierarchy
Can greedy strategies work?
47
Strategy 1 High degree search
Degree distribution of all senders of email
passing through the HP email server
outdegree
48
Filtered network (6 messages sent each way)
Degree distribution no longer power-law, but Poisson
450 users; median degree 10, mean degree 13, average shortest path 3
High degree search performance (poor): median steps 16, mean 40
49
Strategy 2 Geography
50
Communication across corporate geography
87% of the 4000 links are between individuals on the same floor
[Figure: floor map with labels 1U, 1L, 2U, 2L, 3U, 3L, 4U]
51
Cubicle distance vs. probability of being linked
52
Finding someone in a sea of cubicles
median 7 mean 12
53
Strategy 3 Organizational hierarchy
54
Email correspondence scrambled
55
Actual email correspondence
56
Example of search path
distance 2
distance 1
hierarchical distance 5 search path distance 4
57
Probability of linking vs. distance in hierarchy
in the searchable regime 0
58
Results
59
Group size vs. probability of linking
60
Group size and probability of linking
group size g
61
Search Conclusions
Individuals associate on different levels into groups.
Group structure facilitates decentralized search using social ties.
HP Labs as a social network is searchable but not quite optimal.
Searching using the organizational hierarchy is faster than using physical location.
A fraction of important individuals are easily findable.
Humans may be much more resourceful in executing search tasks: making use of weak ties, using more sophisticated strategies.
62
PeopleFinder2: a search engine for HP people
Extract and disambiguate names from publicly available documents
Enrich information available about individuals
Search for them by topic
Identify knowledge communities from co-occurrence of names
Live Demo
If live demo fails:
Current PeopleFinder functionality
PeopleFinder2 info on a person
Extracted topics for a person
Social network
Social network visualization
Search for individuals by topic
Visualize knowledge network
Find social network paths to experts
63
To find out more (papers, slides, other research in the group)
Information dynamics group (IDL) at HP Labs: http://www.hpl.hp.com/research/idl
List of publications: http://www.hpl.hp.com/personal/Lada_Adamic/research.html
