Title: Distributed, Real-Time Computation of Community Preferences
1Distributed, Real-Time Computation of Community
Preferences
- Thomas Lutkenhouse, Michael L. Nelson, Johan Bollen
- Old Dominion University, Computer Science Department, Norfolk, VA 23529 USA
- {lutken,mln,jbollen}@cs.odu.edu
- HT 2005 - Sixteenth ACM Conference on Hypertext and Hypermedia
- 6-9 September 2005, Salzburg, Austria
3Outline
- Review of technologies
- buckets
- Hebbian learning
- previous results
- Experiment design
- Results
- Lessons learned
- Conclusions
4Non-evolution of DL Objects
. . .
5Buckets
- Premise: repositories come and go, but the
objects should endure
- Began as part of NASA DL research
- focus on digital preservation
- implementation of the Smart Objects, Dumb
Archives (SODA) model for digital libraries
- CACM 2001, doi.acm.org/10.1145/374308.374342
- D-Lib, dx.doi.org/10.1045/february2001-nelson
6Smart Objects
- Responsibilities generally associated with the
repository are pushed down into the stored
object
- T&C, maintenance, logging, pagination, display,
etc.
- Aggregate
- metadata
- data
- methods to operate on the metadata/data
- API examples
- http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=getMetadata&type=all
- http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=listMethods
- http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=listPreference
- (cheat) http://www.cs.odu.edu/~mln/teaching/cs595-f03/bucket/bucket.xml
7Internal Structure
jaga.cs.odu.edu/home/mln/public_html/teaching/cs695-f03 ls
bucket/  CVS/  index.cgi
jaga.cs.odu.edu/home/mln/public_html/teaching/cs695-f03 ls bucket/
bucket.xml  content/  CVS/  lib/  logs/  methods/
jaga.cs.odu.edu/home/mln/public_html/teaching/cs695-f03 ls bucket/content/
syllabus.txt         week1readings.html    week5readings.html
week10readings.html  week1week-01.ppt      week6readings.html
week11readings.html  week2readings.html    week7readings.html
week12readings.html  week2week-02.ppt      week8readings.html
week13readings.html  week3assignment1.ppt  week9readings.html
week14readings.html  week3readings.html
week15readings.html  week3week-03.ppt
jaga.cs.odu.edu/home/mln/public_html/teaching/cs695-f03 ls bucket/lib
CVS/  EZXML.pm  mime.e  style.css
jaga.cs.odu.edu/home/mln/public_html/teaching/cs695-f03 ls bucket/logs/
access.log  CVS/  mylog.log
jaga.cs.odu.edu/home/mln/public_html/teaching/cs695-f03 ls bucket/methods/
addElement.pl     getElement.pl   listMethods.pl     setPreference.pl
CVS/              get_log.pl      listPreference.pl
deleteElement.pl  getlog.pl       log.pl
display.pl        getMetadata.pl  setMetadata.pl
jaga.cs.odu.edu/home/mln/public_html/teaching/cs695-f03
8Examples
- 1.6.X buckets
- http://ntrs.nasa.gov/
- http://www.cs.odu.edu/~mln/phd/
- 2.0 buckets
- http://www.cs.odu.edu/~mln/teaching/cs595-f03/
- http://www.cs.odu.edu/~lutken/bucket/
- 3.0 buckets (under development)
- http://beaufort.cs.odu.edu:8080/
- uses MPEG-21 DIDLs
- cf. http://www.dlib.org/dlib/november03/bekaert/11bekaert.html
9Hebbian Learning
- Implementation issues
- gather log files
- problematic when spread across servers/domains
- determine a ΔT for session reconstruction
- typically 5 min
- compute link weights
- update the network periodically
- typically monthly
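The log-based steps above can be sketched as follows. This is a minimal illustration, not the implementation behind the cited systems: the log format (user, timestamp, document), the function names, and the sample data are all assumptions made for the example, and only the simplest weight (a count of consecutive requests within a session) is computed.

```python
from collections import defaultdict
from datetime import datetime, timedelta

DELTA_T = timedelta(minutes=5)  # session gap; the slide says "typically 5 min"

def reconstruct_sessions(log_entries, delta_t=DELTA_T):
    """Group (user, timestamp, document) entries into per-user sessions.

    A new session starts whenever the gap since the same user's previous
    request exceeds delta_t. The log format is hypothetical.
    """
    by_user = defaultdict(list)
    for user, timestamp, document in log_entries:
        by_user[user].append((timestamp, document))

    sessions = []
    for requests in by_user.values():
        requests.sort()
        current = [requests[0][1]]
        for (prev_time, _), (time, document) in zip(requests, requests[1:]):
            if time - prev_time > delta_t:
                sessions.append(current)
                current = []
            current.append(document)
        sessions.append(current)
    return sessions

def link_weights(sessions):
    """Count consecutive document pairs within each session."""
    weights = defaultdict(float)
    for session in sessions:
        for source, target in zip(session, session[1:]):
            if source != target:
                weights[(source, target)] += 1.0
    return dict(weights)

# Hypothetical log gathered from one or more servers.
log = [
    ("u1", datetime(2004, 8, 1, 10, 0), "a"),
    ("u1", datetime(2004, 8, 1, 10, 2), "b"),
    ("u1", datetime(2004, 8, 1, 10, 20), "c"),  # gap > 5 min: new session
    ("u2", datetime(2004, 8, 1, 10, 1), "b"),
    ("u2", datetime(2004, 8, 1, 10, 3), "a"),
]
sessions = reconstruct_sessions(log)
weights = link_weights(sessions)
```

Because the whole log must be collected before any weight can be computed, this style of processing only updates the network in batches, which is the limitation the bucket-based approach is meant to avoid.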
10Previous, Log-Based Recommendation Implementations
- LANL Journal Recommendations
- collection analysis based on journal readership
patterns
- D-Lib Magazine, dx.doi.org/10.1045/june2002-bollen
- NASA Technical Report Server
- compared recommendations with those generated by
VSM
- WIDM 2004, doi.acm.org/10.1145/1031453.1031480
- Open Video Project
- generated recommendations for videos (little
descriptive metadata)
- JCDL 2005, doi.acm.org/10.1145/1065385.1065472
11Hebbian Learning with Bucket Methods
http://b?method=display
&referer=http://b
&redirect=http://a?method=display
%26redirect=http://c?method=display
%26referer=http://b

http://a?method=display
&referer=http://a
&redirect=http://b?method=display
%26referer=http://a
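The role of the `%26` escapes in these links can be illustrated with a short sketch. The helper name `parse_bucket_request` is hypothetical and the URL is assembled from the fragments on this slide; the point is only that escaping the ampersand keeps the inner `referer` argument attached to the redirect target instead of the outer request.

```python
from urllib.parse import parse_qs

def parse_bucket_request(url):
    """Split a bucket URL into its base and a dict of decoded arguments.

    Hypothetical helper: assumes each argument appears at most once.
    """
    base, _, query = url.partition("?")
    params = {name: values[0] for name, values in parse_qs(query).items()}
    return base, params

url = ("http://a?method=display&referer=http://a"
       "&redirect=http://b?method=display%26referer=http://a")

# Outer request: handled by bucket a, which sees the redirect target whole.
base, params = parse_bucket_request(url)

# Inner request: what bucket b receives after the redirect is followed.
next_base, next_params = parse_bucket_request(params["redirect"])
```

Here bucket `a` receives `method`, `referer`, and `redirect`, while bucket `b` later receives its own `method` and a `referer` naming `a`, so each bucket learns about the traversal from the request itself rather than from a shared log.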
12Experiment
- Spin Magazine's Top 50 Rock Bands of All Time
- something other than reports, journals, etc.
- harvest allmusic.com for metadata for all LPs by
the 50 bands (total 800 LPs)
- Maintain hierarchical arrangement
- 1 artist → N albums
- Initialize the network of 800 LPs with each LP
randomly linked to 5 other LPs
- Send out email invitations to browse the network
- have them explore, and then examine the resulting
network
- users not informed about the workings of the
network
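The random initialization described above might look like the sketch below. The function name, the integer LP identifiers, and the seed are assumptions for illustration; the figures of 800 LPs and 5 links per LP come from this slide, and the starting weight of 0.5 is the initial weight listed on the weighted-links slide.

```python
import random

def init_network(lp_ids, links_per_lp=5, initial_weight=0.5, seed=None):
    """Link every LP to `links_per_lp` other LPs chosen uniformly at random.

    Returns {lp: {target_lp: weight}}; an LP is never linked to itself.
    """
    rng = random.Random(seed)
    network = {}
    for lp in lp_ids:
        others = [other for other in lp_ids if other != lp]
        targets = rng.sample(others, links_per_lp)
        network[lp] = {target: initial_weight for target in targets}
    return network

# Hypothetical identifiers 0..799 stand in for the harvested LPs.
network = init_network(list(range(800)), seed=42)
```

Since every starting link is arbitrary, early visitors see recommendations unrelated to the LP they are viewing, which is the cold-start effect discussed later in the talk.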
13Display of LPs
14Hierarchical, Weighted Links
[XML listing of a bucket's weighted links; only fragments survive. Visible attributes include wt="0.5" with id="http://www.cs.odu.edu/~lutken/bucket/11/" and id="http://www.cs.odu.edu/~lutken/bucket/130/"; other link targets include bucket/121/ and bucket/434/. Titles visible in the fragments: "Terrapin Station, Capital Centre, Landover, MD, 3/15/90", "Jealousy/Progress", "Nevermind", "Technical Ecstasy".]
- Weights
- initial 0.5
- frequency 1.0
- symmetry 0.5
- transitivity 0.3
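One way to read the four weights is as increments applied at each traversal. The sketch below assumes the usual frequency, symmetry, and transitivity rules of Hebbian link learning: a traversal A to B reinforces A→B, its reverse B→A, and, if the user arrived at A from P, the shortcut P→B. The function name and the in-memory dictionary are hypothetical; in the bucket setting each bucket would store only its own outgoing links.

```python
from collections import defaultdict

FREQUENCY = 1.0      # added to source -> target on every traversal
SYMMETRY = 0.5       # added to the reverse link target -> source
TRANSITIVITY = 0.3   # added to previous -> target for a two-step path

def record_traversal(weights, previous, source, target):
    """Update link weights for one traversal source -> target.

    `previous` is the bucket visited before `source`, or None.
    """
    weights[(source, target)] += FREQUENCY
    weights[(target, source)] += SYMMETRY
    if previous is not None and previous != target:
        weights[(previous, target)] += TRANSITIVITY

# A user walks A -> B -> C (new links start from 0.0 here for simplicity).
weights = defaultdict(float)
record_traversal(weights, None, "A", "B")
record_traversal(weights, "A", "B", "C")
```

Each call touches at most three links, so the cost of maintaining the network is spread over individual accesses instead of being paid in a periodic batch job.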
15Respondents
- August 2004 - October 2004
- 160 respondents
- self-identify at the beginning; exit survey at
the end
- 1200 bucket-to-bucket traversals (7.5 average
traversals per session)
16How to Evaluate the Resulting Network?
- Compute network analysis metrics
- PageRank
- Degree Centrality
- Weighted Degree Centrality
- Compare the results to
- Other expert lists (VH1, DigitalDreamDoor,
original Spin Magazine list)
- Artist / LP best seller according to RIAA
- Artist / LP Amazon sales rank
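The three network metrics can be sketched in plain Python as below. Several choices are assumptions, since the slides do not specify them: degree is counted over outgoing links, PageRank uses link weights as transition probabilities with a damping factor of 0.85, and the tiny example graph is invented.

```python
def degree_centrality(graph):
    """Number of outgoing links per node (out-degree is an assumption)."""
    return {node: len(links) for node, links in graph.items()}

def weighted_degree_centrality(graph):
    """Sum of outgoing link weights per node."""
    return {node: sum(links.values()) for node, links in graph.items()}

def pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    `graph` maps node -> {target: weight}; every target must be a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for source, links in graph.items():
            total = sum(links.values())
            if total == 0:
                # Dangling node: spread its rank evenly over all nodes.
                for node in nodes:
                    new_rank[node] += damping * rank[source] / n
            else:
                for target, weight in links.items():
                    new_rank[target] += damping * rank[source] * weight / total
        rank = new_rank
    return rank

# Hypothetical three-LP network.
graph = {"a": {"b": 1.0, "c": 0.5}, "b": {"c": 1.0}, "c": {"a": 1.0}}
degree = degree_centrality(graph)
weighted = weighted_degree_centrality(graph)
rank = pagerank(graph)
```

In this toy graph `c` collects weight from both `a` and `b`, so it ends up with a higher PageRank than `b`, even though `a` has the largest degree.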
17Expert Rankings
- No correlation with
- VH1 artist list
- DigitalDreamDoor list
- original Spin Magazine list (!)
- (critics don't agree with each other, or with the
record-buying public)
18RIAA Results
- RIAA had only
- 51/800 LPs
- 14/50 artists
- (critics don't buy records!)
RIAA sales caveat
Figure 6. Probability of albums being
best-sellers.
Figure 7. Probability of artists being
best-sellers.
19Amazon Sales Rank
- No correlation with individual LP sales rank
- but correlated with mean artist sales rank
- similar to RIAA data
- interpretation: popular artists often have
obscure LPs
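The LP-level versus artist-level comparison can be sketched as an aggregation step followed by a rank correlation. The slides do not say which correlation coefficient was used; Spearman's rank correlation is assumed here, and the helper names and sample values are hypothetical.

```python
from collections import defaultdict

def mean_by_artist(lp_values, lp_artist):
    """Average a per-LP value (e.g. sales rank) over each artist's LPs."""
    grouped = defaultdict(list)
    for lp, value in lp_values.items():
        grouped[lp_artist[lp]].append(value)
    return {artist: sum(v) / len(v) for artist, v in grouped.items()}

def ranks(values):
    """Ranks starting at 1 for the smallest value; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical sales ranks for three LPs by two artists.
sales_rank = {"lp1": 100, "lp2": 300, "lp3": 50}
artist_of = {"lp1": "X", "lp2": "X", "lp3": "Y"}
artist_mean = mean_by_artist(sales_rank, artist_of)
```

Averaging over an artist's catalogue smooths out the obscure LPs, which is one reason a network score could track artist-level sales while showing no relationship at the level of individual LPs.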
20Relatedness(?)
21Relatedness(?)
22Lessons Learned
- While the subject matter was interesting, it was
oriented for music geeks
- i.e., no actual music was delivered to the users
(intellectual property considerations)
- more traversals needed
- Random initial starting points were difficult to
overcome
- cold start problem - pre-seed the links
according to some criteria?
- weights did not decay over time/traversals
- Choosing only artists from Spin Magazine may have
pre-filtered the response
- choose artists from Down Beat (Jazz), Vibe
(Urban), Music City News (Country), etc.
23Conclusions
- Can build a network of smart objects featuring
adaptive, hierarchical links constructed in
real-time without central state
- network is created without latency and with
computations amortized over individual accesses
- Experimental testbed with popular-music LP
metadata produced rankings that correlate with
artist sales rank, but not with individual LP
sales rank