Title: First International Conference on
1 Structural Link Analysis from User Profiles and Friends Networks: A Feature Construction Approach
- William H. Hsu, Joseph Lancaster, Martin S. R. Paradesi, Tim Weninger
- Monday, 26 March 2007
- Laboratory for Knowledge Discovery in Databases
- Kansas State University
- http://www.kddresearch.org/KSU/CIS/ICWSM-20070326.ppt
2 Link Analysis in Social Networks: The K-State Corpus
3 Outline
- Background, Related Work and Rationale
- Technical Objective: Link Mining in Social Networks
- Methodology: Graph Feature Extraction
- Experimental Results: K-State LJMiner Corpus
- Continuing Work: Statistical Relational Models
4 Problem Statement: Link Mining in Social Networks
- Problem Definition
- Given: records of users of a weblog or social network service
- Discover:
- Features of entities: users, communities
- Relationships: friendship, membership, moderatorship
- Explanations and predictions for relationships
- Goals
- Boost precision and recall of link existence prediction
- Find relevant features
- Significance: Recommendations (Friendship, Membership)
5 Related Work: Link Mining
- Getoor and Diehl (2005): Graphical model representations of link structure
- Ketkar et al. (2005): Data mining techniques vs. graph-based representation
- Sarkar and Moore (2005): Change in link structure across discrete time steps
- Popescul and Ungar (2003): ER model to predict links
- Hill (2003), Bhattacharya and Getoor (2004): Statistical Relational Learning to resolve identity uncertainty
- Resig et al. (2004): Predicting IM online times using friends graph degree
- McCallum et al. (2005): Inferring roles and topic categories based on link analysis
6 Rationale
- Limitations of Current State of the Art
- Do not take graph features into account
- Limited ability to select, extract features
- Novel Contribution: Link Mining System
- Extracts, computes features of network model
- Towards dependent types for relational link mining
- Rationale
- Desired functionality: infer new links from old
- Evaluation: precision, recall for link existence
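The evaluation criterion named on this slide, precision and recall for link existence, can be illustrated with a minimal sketch. It assumes links are directed (u, v) pairs and that a predictor outputs a set of such pairs. The function name `link_precision_recall` and the toy user names are hypothetical and do not come from LJMiner.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # directed link (u, v): u lists v as a friend


def link_precision_recall(predicted: Iterable[Edge],
                          actual: Iterable[Edge]) -> Tuple[float, float]:
    """Precision and recall of predicted links against the links that exist."""
    predicted_set: Set[Edge] = set(predicted)
    actual_set: Set[Edge] = set(actual)
    true_positives = len(predicted_set & actual_set)
    # Assumed convention: an empty denominator yields 0.0 instead of an error.
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(actual_set) if actual_set else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the LJMiner corpus.
actual_links = {("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("ann", "dan")}
predicted_links = {("ann", "bob"), ("bob", "cat"), ("dan", "ann")}

precision, recall = link_precision_recall(predicted_links, actual_links)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.50
```

Because links are directed, the predicted pair ("dan", "ann") does not match the existing link ("ann", "dan") and counts as a false positive.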
8 K-State Test Bed: LJMiner Corpus
9 LiveJournal Topology 1: Tools and Security Model
10 LiveJournal Topology 2: Definitions
12 Graph Features 1: Node, Pair, Link-Dependent
- Node-Dependent: Features specific to one node (vertex) within candidate pair
- Indegree(v): Target popularity
- Indegree(u): Source popularity
- Outdegree(u): Source fertility
- Outdegree(v): Target fertility
- Pair-Dependent: Features specific to one candidate pair of nodes (vertices)
- Link-Dependent: Features specific to one link (edge) in directed graph
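The four node-dependent features listed above can be computed from a directed edge list. The following is a minimal sketch, assuming an edge (u, v) means that u lists v as a friend. The helper names `degree_tables` and `node_features` and the toy graph are hypothetical and do not reflect LJMiner's actual code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # directed edge (u, v): u lists v as a friend


def degree_tables(edges: Iterable[Edge]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (indegree, outdegree) counts for every node in the edge list."""
    indegree: Dict[str, int] = defaultdict(int)
    outdegree: Dict[str, int] = defaultdict(int)
    for source, target in set(edges):  # set() drops duplicate edges
        outdegree[source] += 1
        indegree[target] += 1
    return indegree, outdegree


def node_features(u: str, v: str,
                  indegree: Dict[str, int],
                  outdegree: Dict[str, int]) -> Dict[str, int]:
    """Node-dependent features for candidate pair (u, v): u is source, v is target."""
    return {
        "indegree_u": indegree.get(u, 0),    # source popularity
        "indegree_v": indegree.get(v, 0),    # target popularity
        "outdegree_u": outdegree.get(u, 0),  # source fertility
        "outdegree_v": outdegree.get(v, 0),  # target fertility
    }


# Hypothetical toy friends graph.
toy_edges = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"),
             ("ann", "dan"), ("dan", "bob")]
indegree, outdegree = degree_tables(toy_edges)
features = node_features("ann", "bob", indegree, outdegree)
print(features)
```

For the candidate pair (ann, bob), the source has indegree 1 and outdegree 2, and the target has indegree 2 and outdegree 1.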
13 Graph Features 2: Node and Pair Features in LJMiner
14 LJCrawler
- System Design
- Data acquisition: client, injector, parser
- Ancillary issues
- Multi-threading
- Distribution
- Storage
- Analytical postprocessing: LJClipper, LJStats
- Distinguishing features of LJCrawler
- Results
- 200 users/second maximum, 5 users/second allowed
- Approximately 2 million pages crawled
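The gap between the 200 users/second the crawler can reach and the 5 users/second it is allowed implies some form of request throttling. The sketch below shows one simple way to enforce a fixed request rate. The `Throttle` class is hypothetical and is not LJCrawler's implementation. The clock and sleep functions are injectable so that the demo runs instantly without real waiting or network access.

```python
import time
from typing import Callable


class Throttle:
    """Spaces out calls so that at most `rate` requests are issued per second."""

    def __init__(self, rate: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # The next slot opens one interval after this request goes out.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


# Demo with a fake clock (assumption for illustration only).
fake_now = [0.0]


def fake_sleep(seconds: float) -> None:
    fake_now[0] += seconds


throttle = Throttle(rate=5, clock=lambda: fake_now[0], sleep=fake_sleep)
delays = [round(throttle.wait(), 3) for _ in range(3)]
print(delays)  # [0.0, 0.2, 0.2]
```

In a real crawler, `throttle.wait()` would be called before each page request, with the default `time.monotonic` and `time.sleep` left in place.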
16 Network Statistics: Graph Distance
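The body of this slide is not preserved in the transcript. As an illustration only, the sketch below assumes that graph distance means the number of edges on a shortest directed path between two users, computed with breadth-first search. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Optional, Tuple

Edge = Tuple[str, str]  # directed edge (u, v): u lists v as a friend


def build_adjacency(edges: Iterable[Edge]) -> Dict[str, List[str]]:
    """Map each node to the list of nodes it links to."""
    adjacency: Dict[str, List[str]] = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)
    return adjacency


def graph_distance(adjacency: Dict[str, List[str]],
                   start: str, goal: str) -> Optional[int]:
    """Edges on a shortest directed path from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical toy friends graph.
toy_graph = build_adjacency([("ann", "bob"), ("bob", "cat"), ("cat", "ann"),
                             ("ann", "dan"), ("dan", "bob")])
print(graph_distance(toy_graph, "ann", "cat"))  # 2
print(graph_distance(toy_graph, "ann", "eve"))  # None
```

Because the graph is directed, the distance from u to v can differ from the distance from v to u.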
17 Interpretation of Results
- 941-node graph (Hsu et al., 2006): LJCrawler v1 output
- 1000-4000 node graphs: LJCrawler v2 output
19 Results
- Establishing an Interdisciplinary Research Initiative
- K-State / KU / UNL collaboration
- Resources: Linguistic Data Consortium
- NIST evaluations
- Involving End Users of Machine Translation
- Document users
- Machine learning, data mining, info extraction researchers
- Novel Applications
- Social networks and collaborative recommendation
- Gisting and beyond
20 Continuing Work
- Information Extraction and Intelligent IR
- Learning models for IE ontologies
- Latent semantic analysis
- Machine Learning
- Natural language learning
- Time series learning and understanding
- Relational and first-order models
- Automated Reasoning
- Probabilistic
- Case-based and analogical
- Data Mining and Warehousing
- Grid Computing
22 Acknowledgements
- K-State Lab for Knowledge Discovery in Databases
- Vikas Bahirwani
- Tejaswi Pydimarri
- Andrew King
- Social Networks, Graph Theory, Graph Algorithms
- Kirsten Hildrum (IBM T. J. Watson Labs)
- Todd Easton (K-State, Industrial and Manufacturing Systems Engineering)
- Machine Learning
- Dan Roth, Cinda Heeren, Jiawei Han (University of Illinois at Urbana-Champaign)
- AnHai Doan (University of Wisconsin-Madison)
23 Questions and Discussion