Title: Data Mining: Potentials and Challenges
1. Data Mining: Potentials and Challenges
- Rakesh Agrawal, Jeff Ullman
2. Observations
- Transfer of data mining research into deployed applications and commercial products
- Greater success in vertical applications
- Horizontal tools. Examples:
  - SAS Enterprise Miner: the sophisticated-statisticians segment
  - DB2 Intelligent Miner: database applications requiring mining
- Emergence of the application of data mining in non-conventional domains
- Combination of structured and unstructured data
- New challenges due to security/privacy concerns
- DARPA initiative to fund data mining research
3. Identifying Social Links Using Association Rules
Input: crawl of about 1 million pages
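A minimal sketch of how association rules could surface social links, assuming each crawled page has already been reduced to the set of person names it mentions (one transaction per page) and that a rule A -> B is reported as a link when it clears hypothetical support and confidence thresholds. Name extraction is out of scope, and the pages below are toy data.

```python
from collections import Counter
from itertools import combinations

def social_links(pages, min_support, min_confidence):
    """Return rules (a, b, support, confidence) for a -> b over name-set transactions.

    pages: list of sets of person names, one set per page (assumed pre-extracted).
    min_support: minimum number of pages mentioning both names.
    min_confidence: minimum fraction of pages mentioning a that also mention b.
    """
    single = Counter()
    pair = Counter()
    for names in pages:
        single.update(names)
        # Count each unordered pair once per page, in a canonical order
        pair.update(combinations(sorted(names), 2))
    rules = []
    for (a, b), count in pair.items():
        if count < min_support:
            continue
        # A frequent pair yields up to two rules, one per direction
        for x, y in ((a, b), (b, a)):
            conf = count / single[x]
            if conf >= min_confidence:
                rules.append((x, y, count, conf))
    return sorted(rules)

# Hypothetical pages, already reduced to the names they mention
pages = [
    {"Alice", "Bob"},
    {"Alice", "Bob", "Carol"},
    {"Alice", "Bob"},
    {"Carol", "Dave"},
    {"Carol"},
]
print(social_links(pages, min_support=2, min_confidence=0.6))
```

Here only Alice and Bob co-occur often enough, so both directions of the rule are reported with confidence 1.0.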
4. Website Profiling Using Classification
Input: example pages for each category during training
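To make the training input concrete, here is a sketch of one possible classifier, assuming multinomial naive Bayes over page words with add-one smoothing. The slide does not name the classifier, so naive Bayes, the tokenizer, and the category names below are illustrative assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in for real page parsing."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayesProfiler:
    """Multinomial naive Bayes over page words with add-one (Laplace) smoothing."""

    def fit(self, examples):
        # examples: dict mapping category -> list of example page texts
        self.word_counts = {}
        self.total_words = {}
        self.log_prior = {}
        self.vocab = set()
        n_pages = sum(len(pages) for pages in examples.values())
        for category, pages in examples.items():
            counts = Counter()
            for page in pages:
                counts.update(tokenize(page))
            self.word_counts[category] = counts
            self.total_words[category] = sum(counts.values())
            self.log_prior[category] = math.log(len(pages) / n_pages)
            self.vocab.update(counts)
        return self

    def predict(self, page):
        """Return the category with the highest log-posterior score for the page."""
        v = len(self.vocab)
        scores = {}
        for category, counts in self.word_counts.items():
            score = self.log_prior[category]
            for word in tokenize(page):
                score += math.log((counts[word] + 1) / (self.total_words[category] + v))
            scores[category] = score
        return max(scores, key=scores.get)

# Hypothetical training pages for two categories
examples = {
    "sports": ["match score team goal", "team wins the league match"],
    "finance": ["stock market price shares", "bank interest rate market"],
}
profiler = NaiveBayesProfiler().fit(examples)
print(profiler.predict("the team scored a late goal"))
```

Working in log space avoids numeric underflow when pages are long.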
5. Discovering Trends Using Sequential Patterns
Shape Queries
Input: (i) patent database, (ii) shape of interest
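A simplified sketch of a shape query, assuming the patent database has been summarized as yearly patent counts per term and that a shape is written as a regular expression over U (up), D (down), and S (stable) transitions. This symbol encoding, the tolerance parameter, and the data are hypothetical; it is not presented as the shape language used in the original work.

```python
import re

def to_shape(counts, tolerance=0):
    """Encode year-over-year changes as U (up), D (down) or S (stable within tolerance)."""
    symbols = []
    for prev, curr in zip(counts, counts[1:]):
        delta = curr - prev
        if delta > tolerance:
            symbols.append("U")
        elif delta < -tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)

def match_shape(series, shape_regex, tolerance=0):
    """Return the names whose change sequence contains the shape of interest."""
    pattern = re.compile(shape_regex)
    return sorted(name for name, counts in series.items()
                  if pattern.search(to_shape(counts, tolerance)))

# Hypothetical yearly patent counts per technology term
series = {
    "term_a": [5, 9, 14, 20, 18, 11],    # rises, then falls
    "term_b": [7, 7, 8, 7, 7, 8],        # roughly flat
    "term_c": [30, 25, 19, 12, 15, 22],  # falls, then recovers
}
# Shape of interest: two or more increases followed by two or more decreases
print(match_shape(series, r"U{2,}D{2,}", tolerance=1))
```

The tolerance keeps small fluctuations (such as term_b's) from being read as trends.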
6. Discovering Micro-communities
Frequently co-cited pages are related. Pages
with large bibliographic overlap are related.
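The two relatedness signals on this slide can be computed directly from a link graph. In the sketch below, co-citation counts how many pages link to both members of a pair, and bibliographic overlap (coupling) counts how many outgoing links two pages share. The graph and page names are hypothetical, and the sketch stops at scoring pairs rather than extracting whole communities.

```python
from collections import Counter
from itertools import combinations

def cocitation(links):
    """Count, for each pair of pages, how many pages link to both (co-citation)."""
    counts = Counter()
    for targets in links.values():
        counts.update(combinations(sorted(set(targets)), 2))
    return counts

def bibliographic_coupling(links):
    """Count, for each pair of linking pages, how many outgoing links they share."""
    counts = Counter()
    for a, b in combinations(sorted(links), 2):
        shared = len(set(links[a]) & set(links[b]))
        if shared:
            counts[(a, b)] = shared
    return counts

# Hypothetical link graph: page -> pages it links to
links = {
    "fan1": ["siteX", "siteY"],
    "fan2": ["siteX", "siteY", "siteZ"],
    "fan3": ["siteX", "siteY"],
    "other": ["siteZ"],
}
print(cocitation(links).most_common(1))  # [(('siteX', 'siteY'), 3)]
```

Pairs with high scores on either measure are candidates for the same micro-community.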
7. New Challenges
- Privacy-preserving data mining
- Data mining over compartmentalized databases
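8. Inducing Classifiers over Privacy-Preserved Numeric Data
[Figure: individual values such as Alice's age, Alice's salary, and John's age are randomized by adding a random value.]
Example: Alice's age 30 becomes 65 (30 + 35)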
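A sketch of the randomization idea, assuming noise drawn uniformly from [-alpha, alpha] and an iterative Bayesian procedure that estimates the distribution of the original values (over equal-width bins) from the randomized ones alone. The noise range, bin count, iteration count, and function names are assumptions for illustration; the slide's +35 shift implies the actual noise range there was wider than the alpha = 20 used below. This is not presented as the authors' exact procedure.

```python
import random

def randomize(values, alpha, rng):
    """Hide each value by adding independent noise drawn uniformly from [-alpha, alpha]."""
    return [v + rng.uniform(-alpha, alpha) for v in values]

def noise_density(d, alpha):
    """Density of the uniform noise on [-alpha, alpha] evaluated at d."""
    return 1.0 / (2 * alpha) if -alpha <= d <= alpha else 0.0

def reconstruct(randomized, alpha, lo, hi, n_bins=16, iters=30):
    """Estimate the original value distribution over equal-width bins on [lo, hi).

    Each round replaces the current estimate with the average, over all
    observed values, of the Bayes posterior of the original value's bin.
    """
    width = (hi - lo) / n_bins
    mids = [lo + (k + 0.5) * width for k in range(n_bins)]
    estimate = [1.0 / n_bins] * n_bins  # start from a uniform guess
    for _ in range(iters):
        accum = [0.0] * n_bins
        used = 0
        for w in randomized:
            weights = [noise_density(w - m, alpha) * p for m, p in zip(mids, estimate)]
            total = sum(weights)
            if total == 0.0:
                continue  # no bin could have produced w under the current estimate
            used += 1
            for k in range(n_bins):
                accum[k] += weights[k] / total
        estimate = [a / used for a in accum]
    return mids, estimate

rng = random.Random(0)
ages = [rng.uniform(20, 60) for _ in range(1000)]  # hypothetical true ages
noisy = randomize(ages, alpha=20, rng=rng)          # what the data collector sees
mids, probs = reconstruct(noisy, alpha=20, lo=0, hi=80)
print(round(sum(m * p for m, p in zip(mids, probs)), 1))
```

A classifier can then be trained against the reconstructed distributions rather than against any individual's true value.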
9. Other Recent Work
- Cryptographic approach to privacy-preserving data mining
  - Lindell & Pinkas, Crypto 2000
- Privacy-preserving discovery of association rules
  - Vaidya & Clifton, KDD 2002
  - Evfimievski et al., KDD 2002
  - Rizvi & Haritsa, VLDB 2002
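Several of these association-rule approaches distort each transaction before it is collected and then correct the counts statistically. The sketch below is a simplified randomized-response example, loosely in the spirit of bit-flipping schemes: each item's presence bit is kept with probability p and flipped otherwise, and the true support s of a single item is recovered by inverting E[observed] = s·p + (1 - s)·(1 - p). The parameter p, the item names, and the data are hypothetical; estimating itemset (pair) supports needs a larger system of equations not shown here.

```python
import random

def randomize_bits(transactions, items, p, rng):
    """Report each item's presence bit truthfully with probability p, flipped otherwise."""
    noisy = []
    for t in transactions:
        reported = set()
        for item in items:
            present = item in t
            if rng.random() >= p:  # flip with probability 1 - p
                present = not present
            if present:
                reported.add(item)
        noisy.append(reported)
    return noisy

def estimate_support(noisy, item, p):
    """Invert E[observed] = s*p + (1 - s)*(1 - p) to estimate true support s (needs p != 0.5)."""
    observed = sum(item in t for t in noisy) / len(noisy)
    return (observed - (1 - p)) / (2 * p - 1)

rng = random.Random(42)
items = ["bread", "milk", "beer"]
# Hypothetical data: bread appears in roughly 60% of transactions, the others never
transactions = [{"bread"} if rng.random() < 0.6 else set() for _ in range(10000)]
noisy = randomize_bits(transactions, items, p=0.9, rng=rng)
print(round(estimate_support(noisy, "bread", 0.9), 2))
```

Values of p closer to 0.5 give stronger privacy but noisier support estimates, since the correction divides by 2p - 1.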
10. Computation over Compartmentalized Databases
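One way to compute across compartmentalized databases without pooling the raw data is commutative encryption. The sketch below simulates, in a single process, two parties finding the values they have in common: each hashes its values into a group, encrypts with its own secret exponent, and the other party re-encrypts, so that (x^a)^b = (x^b)^a mod P lets doubly encrypted values be compared directly. This is a toy under stated assumptions (honest parties, hash collisions ignored, the Mersenne prime 2^127 - 1 as modulus, no vetted security parameters) and is not presented as the specific method behind this slide.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy modulus, NOT a vetted security parameter

def hash_to_group(value):
    """Map a string to a nonzero element of Z_P (collisions ignored in this toy)."""
    digest = hashlib.sha256(value.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def random_key():
    """Pick a secret exponent coprime to P - 1 so x -> x^k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 4) + 3
        if math.gcd(k, P - 1) == 1:
            return k

def encrypt(x, key):
    """Commutative 'encryption' by modular exponentiation."""
    return pow(x, key, P)

def private_intersection(set_a, set_b):
    """Simulate two parties learning only their common values via double encryption."""
    key_a, key_b = random_key(), random_key()
    # Each party encrypts its own hashed values under its own key
    a_once = {encrypt(hash_to_group(v), key_a): v for v in set_a}
    b_once = [encrypt(hash_to_group(v), key_b) for v in set_b]
    # Each party re-encrypts what the other sent; commutativity aligns the results
    a_twice = {encrypt(c, key_b): v for c, v in a_once.items()}
    b_twice = {encrypt(c, key_a) for c in b_once}
    return {v for c, v in a_twice.items() if c in b_twice}

print(sorted(private_intersection({"alice", "bob", "carol"}, {"bob", "carol", "dave"})))
```

Party A can map its own doubly encrypted values back to plaintext because it keeps the correspondence for its own set; it sees the other party's values only in doubly encrypted form.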
11. Some Hard Problems
- The past may be a poor predictor of the future
  - Abrupt changes
  - Wrong training examples
- Actionable patterns (principled use of domain knowledge?)
- Over-fitting vs. not missing the rare nuggets
- Richer patterns
- Simultaneous mining over multiple data types
- When to use which algorithm?
- Automatic, data-dependent selection of algorithm parameters
12. Discussion
- Should data mining be viewed as rich querying and deeply integrated with database systems?
  - Most current work makes little use of database functionality
- Should analytics be an integral concern of database systems?
- Issues in data mining over heterogeneous data repositories (relationship to the heterogeneous systems discussion)
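To make "data mining as rich querying" concrete, the sketch below pushes the support-counting step of association-rule mining into the database as a single SQL self-join with GROUP BY and HAVING. SQLite through Python's standard sqlite3 module is an assumed choice (any SQL engine would do), and the basket table schema and data are hypothetical.

```python
import sqlite3

def frequent_pairs(baskets, min_support):
    """Count co-occurring item pairs inside SQLite via a self-join, keeping those above min_support."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE basket (tid INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO basket VALUES (?, ?)",
        [(tid, item) for tid, items in enumerate(baskets) for item in set(items)],
    )
    rows = conn.execute(
        """
        SELECT a.item, b.item, COUNT(*) AS support
        FROM basket a JOIN basket b
          ON a.tid = b.tid AND a.item < b.item  -- each unordered pair once per basket
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY support DESC, a.item, b.item
        """,
        (min_support,),
    ).fetchall()
    conn.close()
    return rows

# Hypothetical market baskets
baskets = [["bread", "milk"], ["bread", "milk", "beer"], ["milk", "beer"], ["bread", "milk"]]
print(frequent_pairs(baskets, 2))  # [('bread', 'milk', 3), ('beer', 'milk', 2)]
```

Expressing the counting as a query lets the database's optimizer, indexes, and storage handle the heavy lifting, which is the kind of integration the question above asks about.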
13. Summary
- Data mining has shown promise but needs much more research
We stand on the brink of great new answers, but
even more, of great new questions -- Matt Ridley