Title: KDD2001
KDD-2001
- An Introduction to the Seventh ACM SIGKDD Conference
Speaker: Hsin-Chen Chiao, 2001/10/4
Advance Program
- 20 Papers
- 5 Industry Track Invited Talks
- 6 Tutorials
- 3 Panels
- 32 Posters
K1. Mass Collaboration and Data Mining
K2. Extracting Targeted Data from the Web
K3. Challenges for Knowledge Discovery in Biology
P1. Web Mining
P2. Applications
P3. Probabilistic Modeling
P4. Visualization and Interpretability
P5. Classification and Regression
P6. High Dimensional Data
Best Paper: Robust Space Transformations for Distance-based Operations
- What is an appropriate space?
- Q(120, 37, 35) compared with A(120, 40, 35) and B(130, 37, 35)
- Distances can't be measured by the Euclidean distance alone.
- We need:
  - 1. Euclidean property
  - 2. Stability property
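To see why the raw Euclidean distance can mislead, here is a minimal Python sketch using the three points from the slide. Unscaled, A is much closer to Q than B (3 vs. 10). The per-attribute spreads used for rescaling are hypothetical values, chosen only to show that once attributes are put on comparable scales the ranking can flip.

```python
import math

Q = (120, 37, 35)
A = (120, 40, 35)
B = (130, 37, 35)

def euclidean(p, q, scale=(1.0, 1.0, 1.0)):
    # Divide each attribute difference by that attribute's spread before squaring.
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(p, q, scale)))

raw_a, raw_b = euclidean(Q, A), euclidean(Q, B)

# Hypothetical spreads: attribute 1 varies widely in the data, attribute 2 barely.
spread = (100.0, 1.0, 1.0)
scaled_a, scaled_b = euclidean(Q, A, spread), euclidean(Q, B, spread)

print(f"raw:    d(Q,A)={raw_a:.2f}  d(Q,B)={raw_b:.2f}")        # 3.00 and 10.00
print(f"scaled: d(Q,A)={scaled_a:.2f}  d(Q,B)={scaled_b:.2f}")  # 3.00 and 0.10
```

Robust space transformations aim to estimate such scales (and correlations) from the data itself, in a way that a few outliers cannot distort.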
Best Paper: Robust Space Transformations for Distance-based Operations
- Donoho-Stahel estimator
- Fixed-angle algorithm
- k-D subsampling
  - 1. Pure
  - 2. Random
  - 3. Hybrid
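A minimal Python/NumPy sketch of a Donoho-Stahel-style robust transformation, under stated assumptions. Each point's outlyingness is its largest standardized distance (|projection - median| / MAD) over a set of projection directions; points are down-weighted by outlyingness, and the weighted mean and covariance define a Mahalanobis-style transform so that plain Euclidean distance can be used afterwards. Our readings of the variants are assumptions: "fixed-angle" is taken to mean evenly spaced angles in 2-D, and "random k-D subsampling" to mean drawing d random points and using the normal of the hyperplane through them as a direction. The weight function, the cutoff `c = 2.5`, and the number of directions are hypothetical choices; the pure and hybrid variants are not shown.

```python
import numpy as np

def fixed_angle_directions(m):
    """Assumed fixed-angle variant (2-D only): m evenly spaced angles in [0, pi)."""
    theta = np.pi * np.arange(m) / m
    return np.column_stack([np.cos(theta), np.sin(theta)])

def subsample_directions(X, m, rng):
    """Assumed random k-D subsampling: normal of the hyperplane through d random points."""
    n, d = X.shape
    dirs = np.empty((m, d))
    for j in range(m):
        idx = rng.choice(n, size=d, replace=False)
        diffs = X[idx[1:]] - X[idx[0]]      # (d-1, d) vectors spanning the hyperplane
        _, _, vt = np.linalg.svd(diffs)     # last row of vt is orthogonal to every diff
        dirs[j] = vt[-1] / np.linalg.norm(vt[-1])
    return dirs

def outlyingness(X, dirs):
    proj = X @ dirs.T                                         # (n, m) projections
    med = np.median(proj, axis=0)
    mad = 1.4826 * np.median(np.abs(proj - med), axis=0)      # MAD scaled to a normal's std
    mad = np.maximum(mad, 1e-12)                              # guard against zero spread
    return np.max(np.abs(proj - med) / mad, axis=1)           # worst case over directions

def donoho_stahel(X, dirs, c=2.5):
    r = outlyingness(X, dirs)
    w = np.minimum(1.0, (c / np.maximum(r, 1e-12)) ** 2)      # hypothetical Huber-type weight
    mu = (w[:, None] * X).sum(axis=0) / w.sum()
    Z = X - mu
    cov = (w[:, None] * Z).T @ Z / w.sum()
    return mu, cov, w

def robust_transform(X, mu, cov):
    """Map points so Euclidean distance equals Mahalanobis distance under (mu, cov)."""
    L = np.linalg.cholesky(cov)
    return np.linalg.solve(L, (X - mu).T).T

rng = np.random.default_rng(0)
inliers = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=200)
outliers = rng.normal(50, 1, size=(10, 2))
X = np.vstack([inliers, outliers])

mu, cov, w = donoho_stahel(X, subsample_directions(X, 500, rng))
Y = robust_transform(X, mu, cov)
print("plain mean:", X.mean(axis=0).round(2), " robust mean:", mu.round(2))
```

The ten gross outliers pull the plain mean toward (50, 50), while their large outlyingness gives them near-zero weight in the robust estimate.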
Web Mining: Personalization from Incomplete Data: What You Don't Know Can Hurt
- Clickstream data is used to predict user behavior.
- Site-centric vs. user-centric data
- Example: user1: A1, A2, B1, B2, C1, C2, B3, B4, C3, A3
- user2: C1, C2, C3, C4 (buys at C4)
- Contradiction (as seen by site C alone):
  - user1: C1, C2, C3 -> no purchase
  - user2: C1, C2, C3 -> purchase
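The contradiction can be reproduced in a few lines of Python. This is a minimal sketch, assuming a page's site is identified by its leading letter and that each example pairs the pages seen before the purchase decision with its outcome; `site_view` and `find_contradictions` are hypothetical helpers, not code from the paper. Site C's view of both users is the same sequence with opposite labels, while the user-centric paths differ.

```python
from collections import defaultdict

def site_view(clickstream, site):
    """Site-centric data: only the pages hosted by one site are observable."""
    return [page for page in clickstream if page.startswith(site)]

def find_contradictions(examples):
    """Return feature tuples that appear with more than one outcome label."""
    labels = defaultdict(set)
    for features, label in examples:
        labels[tuple(features)].add(label)
    return {f: ls for f, ls in labels.items() if len(ls) > 1}

user1 = ["A1", "A2", "B1", "B2", "C1", "C2", "B3", "B4", "C3", "A3"]
user2 = ["C1", "C2", "C3", "C4"]  # purchase happens at C4

# Site C's view: user1 never buys; user2 buys right after C1, C2, C3.
site_examples = [(site_view(user1, "C"), "no buy"), (site_view(user2, "C")[:3], "buy")]
# User-centric view: the full cross-site path before the decision.
user_examples = [(user1, "no buy"), (user2[:3], "buy")]

print(list(find_contradictions(site_examples)))  # [('C1', 'C2', 'C3')]
print(find_contradictions(user_examples))        # {}
```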
Web Mining: Personalization from Incomplete Data: What You Don't Know Can Hurt
[Figure: users' page sequences (A1, A2; B1, B2; A1, B1) and a random selection (C1, C2)]
High Dimensional Data: Using Ensembles of Representations for Indexing Large Databases
- Dimensionality reduction
- Singular Value Decomposition(SVD)
- Discrete Fourier Transform(DFT)
- Discrete Wavelet Transform(DWT)
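- False alarms and false dismissals
- Lower-bounding condition: D_index(A,B) <= D_true(A,B), so searching the index space causes no false dismissals; false alarms are removed by checking the true distance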
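A minimal Python/NumPy sketch of the lower-bounding idea with the DFT. With the orthonormal DFT (`norm="ortho"`), Parseval's theorem preserves distances, so keeping only the first k coefficients can only shrink them: D_index <= D_true. A range query that filters in the reduced space and refines with the true distance therefore returns exactly the brute-force answers, with some false alarms but no false dismissals. The random-walk data, `k = 4`, and `eps = 8.0` are arbitrary choices for illustration.

```python
import numpy as np

def dft_features(x, k):
    """First k coefficients of the orthonormal DFT (energy-preserving)."""
    return np.fft.fft(np.asarray(x, dtype=float), norm="ortho")[:k]

def index_distance(fx, fy):
    return float(np.sqrt(np.sum(np.abs(fx - fy) ** 2)))

def true_distance(x, y):
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)))

def range_query(db, q, eps, k):
    """Filter with the reduced (lower-bounding) distance, then refine with the true one."""
    fq = dft_features(q, k)
    candidates = [i for i, x in enumerate(db) if index_distance(dft_features(x, k), fq) <= eps]
    answers = [i for i in candidates if true_distance(db[i], q) <= eps]
    false_alarms = [i for i in candidates if i not in answers]
    return answers, false_alarms

rng = np.random.default_rng(1)
db = np.cumsum(rng.normal(size=(500, 64)), axis=1)  # random-walk time series
q = db[0] + rng.normal(scale=0.5, size=64)
answers, false_alarms = range_query(db, q, eps=8.0, k=4)
print(len(answers), "answers,", len(false_alarms), "false alarms")
```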
High Dimensional Data: Using Ensembles of Representations for Indexing Large Databases
[Figure: objects A-G placed in two different index spaces]
High Dimensional Data: Using Ensembles of Representations for Indexing Large Databases
- E-index-2 uses two indices: DWT for objects A, B, C, D and DFT for objects E, F, G
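How objects might be split between a DWT index and a DFT index can be sketched in Python/NumPy. The selection rule is our assumption, not necessarily the paper's criterion: each object goes to whichever orthonormal transform (Haar DWT or DFT) keeps more of its energy in the first k coefficients, since more retained energy means tighter lower bounds and fewer false alarms. A step-shaped series suits the Haar basis, while a smooth periodic one suits the DFT.

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal Haar DWT, coarsest coefficients first; len(x) must be a power of 2."""
    a = np.asarray(x, dtype=float)
    details = []
    while len(a) > 1:
        even, odd = a[0::2], a[1::2]
        details.insert(0, (even - odd) / np.sqrt(2))  # detail coefficients at this level
        a = (even + odd) / np.sqrt(2)                 # approximation passed to next level
    return np.concatenate([a] + details)

def energy_kept_dwt(x, k):
    c = haar_dwt(x)
    return np.sum(c[:k] ** 2) / np.sum(c ** 2)

def energy_kept_dft(x, k):
    # For real x, frequencies f and n-f carry equal energy, so f = 1..k-1 count twice
    # (assumes k <= len(x) // 2).
    X = np.fft.fft(np.asarray(x, dtype=float), norm="ortho")
    kept = np.abs(X[0]) ** 2 + 2 * np.sum(np.abs(X[1:k]) ** 2)
    return kept / np.sum(np.abs(X) ** 2)

def build_ensemble(series, k):
    """Hypothetical rule: assign each object to the representation retaining more energy."""
    groups = {"DWT": [], "DFT": []}
    for name, x in series.items():
        best = "DWT" if energy_kept_dwt(x, k) >= energy_kept_dft(x, k) else "DFT"
        groups[best].append(name)
    return groups

n = 16
step = np.r_[np.zeros(n // 2), np.ones(n // 2)]
wave = np.cos(2 * np.pi * np.arange(n) / n)
print(build_ensemble({"step": step, "wave": wave}, k=4))  # {'DWT': ['step'], 'DFT': ['wave']}
```

Here the Haar basis captures the step exactly with two coefficients, while one cosine period is captured exactly by the DFT pair at frequency 1.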
Conclusion
- Web mining: personal behavior
  - from games or log files
- Nearest distance: dimensionality reduction, efficiency
- Application: molecular mining in HIV data