Title: Unsupervised Learning and Clustering
1. Unsupervised Learning and Clustering
- Shyh-Kang Jeng
- Department of Electrical Engineering / Graduate Institute of Communication / Graduate Institute of Networking and Multimedia, National Taiwan University
2. Supervised vs. Unsupervised Learning
- Supervised training procedures
  - Use samples labeled by their category membership
- Unsupervised training procedures
  - Use unlabeled samples
3. Reasons for interest
- Collecting and labeling a large set of sample patterns can be costly
  - e.g., speech
- Training with a large amount of unlabeled data, then using supervision only to label the groupings found
  - For data mining applications
- Improved performance when the characteristics of the patterns change slowly over time, by tracking them in an unsupervised mode
  - e.g., automated food classification as seasons change
4. Reasons for interest
- Unsupervised methods can be used to find features that will then be useful for categorization
  - Data-dependent smart preprocessing or smart feature extraction
- Perform exploratory data analysis and gain insight into the nature or structure of the data
  - Discovery of distinct clusters may suggest altering the approach to designing the classifier
5. Basic Assumptions to Begin with
- Samples come from a known number c of classes
- Prior probabilities P(ωj) for each class are known
- The forms of the class-conditional probability densities p(x | ωj, θj) are known
- The values of the parameter vectors θ1, ..., θc are unknown
- Category labels are unknown
6. Mixing Density
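The body of this slide did not survive extraction; under the assumptions above, the mixing (mixture) density being sampled is, in the standard formulation,

p(x \mid \theta) = \sum_{j=1}^{c} p(x \mid \omega_j, \theta_j)\, P(\omega_j), \qquad \theta = (\theta_1, \ldots, \theta_c)

where the p(x | ωj, θj) are the component densities and the priors P(ωj) are the mixing parameters.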
7. Goal and Approach
- Use samples drawn from the mixture density to estimate the unknown parameter vector θ
- Once θ is known, we can decompose the mixture into its components and use a maximum a posteriori classifier on the derived densities (the posterior form is sketched below)
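As a reminder of how the second bullet works (a standard consequence of Bayes' rule, not recovered from the slide itself): once θ is estimated, the posterior used by the maximum a posteriori classifier is

P(\omega_i \mid x, \theta) = \frac{p(x \mid \omega_i, \theta_i)\, P(\omega_i)}{\sum_{j=1}^{c} p(x \mid \omega_j, \theta_j)\, P(\omega_j)}

and x is assigned to the class with the largest posterior.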
8. Existence of Solutions
- Suppose an unlimited number of samples is available and nonparametric methods can be used
- If only one value of θ can produce the observed values of p(x | θ), a solution is possible in principle
- If several different values of θ can produce the same values of p(x | θ), there is no hope of obtaining a unique solution
9. Identifiable Density
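The slide body was lost in extraction; the definition it refers to is the standard one: a density p(x | θ) is identifiable if distinct parameter values yield distinct densities, i.e.,

\theta \ne \theta' \;\Rightarrow\; \exists\, x : \; p(x \mid \theta) \ne p(x \mid \theta')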
10. An Example of an Unidentifiable Mixture of Discrete Distributions
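The example itself was lost; a classic illustration (assumed here, not recovered from the slide) is a mixture of two Bernoulli components with equal priors,

P(x \mid \theta) = \tfrac{1}{2}\, \theta_1^{\,x} (1 - \theta_1)^{1 - x} + \tfrac{1}{2}\, \theta_2^{\,x} (1 - \theta_2)^{1 - x}, \qquad x \in \{0, 1\}

for which P(x = 1 | θ) = (θ1 + θ2)/2: the data determine only the sum θ1 + θ2, so θ1 and θ2 cannot be recovered individually and the mixture is unidentifiable.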
11. An Example of an Unidentifiable Mixture of Gaussian Distributions
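This example was likewise lost; one standard illustration is a mixture of two unit-variance normals with equal priors,

p(x \mid \theta) = \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \theta_1)^2} + \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \theta_2)^2}

which is unchanged when θ1 and θ2 are swapped, so θ = (θ1, θ2) can be identified at best up to a permutation of the component labels.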
12. Maximum-Likelihood Estimates
13. Maximum-Likelihood Estimates
14. Maximum-Likelihood Estimates
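The derivation on these three slides was lost in extraction; the usual formulation, sketched here for reference, maximizes the log-likelihood of the n samples,

l(\theta) = \sum_{k=1}^{n} \ln p(x_k \mid \theta)

and setting the gradient with respect to each θi to zero yields the necessary conditions

\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\theta})\, \nabla_{\theta_i} \ln p(x_k \mid \omega_i, \hat{\theta}_i) = 0, \qquad i = 1, \ldots, c

where P(ωi | xk, θ̂) = p(xk | ωi, θ̂i) P(ωi) / p(xk | θ̂) is the posterior probability that sample xk came from component ωi.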
15. Maximum-Likelihood Estimates for Unknown Priors
16. Maximum-Likelihood Estimates for Unknown Priors
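Again reconstructing the missing equations rather than quoting them: when the priors are also unknown, the maximum-likelihood estimates must additionally satisfy

\hat{P}(\omega_i) = \frac{1}{n} \sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k, \hat{\theta})

subject to P̂(ωi) ≥ 0 and Σi P̂(ωi) = 1, i.e., each estimated prior is the sample average of the corresponding posterior probabilities.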
17. Application to Normal Mixtures
- Component densities p(x | ωi, θi) ~ N(μi, Σi)
- Three cases, summarized below
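The slide's table of cases did not survive extraction; the breakdown this three-case division usually refers to is (an assumption, not recovered text):
- Case 1: the mean vectors μi are unknown; Σi, P(ωi), and c are known
- Case 2: μi, Σi, and P(ωi) are unknown; c is known
- Case 3: μi, Σi, P(ωi), and the number of classes c are all unknown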
18. Case 1: Unknown Mean Vectors
19. Case 1: Unknown Mean Vectors
20. Case 1: Unknown Mean Vectors
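The equations on these slides were lost; the standard result for this case, given here as a reference sketch, is the implicit maximum-likelihood condition on the means,

\hat{\mu}_i = \frac{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\mu})\, x_k}{\sum_{k=1}^{n} P(\omega_i \mid x_k, \hat{\mu})}

so each estimated mean is an average of the samples weighted by the posterior probability that each sample belongs to ωi; in practice the equation is solved by iterating this update from an initial guess.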
21. Case 2: All Parameters Unknown
22. Case 2: All Parameters Unknown
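The corresponding equations were also lost; when μi, Σi, and P(ωi) are all unknown (and attention is restricted to a well-behaved local maximum, since the unconstrained likelihood is unbounded), the iterative maximum-likelihood updates take the standard form

\hat{P}(\omega_i) = \frac{1}{n} \sum_{k=1}^{n} \hat{P}(\omega_i \mid x_k)

\hat{\mu}_i = \frac{\sum_{k} \hat{P}(\omega_i \mid x_k)\, x_k}{\sum_{k} \hat{P}(\omega_i \mid x_k)}

\hat{\Sigma}_i = \frac{\sum_{k} \hat{P}(\omega_i \mid x_k)\, (x_k - \hat{\mu}_i)(x_k - \hat{\mu}_i)^t}{\sum_{k} \hat{P}(\omega_i \mid x_k)}

with the posteriors P̂(ωi | xk) recomputed from the current parameter estimates at each iteration.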
23. k-Means Clustering
24. k-Means Clustering
- begin initialize n, c, m1, m2, ..., mc
- do classify n samples according to nearest mi
- recompute mi
- until no change in mi
- return m1, m2, ..., mc
- end
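A minimal Python/NumPy sketch of this procedure (the function name k_means, the random initialization, and the convergence test are illustrative choices, not taken from the slides):

import numpy as np

def k_means(X, c, max_iter=100, seed=0):
    """Cluster the (n, d) sample array X into c groups, following the pseudocode above."""
    rng = np.random.default_rng(seed)
    # initialize the c means with randomly chosen samples
    means = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(max_iter):
        # classify each of the n samples according to the nearest mean
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each mean from the samples currently assigned to it
        new_means = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                              else means[i] for i in range(c)])
        # stop when the means no longer change
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels

# example usage on synthetic 2-D data with two well-separated groups
X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 5.0])
means, labels = k_means(X, c=2)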
25. k-Means Clustering
- Complexity: O(ndcT), with n samples of dimension d, c clusters, and T iterations
- In practice, the number of iterations T is generally much less than the number of samples
- The values obtained can be accepted as the answer, or used as starting points for more exact computations
26. k-Means Clustering
27. k-Means Clustering