Unsupervised Learning (Part-5)
Hierarchical Clustering
- Creates a nested series of partitions represented in the form of a dendrogram
- Shows how objects are grouped together step by step
- Typical stopping criteria (both are shown in the sketch below):
  - A target number of clusters is reached
  - The minimum distance between clusters exceeds a user-defined threshold
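A minimal sketch of both stopping criteria using SciPy's hierarchical-clustering utilities; the toy data, the single-link choice, and the threshold value are assumptions for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Toy 2-D data: two well-separated blobs (assumed for illustration)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 2)),
               rng.normal(5, 0.5, (10, 2))])

# Merge history that the dendrogram visualizes
Z = linkage(X, method="single")

# Stopping criterion 1: a fixed number of clusters
labels_k = fcluster(Z, t=2, criterion="maxclust")

# Stopping criterion 2: stop merging once the inter-cluster
# distance exceeds a user-defined threshold
labels_d = fcluster(Z, t=1.0, criterion="distance")

dendrogram(Z)  # plots the nested series of partitions (needs matplotlib)
```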
Types of Hierarchical Clustering
Agglomerative
- Starts by assuming that each object is a separate cluster
- Successively merges the closest pair of clusters until some stopping criterion is met, e.g., until only one cluster remains (see the sketch below)
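A bare-bones version of the merge loop might look like the following; the function and parameter names are illustrative, not from any particular library, and `cluster_dist` stands in for any of the linkage definitions discussed below:

```python
import numpy as np

def agglomerate(X, cluster_dist, n_clusters=1):
    """Naive agglomerative loop: start with singleton clusters and
    repeatedly merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(X))]          # every object starts alone
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_dist(X[clusters[i]], X[clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters.pop(j)  # merge the closest pair
    return clusters
```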
Divisive
- Starts by assuming the existence of only one cluster
- Successively splits a cluster into two or more sub-clusters (one common splitting strategy is sketched below)
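Divisive methods also need a rule for which cluster to split and how; one common choice (an assumption here, one option among several) is to repeatedly bisect the largest remaining cluster with 2-means:

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, n_clusters):
    """Top-down sketch: start with one all-inclusive cluster and
    repeatedly split the largest one into two sub-clusters."""
    clusters = [np.arange(len(X))]   # one cluster holding every object
    while len(clusters) < n_clusters:
        # Pick the largest cluster (assumes it has at least 2 members)
        idx = max(range(len(clusters)), key=lambda c: len(clusters[c]))
        members = clusters.pop(idx)
        halves = KMeans(n_clusters=2, n_init=10).fit_predict(X[members])
        clusters.append(members[halves == 0])
        clusters.append(members[halves == 1])
    return clusters
```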
Single-Link
- Distance between two clusters is defined as the minimum distance between pairs of objects, one from each cluster (see the sketch after this list)
- Can result in a chaining effect
- Builds a minimum spanning tree over the data
- Edges exist only between the “closest” nodes
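The single-link distance itself is short; this sketch (assuming SciPy is available) can be passed as `cluster_dist` to the agglomerative loop above:

```python
from scipy.spatial.distance import cdist

def single_link(A, B):
    # Minimum distance over all cross-cluster pairs of objects
    return cdist(A, B).min()
```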
Complete-Link
- Distance between two clusters is the maximum distance between pairs of objects, one from each cluster (sketched after this list)
- Merging the closest pair under this definition gives the minimum increase in cluster diameter
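The corresponding sketch for complete-link simply takes the maximum instead of the minimum:

```python
from scipy.spatial.distance import cdist

def complete_link(A, B):
    # Maximum distance over all cross-cluster pairs of objects
    return cdist(A, B).max()
```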
Prototype (Centroid/Medoid) Distance
- Calculate a centroid/medoid (the prototype) for each cluster
- Merge the clusters whose prototypes are the minimum distance apart
- Alternative (Ward’s Method): merge the two clusters whose union results in the minimum increase in SSE
- Shares the same disadvantages as partitional algorithms (both criteria are sketched below)
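Both prototype criteria can be sketched as follows; `ward_increase` uses the standard closed form for the SSE increase of a merge, and the function names are illustrative:

```python
import numpy as np

def centroid_dist(A, B):
    # Prototype distance: Euclidean distance between cluster centroids
    return np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))

def ward_increase(A, B):
    # Ward's criterion: the increase in SSE caused by merging A and B,
    # which equals |A||B| / (|A| + |B|) * ||centroid(A) - centroid(B)||^2
    nA, nB = len(A), len(B)
    return nA * nB / (nA + nB) * centroid_dist(A, B) ** 2
```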
Average-Link
- Distance between two clusters is the average distance across all pairs of objects (one drawn from each cluster); sketched below
- Rationale: sub-clusters belonging to the same cluster will tend to have high connectivity
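The average-link sketch takes the mean over all cross-cluster pairs, again pluggable into the agglomerative loop above:

```python
from scipy.spatial.distance import cdist

def average_link(A, B):
    # Mean distance over all cross-cluster pairs of objects
    return cdist(A, B).mean()
```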