Unsupervised Learning (Part-5)
Hierarchical Clustering
- Creates a nested series of partitions represented in the form of a dendrogram
- Shows how objects are grouped together step by step
- Typical stopping criteria (both are shown in the sketch below):
  - A target number of clusters is reached
  - The minimum distance between clusters exceeds a user-defined threshold
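A minimal sketch of both stopping criteria using SciPy's hierarchical-clustering utilities; the toy data, the single-link choice, and the threshold value are assumptions for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Toy 2-D data: two well-separated blobs (assumed for illustration)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 2)),
               rng.normal(5, 0.5, (10, 2))])

# Merge history that the dendrogram visualizes
Z = linkage(X, method="single")

# Stopping criterion 1: a fixed number of clusters
labels_k = fcluster(Z, t=2, criterion="maxclust")

# Stopping criterion 2: stop merging once the inter-cluster
# distance exceeds a user-defined threshold
labels_d = fcluster(Z, t=1.0, criterion="distance")

dendrogram(Z)  # plots the nested series of partitions (needs matplotlib)
```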
Types of Hierarchical Clustering
Agglomerative
- Starts by assuming that each object is a separate cluster
- Successively merges the closest pair of clusters until some stopping criterion is met, e.g., until only one cluster remains (see the sketch below)
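A bare-bones version of the merge loop might look like the following; the function and parameter names are illustrative, not from any particular library, and `cluster_dist` stands in for any of the linkage definitions discussed below:

```python
import numpy as np

def agglomerate(X, cluster_dist, n_clusters=1):
    """Naive agglomerative loop: start with singleton clusters and
    repeatedly merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(X))]          # every object starts alone
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_dist(X[clusters[i]], X[clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters.pop(j)  # merge the closest pair
    return clusters
```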
Divisive
- Starts by assuming the existence of only one cluster
- Successively splits a cluster into two or more sub-clusters (one common splitting strategy is sketched below)
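Divisive methods also need a rule for which cluster to split and how; one common choice (an assumption here, one option among several) is to repeatedly bisect the largest remaining cluster with 2-means:

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive(X, n_clusters):
    """Top-down sketch: start with one all-inclusive cluster and
    repeatedly split the largest one into two sub-clusters."""
    clusters = [np.arange(len(X))]   # one cluster holding every object
    while len(clusters) < n_clusters:
        # Pick the largest cluster (assumes it has at least 2 members)
        idx = max(range(len(clusters)), key=lambda c: len(clusters[c]))
        members = clusters.pop(idx)
        halves = KMeans(n_clusters=2, n_init=10).fit_predict(X[members])
        clusters.append(members[halves == 0])
        clusters.append(members[halves == 1])
    return clusters
```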
Single-Link
- Distance between two clusters is defined as the minimum distance between pairs of objects, one from each cluster (see the sketch after this list)
- Can result in a chaining effect
- Builds a minimum spanning tree over the data
- Edges exist only between the “closest” nodes
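The single-link distance itself is short; this sketch (assuming SciPy is available) can be passed as `cluster_dist` to the agglomerative loop above:

```python
from scipy.spatial.distance import cdist

def single_link(A, B):
    # Minimum distance over all cross-cluster pairs of objects
    return cdist(A, B).min()
```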
Complete-Link
- Distance between two clusters is the maximum distance between pairs of objects, one from each cluster (sketched after this list)
- Merging the closest pair under this definition gives the minimum increase in cluster diameter
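The corresponding sketch for complete-link simply takes the maximum instead of the minimum:

```python
from scipy.spatial.distance import cdist

def complete_link(A, B):
    # Maximum distance over all cross-cluster pairs of objects
    return cdist(A, B).max()
```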
Prototype (Centroid/Medoid) Distance
- Calculate a centroid/medoid (the prototype) for each cluster
- Merge the clusters whose prototypes are the minimum distance apart
- Alternative (Ward’s Method): merge the two clusters whose union results in the minimum increase in SSE
- Shares the same disadvantages as partitional algorithms (both criteria are sketched below)
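Both prototype criteria can be sketched as follows; `ward_increase` uses the standard closed form for the SSE increase of a merge, and the function names are illustrative:

```python
import numpy as np

def centroid_dist(A, B):
    # Prototype distance: Euclidean distance between cluster centroids
    return np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))

def ward_increase(A, B):
    # Ward's criterion: the increase in SSE caused by merging A and B,
    # which equals |A||B| / (|A| + |B|) * ||centroid(A) - centroid(B)||^2
    nA, nB = len(A), len(B)
    return nA * nB / (nA + nB) * centroid_dist(A, B) ** 2
```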
Average-Link
- Distance between two clusters is the average distance across all pairs of objects (one drawn from each cluster); sketched below
- Rationale: sub-clusters belonging to the same cluster will tend to have high connectivity
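The average-link sketch takes the mean over all cross-cluster pairs, again pluggable into the agglomerative loop above:

```python
from scipy.spatial.distance import cdist

def average_link(A, B):
    # Mean distance over all cross-cluster pairs of objects
    return cdist(A, B).mean()
```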