## Complete linkage

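First, we cluster using “complete” linkage, which defines the
dissimilarity between two clusters as the maximum dissimilarity between
any pair of points, one from each cluster. This is the default `method`
in `hclust`.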

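    # Read the iris data from the UCI repository (the file has no header row)
    iris = read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
                      sep = ",", header = FALSE)
    names(iris) = c("sepal.length", "sepal.width", "petal.length", "petal.width",
                    "iris.type")
    # Cluster on the four measurements only (column 5 is the species label);
    # hclust defaults to complete linkage
    iris_hclust = hclust(dist(iris[, -5]))
    plot(iris_hclust)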
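To make the "maximum dissimilarity" rule concrete, here is a minimal sketch on four made-up 2D points (the points and the names `pts`, `d`, `hc`, and `cross` are illustrative assumptions, not part of the iris analysis). The last merge height reported by `hclust` should equal the largest distance between a point in one cluster and a point in the other.

```r
# Four hypothetical points: {1, 2} and {3, 4} form the two natural clusters
pts <- rbind(c(0, 0), c(0, 1), c(4, 0), c(4, 3))
d <- dist(pts)                        # Euclidean distances
hc <- hclust(d, method = "complete")

# Merge heights, in the order the merges happen
hc$height

# Distances between points of cluster {1, 2} and points of cluster {3, 4}
cross <- as.matrix(d)[1:2, 3:4]
max(cross)                            # equals the final merge height
```

Here the first two merges join the close pairs, and the final merge height is the distance between the two points that are farthest apart across the two clusters.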

We can cut the tree and look at the resulting clustering. Let’s cut it
at the canonical 3 groups. We see the results are quite similar to the
k-means and mixture model results.

    iris_3 = cutree(iris_hclust, k = 3)
    plot(iris$sepal.length, iris$sepal.width, pch = 23,
         bg = c("red", "blue", "green")[iris_3])
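Besides the scatter plot, a cross-tabulation against the known species gives a quick check of how the three clusters line up with the labels. This is a minimal sketch that assumes R's built-in `datasets::iris` rather than the UCI file read above; its column names differ and a few values may differ slightly, so the counts need not match exactly. The names `iris_builtin`, `hc`, `cl`, and `tab` are introduced here for illustration.

```r
# Assumption: use the copy of iris that ships with R, under a separate name
iris_builtin <- datasets::iris

# Complete linkage (the hclust default) on the four measurements
hc <- hclust(dist(iris_builtin[, 1:4]))
cl <- cutree(hc, k = 3)

# Rows are cluster labels, columns are species
tab <- table(cluster = cl, species = iris_builtin$Species)
tab
```

A cluster that matches a species well shows up as a row with most of its count in a single column.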

From the first plot, the three groups correspond to a height of roughly
4, perhaps a little less. We can also cut the tree by height. With
complete linkage, cutting at a height of 3.9 means that the maximum
dissimilarity between any two observations in the same cluster is at
most 3.9.

    iris_h = cutree(iris_hclust, h = 3.9)
    plot(iris$sepal.length, iris$sepal.width, pch = 23,
         bg = c("red", "blue", "green")[iris_h])
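The interpretation of a height cut can be checked directly: with complete linkage, every cluster obtained from `cutree(..., h = h)` should have a largest within-cluster distance (its diameter) no greater than `h`. The sketch below assumes R's built-in `datasets::iris` in place of the UCI file, and the names `iris_builtin`, `dm`, and `diameters` are illustrative.

```r
# Assumption: built-in iris data, kept under a separate name
iris_builtin <- datasets::iris
d <- dist(iris_builtin[, 1:4])
hc <- hclust(d, method = "complete")

h <- 3.9
cl <- cutree(hc, h = h)

# Largest pairwise distance inside each cluster
dm <- as.matrix(d)
diameters <- sapply(split(seq_along(cl), cl),
                    function(idx) max(dm[idx, idx, drop = FALSE]))
diameters
all(diameters <= h)   # TRUE for complete linkage
```

This guarantee is specific to complete linkage. With single or average linkage, a cut at height `h` does not bound the within-cluster diameter in the same way.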

We can also cut the same tree into more groups, for example 6.

    iris_6 = cutree(iris_hclust, k = 6)
    plot(iris$sepal.length, iris$sepal.width, pch = 23,
         bg = c("red", "blue", "green", "yellow", "orange", "purple")[iris_6])

## Single linkage

Single linkage uses the minimum distance between the clusters, that is,
the distance between their closest pair of points.

    iris_hclust_single = hclust(dist(iris[, -5]), method = "single")
    plot(iris_hclust_single)

This plot shows the prototypical “chaining” seen in single linkage.
Its split into 3 groups has one large group and a very small group of
size 2.

    iris_3 = cutree(iris_hclust_single, k = 3)
    plot(iris$sepal.length, iris$sepal.width, pch = 23,
         bg = c("red", "blue", "green")[iris_3])
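Cluster sizes are easier to read from a table than from the scatter plot. A minimal sketch, again assuming R's built-in `datasets::iris` instead of the UCI file (so the sizes may differ slightly from those quoted above); `iris_builtin`, `hc_single`, and `sizes` are illustrative names.

```r
# Assumption: built-in iris data, kept under a separate name
iris_builtin <- datasets::iris

hc_single <- hclust(dist(iris_builtin[, 1:4]), method = "single")

# Number of observations in each of the 3 clusters
sizes <- table(cutree(hc_single, k = 3))
sizes
```

Very unbalanced sizes are a typical symptom of chaining: single linkage keeps absorbing nearby points into one long cluster and leaves only a few outlying points in clusters of their own.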

## Group average linkage

Using `method="average"` yields the average linkage tree. It is
usually somewhat intermediate between complete and single linkage.

    iris_hclust_average = hclust(dist(iris[, -5]), method = "average")
    plot(iris_hclust_average)

    iris_3 = cutree(iris_hclust_average, k = 3)
    plot(iris$sepal.length, iris$sepal.width, pch = 23,
         bg = c("red", "blue", "green")[iris_3])
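The sense in which average linkage sits between the other two can be seen on a tiny made-up example. The sketch below uses four hypothetical points on a line (not the iris data) and compares the height of the final merge under each method; `x` and `top_height` are illustrative names.

```r
# Four hypothetical points on a line: {0, 1} and {5, 7} merge first
x <- c(0, 1, 5, 7)
d <- dist(x)

# Height of the last merge under each linkage
top_height <- sapply(c("single", "average", "complete"),
                     function(m) max(hclust(d, method = m)$height))
top_height
# single: 4 (closest pair, 1 and 5)
# average: 5.5 (mean of the cross distances 5, 7, 4, 6)
# complete: 7 (farthest pair, 0 and 7)
```

The between-cluster distances are the same four numbers in every case. Only the summary used (minimum, mean, or maximum) changes.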