# Hierarchical clustering

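First, we will cluster using “complete” linkage, which takes the dissimilarity between two clusters to be the maximum dissimilarity between any pair of points, one from each cluster. This is the default method for `hclust`.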

```
iris = read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
    sep = ",", header = FALSE)
names(iris) = c("sepal.length", "sepal.width", "petal.length", "petal.width",
    "iris.type")
# Drop the label column (5) and cluster on the four measurements;
# hclust uses complete linkage by default
iris_hclust = hclust(dist(iris[, -5]))
plot(iris_hclust)
```

We can cut the tree and look at the resulting clustering. Let’s cut it at the canonical 3 groups. The results are quite similar to the k-means and mixture model results.

```
# Cut the dendrogram into exactly 3 clusters and color the points by cluster
iris_3 = cutree(iris_hclust, k = 3)
plot(iris$sepal.length, iris$sepal.width, pch = 23, bg = c("red", "blue", "green")[iris_3])
```

From the first plot, the three groups correspond to a height of roughly 4, perhaps a little bit less. We can also cut the tree by height. With complete linkage, cutting at a height of 4 means that no two points within the same cluster are more than 4 apart.
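The meaning of the cut height can be checked directly: with complete linkage, no cluster obtained from a height cut should contain two points farther apart than that height. The following is a minimal sketch of such a check. It assumes a data frame named `iris` whose first four columns are the measurements, which holds both for the file downloaded above and for R's built-in `iris` data. The names `tree`, `h_cut`, `groups`, `dm`, and `diameters` are introduced here for illustration only.

```r
# Assumption: `iris` has the four numeric measurements in columns 1 to 4
# (true for the downloaded file and for R's built-in iris data frame).
d = dist(iris[, 1:4])
tree = hclust(d)                  # default method is "complete"
h_cut = 3.9
groups = cutree(tree, h = h_cut)  # cut the dendrogram by height

# Largest pairwise distance inside each cluster (its "diameter")
dm = as.matrix(d)
diameters = sapply(split(seq_along(groups), groups),
                   function(idx) max(dm[idx, idx]))
print(diameters)

# With complete linkage, every diameter should be at most the cut height
print(all(diameters <= h_cut))
```

The last line should print `TRUE`. The same check would not be expected to hold for single linkage, where the merge height only bounds the closest pair of points between merged clusters.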

```
# Cut by height instead of by number of clusters
iris_h = cutree(iris_hclust, h = 3.9)
plot(iris$sepal.length, iris$sepal.width, pch = 23, bg = c("red", "blue", "green")[iris_h])
```

Cutting into more groups, say 6, splits these clusters further.

```
iris_6 = cutree(iris_hclust, k = 6)
plot(iris$sepal.length, iris$sepal.width, pch = 23, bg = c("red", "blue", "green",
    "yellow", "orange", "purple")[iris_6])
```

Single linkage uses the minimum distance between the clusters, that is, the distance between their closest pair of points.
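To make the two linkage definitions concrete, the sketch below computes the complete and single linkage dissimilarities between two tiny clusters by hand. The points in `A` and `B` are hypothetical toy data chosen for easy arithmetic, not part of the iris analysis.

```r
# Two toy clusters of 2-D points (hypothetical data, for illustration only)
A = rbind(c(0, 0), c(1, 0))
B = rbind(c(3, 0), c(5, 0))

# Euclidean distances between every point in A (rows) and every point in B (columns)
cross = as.matrix(dist(rbind(A, B)))[1:2, 3:4]

complete_d = max(cross)  # complete linkage: the farthest pair
single_d = min(cross)    # single linkage: the closest pair
print(c(complete = complete_d, single = single_d))

# The height of the final merge in hclust (complete linkage by default)
# is the complete linkage dissimilarity between A and B
top_height = max(hclust(dist(rbind(A, B)))$height)
print(top_height)
```

Here the farthest pair is (0, 0) and (5, 0), at distance 5, while the closest pair is (1, 0) and (3, 0), at distance 2, so the two linkages can disagree substantially about how far apart the same two clusters are.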

```
iris_hclust_single = hclust(dist(iris[, -5]), method = "single")
plot(iris_hclust_single)
```

This plot has the prototypical “chaining” seen in single linkage. Its split into 3 groups has one large group, with a very small group of size 2.

```
iris_3 = cutree(iris_hclust_single, k = 3)
plot(iris$sepal.length, iris$sepal.width, pch = 23, bg = c("red", "blue", "green")[iris_3])
```

Average linkage uses the average dissimilarity between the points in the two clusters.

```
iris_hclust_average = hclust(dist(iris[, -5]), method = "average")
iris_3 = cutree(iris_hclust_average, k = 3)
```
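One way to compare the three linkages on equal footing is to tabulate the cluster sizes each one gives at 3 groups, and to cross-tabulate one of the clusterings against the known iris type. The sketch below is one possible way to do this. It assumes a data frame `iris` with the measurements in columns 1 to 4 and the type label in column 5, which holds for the downloaded file and for R's built-in `iris` data. The names `linkages`, `labels`, and `sizes` are introduced here for illustration.

```r
# Assumption: columns 1 to 4 of `iris` are the measurements, column 5 the label
d = dist(iris[, 1:4])
linkages = c("complete", "single", "average")

# One column of cluster labels per linkage, each cut at 3 groups
labels = sapply(linkages, function(m) cutree(hclust(d, method = m), k = 3))

# Cluster sizes per linkage, largest first (cluster numbering is arbitrary)
sizes = apply(labels, 2, function(g) sort(tabulate(g, nbins = 3), decreasing = TRUE))
print(sizes)

# Cross-tabulate the average linkage clusters against the iris type
print(table(cluster = labels[, "average"], type = iris[, 5]))
```

Very unbalanced sizes in the `single` column would be the numerical counterpart of the chaining seen in its dendrogram.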