Exploring and Comparing Unsupervised Clustering Algorithms

Marc Lavielle; Philip D. Waggoner

doi:10.5334/jors.269

Journal of Open Research Software

Volume 8 (2020): Issue 1

Open Access

Oct 2020

Abstract

Clustering is one of the most widely used approaches for exploring and understanding non-random structure in data in a largely assumption-free manner. In this paper, we detail two original Shiny apps written in R, openly developed on GitHub and archived at Zenodo, for exploring and comparing two major unsupervised clustering algorithms: k-means and Gaussian mixture models fit via Expectation-Maximization. The first app uses simulated data and the second uses Fisher’s Iris data set, so that the algorithms can be compared visually and numerically on data familiar to many applied researchers. Beyond serving as tools for comparing these clustering techniques, the apps' open source architecture allows the broader open science community to engage with and extend them, for example by adding different data sets and algorithms.
