Optimal number of clusters for Gaussian mixture models

Optimal_Clusters_GMM(
  data,
  max_clusters,
  criterion = "AIC",
  dist_mode = "eucl_dist",
  seed_mode = "random_subset",
  km_iter = 10,
  em_iter = 5,
  verbose = FALSE,
  var_floor = 1e-10,
  plot_data = TRUE,
  seed = 1
)

Arguments

data

matrix or data frame

max_clusters

either a single numeric value or a contiguous or non-contiguous numeric vector specifying the cluster search space

criterion

one of 'AIC' or 'BIC'

dist_mode

the distance used during the seeding of initial means and during k-means clustering. One of eucl_dist (Euclidean distance) or maha_dist (Mahalanobis distance).

seed_mode

how the initial means are seeded prior to running the k-means and/or EM algorithms. One of static_subset, random_subset, static_spread, or random_spread.

km_iter

the number of iterations of the k-means algorithm

em_iter

the number of iterations of the EM algorithm

verbose

either TRUE or FALSE; enable or disable printing of progress during the k-means and EM algorithms

var_floor

the variance floor (smallest allowed value) for the diagonal covariances

plot_data

either TRUE or FALSE indicating whether the results of the function should be plotted

seed

integer seed for the random number generator (RNG)

Value

a vector with the AIC or BIC value for each number of clusters evaluated. If an error occurs, the function returns the error message and its possible causes.

Details

AIC : the Akaike information criterion

BIC : the Bayesian information criterion
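
For reference, a sketch of the standard textbook definitions (the exact parameter count used internally is not stated here, so treat k as an assumption). With L the maximized likelihood of the fitted mixture, k the number of free parameters, and n the number of rows in data:

  AIC = 2k - 2 ln(L)
  BIC = k ln(n) - 2 ln(L)

For both criteria, lower values indicate a better trade-off between fit and model complexity. BIC penalizes additional parameters more heavily than AIC once ln(n) > 2, so it tends to favor fewer clusters on larger data sets.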

Plotting is enabled only if the max_clusters parameter is of length 1. If max_clusters is a contiguous or non-contiguous vector, plotting is disabled.

Author

Lampros Mouselimis

Examples


library(ClusterR)

data(dietary_survey_IBS)

dat = dietary_survey_IBS[, -ncol(dietary_survey_IBS)]

dat = center_scale(dat)

opt_gmm = Optimal_Clusters_GMM(dat, 10, criterion = "AIC", plot_data = FALSE)
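
#------------------------------------------
# BIC with the Mahalanobis distance (sketch)
#------------------------------------------

# A possible variation that uses only arguments documented above: it switches
# the criterion to BIC, seeds and runs k-means with the Mahalanobis distance,
# and raises the number of EM iterations. The specific values (em_iter = 10,
# seed = 1) are illustrative choices, not recommendations from the author.

opt_gmm_bic = Optimal_Clusters_GMM(dat, 10, criterion = "BIC",
                                   dist_mode = "maha_dist",
                                   seed_mode = "random_subset",
                                   km_iter = 10, em_iter = 10,
                                   plot_data = FALSE, seed = 1)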


#----------------------------
# non-contiguous search space
#----------------------------

search_space = c(2,5)

opt_gmm = Optimal_Clusters_GMM(dat, search_space, criterion = "AIC", plot_data = FALSE)
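

#--------------------------------------------------
# picking a cluster count from the result (sketch)
#--------------------------------------------------

# Assumption (not confirmed by this page): 'opt_gmm' holds one criterion value
# per entry of 'search_space', in the same order. Lower AIC / BIC is better,
# so the candidate with the smallest value is selected. The guard skips the
# selection if the function returned an error message instead of a numeric vector.

if (is.numeric(opt_gmm) && length(opt_gmm) == length(search_space)) {

  best_k = search_space[which.min(opt_gmm)]

  print(best_k)
}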