R/clustering_functions.R
Optimal_Clusters_GMM.Rd
Optimal number of clusters for Gaussian mixture models
Optimal_Clusters_GMM(
data,
max_clusters,
criterion = "AIC",
dist_mode = "eucl_dist",
seed_mode = "random_subset",
km_iter = 10,
em_iter = 5,
verbose = FALSE,
var_floor = 1e-10,
plot_data = TRUE,
seed = 1
)
matrix or data frame
either a single numeric value or a contiguous or non-contiguous numeric vector specifying the cluster search space
one of 'AIC' or 'BIC'
the distance used during the seeding of initial means and k-means clustering. One of 'eucl_dist' or 'maha_dist'.
how the initial means are seeded prior to running the k-means and/or EM algorithms. One of 'static_subset', 'random_subset', 'static_spread' or 'random_spread'.
the number of iterations of the k-means algorithm
the number of iterations of the EM algorithm
either TRUE or FALSE; enable or disable printing of progress during the k-means and EM algorithms
the variance floor (smallest allowed value) for the diagonal covariances
either TRUE or FALSE indicating whether the results of the function should be plotted
integer seed for the random number generator (RNG)
a vector with the AIC or BIC value for each iteration, i.e. each number of clusters tried. If an error occurs, the function returns the error message and its possible causes.
AIC : the Akaike information criterion
BIC : the Bayesian information criterion
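Both criteria trade goodness of fit against model size, and lower values are preferred. The base-R sketch below computes them from a log-likelihood using the standard definitions. The parameter count assumes a mixture with diagonal covariances, and the helper names (n_params_diag_gmm, aic, bic) are illustrative only and not part of the package; the package's internal computation may differ.

```r
# Free parameters of a diagonal-covariance GMM (assumed parameterisation):
# k * d means + k * d variances + (k - 1) mixing weights.
n_params_diag_gmm <- function(k, d) {
  k * d + k * d + (k - 1)
}

# Akaike information criterion: 2 * (number of parameters) - 2 * log-likelihood.
aic <- function(log_lik, n_params) {
  2 * n_params - 2 * log_lik
}

# Bayesian information criterion: (number of parameters) * log(n) - 2 * log-likelihood.
bic <- function(log_lik, n_params, n_obs) {
  n_params * log(n_obs) - 2 * log_lik
}
```

Because log(n_obs) exceeds 2 once there are 8 or more observations, BIC penalises additional clusters more heavily than AIC on all but very small data sets.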
If the max_clusters parameter is a contiguous or non-contiguous vector, plotting is disabled. Plotting is available only when max_clusters is of length 1.
data(dietary_survey_IBS)
dat = dietary_survey_IBS[, -ncol(dietary_survey_IBS)]
dat = center_scale(dat)
opt_gmm = Optimal_Clusters_GMM(dat, 10, criterion = "AIC", plot_data = FALSE)
#----------------------------
# non-contiguous search space
#----------------------------
search_space = c(2,5)
opt_gmm = Optimal_Clusters_GMM(dat, search_space, criterion = "AIC", plot_data = FALSE)
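Since lower AIC or BIC values are preferred, the returned vector can be mapped back to the search space to pick a candidate number of clusters. The following is a minimal base-R sketch, assuming the returned values are in the same order as the search space (and that a single max_clusters value corresponds to 1:max_clusters). pick_best_k is a hypothetical helper, not a package function.

```r
pick_best_k <- function(criterion_values, search_space) {
  # Assumption: one criterion value per entry of the search space, in the same order.
  stopifnot(length(criterion_values) == length(search_space))
  # The smallest criterion value marks the preferred number of clusters.
  search_space[which.min(criterion_values)]
}

# Usage with the objects from the example above (requires the package and data):
# best_k <- pick_best_k(opt_gmm, search_space)
#
# Plotting is disabled for a vector search space, so the curve can be drawn by hand:
# plot(search_space, opt_gmm, type = "b", xlab = "clusters", ylab = "AIC")
```

The minimum is only a starting point; an elbow in the criterion curve is often inspected as well before settling on a value.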