Optimal number of Clusters for the gaussian mixture models
Source:R/clustering_functions.R
Optimal_Clusters_GMM.Rd
Optimal number of Clusters for the gaussian mixture models
Usage
Optimal_Clusters_GMM(
data,
max_clusters,
criterion = "AIC",
dist_mode = "eucl_dist",
seed_mode = "random_subset",
km_iter = 10,
em_iter = 5,
verbose = FALSE,
var_floor = 1e-10,
plot_data = TRUE,
seed = 1
)
Arguments
- data
matrix or data frame
- max_clusters
either a numeric value, a contiguous or non-continguous numeric vector specifying the cluster search space
- criterion
one of 'AIC' or 'BIC'
- dist_mode
the distance used during the seeding of initial means and k-means clustering. One of, eucl_dist, maha_dist.
- seed_mode
how the initial means are seeded prior to running k-means and/or EM algorithms. One of, static_subset, random_subset, static_spread, random_spread.
- km_iter
the number of iterations of the k-means algorithm
- em_iter
the number of iterations of the EM algorithm
- verbose
either TRUE or FALSE; enable or disable printing of progress during the k-means and EM algorithms
- var_floor
the variance floor (smallest allowed value) for the diagonal covariances
- plot_data
either TRUE or FALSE indicating whether the results of the function should be plotted
- seed
integer value for random number generator (RNG)
Value
a vector with either the AIC or BIC for each iteration. In case of Error it returns the error message and the possible causes.
Details
AIC : the Akaike information criterion
BIC : the Bayesian information criterion
In case that the max_clusters parameter is a contiguous or non-contiguous vector then plotting is disabled. Therefore, plotting is enabled only if the max_clusters parameter is of length 1.
Examples
data(dietary_survey_IBS)
dat = dietary_survey_IBS[, -ncol(dietary_survey_IBS)]
dat = center_scale(dat)
opt_gmm = Optimal_Clusters_GMM(dat, 10, criterion = "AIC", plot_data = FALSE)
#----------------------------
# non-contiguous search space
#----------------------------
search_space = c(2,5)
opt_gmm = Optimal_Clusters_GMM(dat, search_space, criterion = "AIC", plot_data = FALSE)