Frequencies of an existing cluster object
cluster_frequency(tokenized_list_text, cluster_vector, verbose = FALSE)
Argument | Description |
---|---|
tokenized_list_text | a list of tokenized text documents. This can be the result of the textTinyR::tokenize_transform_vec_docs function with the as_token parameter set to TRUE (the token object of the output) |
cluster_vector | a numeric vector. This can be the result of the ClusterR::KMeans_rcpp function (the clusters object of the output) |
verbose | either TRUE or FALSE. If TRUE then information will be printed out in the R session. |
a list of data.tables
This function takes a list of tokenized text documents and a numeric vector of cluster assignments and returns the sorted token frequencies of each cluster. The length of the tokenized_list_text object must be equal to the length of the cluster_vector object.
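To make the description concrete, the following is a minimal base-R sketch of the idea: group the documents by cluster label, then count and sort the tokens within each group. It is not the textTinyR implementation. The helper name `cluster_token_counts`, the column names `token` and `frequency`, and the example labels are all hypothetical, and the sketch returns plain data.frames rather than data.tables.

```r
# Hypothetical helper, not part of textTinyR: a base-R sketch of
# per-cluster token counts, sorted from most to least frequent.
cluster_token_counts <- function(tokenized_docs, clusters) {
  # One cluster label per document, as the description requires.
  stopifnot(is.list(tokenized_docs),
            length(tokenized_docs) == length(clusters))

  # Group the documents by cluster label.
  docs_by_cluster <- split(tokenized_docs, clusters)

  # Within each cluster, pool the tokens, count them and sort the counts.
  lapply(docs_by_cluster, function(docs) {
    counts <- sort(table(unlist(docs)), decreasing = TRUE)
    data.frame(token = names(counts),
               frequency = as.integer(counts),
               stringsAsFactors = FALSE)
  })
}

docs <- list(c('the', 'the', 'tokens', 'of', 'first', 'document'),
             c('the', 'tokens', 'of', 'of', 'second', 'document'),
             c('the', 'tokens', 'of', 'third', 'third', 'document'))

# Assumed labels for illustration: documents 1 and 3 share a cluster.
labels <- c(1, 2, 1)

counts <- cluster_token_counts(docs, labels)
counts[['1']]
```

The result is a named list with one element per distinct cluster label, so `counts[['1']]` holds the pooled counts for documents 1 and 3.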
library(textTinyR)

# three tokenized documents
tok_lst = list(c('the', 'the', 'tokens', 'of', 'first', 'document'),
               c('the', 'tokens', 'of', 'of', 'second', 'document'),
               c('the', 'tokens', 'of', 'third', 'third', 'document'))

# cluster labels; note that this vector has 18 elements while tok_lst holds 3 documents
vec_clust = rep(1:6, 3)

res = cluster_frequency(tok_lst, vec_clust)