Frequencies of an existing cluster object

Usage

cluster_frequency(tokenized_list_text, cluster_vector, verbose = FALSE)

Arguments

tokenized_list_text

a list of tokenized text documents. This can be the result of the textTinyR::tokenize_transform_vec_docs function with the as_token parameter set to TRUE (the token object of the output).

cluster_vector

a numeric vector of cluster assignments. This can be the result of the ClusterR::KMeans_rcpp function (the clusters object of the output).

verbose

either TRUE or FALSE. If TRUE then information will be printed out in the R session.

Value

a list of data.tables

Details

This function takes a list of tokenized text documents and a numeric vector of cluster assignments, and returns the token frequencies within each cluster, sorted by frequency. The length of the tokenized_list_text object must be equal to the length of the cluster_vector object.
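
The idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the textTinyR implementation: the helper name sketch_cluster_frequency, the column names token and count, and the cluster labels c(1, 1, 2) are all hypothetical, and the sketch returns plain data.frames rather than data.tables. It assumes one cluster label per document.

```r
# Illustrative base-R sketch (hypothetical helper, NOT the textTinyR implementation).
# Assumption: cluster_vector holds one cluster label per document.
sketch_cluster_frequency <- function(tokenized_list_text, cluster_vector) {
  stopifnot(is.list(tokenized_list_text),
            length(tokenized_list_text) == length(cluster_vector))

  # group the documents by their cluster label
  docs_by_cluster <- split(tokenized_list_text, cluster_vector)

  # for each cluster, pool the tokens and count them, most frequent first
  lapply(docs_by_cluster, function(docs) {
    counts <- sort(table(unlist(docs)), decreasing = TRUE)
    data.frame(token = names(counts),
               count = as.integer(counts),
               stringsAsFactors = FALSE)
  })
}

tok_lst <- list(c('the', 'the', 'tokens', 'of', 'first', 'document'),
                c('the', 'tokens', 'of', 'of', 'second', 'document'),
                c('the', 'tokens', 'of', 'third', 'third', 'document'))

# hypothetical labels: documents 1 and 2 in cluster 1, document 3 in cluster 2
vec_clust <- c(1, 1, 2)

res <- sketch_cluster_frequency(tok_lst, vec_clust)
str(res)
```

Each element of res corresponds to one cluster and lists the tokens pooled from that cluster's documents, ordered from most to least frequent.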

Examples


library(textTinyR)

tok_lst = list(c('the', 'the', 'tokens', 'of', 'first', 'document'),
               c('the', 'tokens', 'of', 'of', 'second', 'document'),
               c('the', 'tokens', 'of', 'third', 'third', 'document'))

# cluster labels: the sequence 1:6 repeated three times
vec_clust = rep(1:6, 3)

# compute the frequencies for each cluster
res = cluster_frequency(tok_lst, vec_clust)