vocabulary counts

vocabulary_counts(train_data = NULL, MAX_vocab = 0, MIN_count = 1,
  output_vocabulary = NULL, trace = FALSE)

Arguments

train_data

a character string specifying the path to the training text file

MAX_vocab

a value specifying the maximum number of terms in the vocabulary. For instance, a MAX_vocab value of 0 includes all the vocabulary terms.

MIN_count

a value greater than or equal to 1. It specifies the minimum number of occurrences (word count) a term needs for inclusion in the vocabulary

output_vocabulary

a character string specifying the path to the output text file

trace

either TRUE or FALSE. If TRUE, information will be printed out
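
To make the interplay of MAX_vocab and MIN_count concrete, here is a minimal base-R sketch on an in-memory toy corpus. It is not the GloveR implementation: the function name toy_vocabulary, the whitespace tokenization, and the tie-breaking by alphabetical order are assumptions made only for illustration.

```r
# Toy illustration of MIN_count and MAX_vocab; NOT the GloveR implementation.
# 'toy_vocabulary' is a hypothetical helper name used only in this sketch.
toy_vocabulary <- function(lines, MAX_vocab = 0, MIN_count = 1) {
  # Split each line on whitespace and drop empty tokens
  tokens <- unlist(strsplit(lines, "\\s+"))
  tokens <- tokens[nzchar(tokens)]

  # Count the occurrences of every distinct term
  tab <- table(tokens)
  vocab <- data.frame(term = names(tab), count = as.integer(tab),
                      stringsAsFactors = FALSE)

  # Most frequent terms first (ties broken alphabetically, an assumption)
  vocab <- vocab[order(-vocab$count, vocab$term), ]

  # MIN_count: keep only terms that occur at least this many times
  vocab <- vocab[vocab$count >= MIN_count, ]

  # MAX_vocab: 0 keeps every remaining term, otherwise keep the top terms
  if (MAX_vocab > 0) vocab <- head(vocab, MAX_vocab)

  rownames(vocab) <- NULL
  vocab
}

corpus <- c("the cat sat", "the cat ran", "the dog")
all_terms <- toy_vocabulary(corpus)                 # MAX_vocab = 0: no limit
frequent  <- toy_vocabulary(corpus, MIN_count = 2)  # drops terms seen once
top_one   <- toy_vocabulary(corpus, MAX_vocab = 1)  # keeps the top term only
```

In this sketch the default call keeps all five distinct terms, MIN_count = 2 keeps only terms that occur at least twice, and MAX_vocab = 1 keeps only the most frequent term.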

Value

a character string specifying the location of the saved data

References

https://github.com/stanfordnlp/GloVe

http://nlp.stanford.edu/projects/glove/

http://nlp.stanford.edu/pubs/glove.pdf

Examples

# library(GloveR)
# res = vocabulary_counts(train_data = '/data_GloveR/dat.txt', MAX_vocab = 0,
#                         MIN_count = 5, output_vocabulary = '/data_GloveR/VOCAB.txt',
#                         trace = TRUE)
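
Because the function returns the location of the saved data, the vocabulary can be read back into R for inspection. The sketch below assumes that the output file holds one term and its count per line, separated by a single space, as the vocabulary files of the Stanford GloVe tools do; this page does not state the format, so check the actual file first. A temporary file stands in for the real output, and vocab_path is a hypothetical placeholder for the returned path.

```r
# Stand-in for the path returned by vocabulary_counts(); the "term count"
# layout written here is an assumption about the output format.
vocab_path <- tempfile(fileext = ".txt")
writeLines(c("the 3", "cat 2", "dog 1"), vocab_path)

# Read the file as two space-separated columns. Quote and comment handling
# are switched off so that tokens such as ' or # are read literally.
vocab <- read.table(vocab_path, header = FALSE, sep = " ", quote = "",
                    comment.char = "", col.names = c("term", "count"),
                    stringsAsFactors = FALSE)

# Terms with the highest counts
head(vocab[order(-vocab$count), ])
```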