This function returns the indices and distances of the k nearest neighbors of each observation.
knn.index.dist( data, TEST_data = NULL, k = 5, method = "euclidean", transf_categ_cols = F, threads = 1 )
Argument | Description |
---|---|
data | a data.frame or matrix (the train data) |
TEST_data | a data.frame or matrix (it can also be NULL) |
k | an integer specifying the number of nearest neighbors |
method | a string specifying the distance metric. Valid methods are 'euclidean', 'manhattan', 'chebyshev', 'canberra', 'braycurtis', 'pearson_correlation', 'simple_matching_coefficient', 'minkowski' (by default, the order 'p' of the Minkowski distance equals k), 'hamming', 'mahalanobis', 'jaccard_coefficient', 'Rao_coefficient' |
transf_categ_cols | a boolean (TRUE, FALSE) specifying whether the categorical columns should be converted to numeric or to dummy variables |
threads | the number of cores to use in parallel (OpenMP is employed) |
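For reference, the textbook definitions behind several of the `method` options fit in a few lines of base R. This is a minimal sketch assuming the standard formulas; the function names (`euclidean_d`, `manhattan_d`, `chebyshev_d`, `canberra_d`, `braycurtis_d`, `minkowski_d`) are hypothetical and not part of the package, whose compiled code may treat edge cases (for example zero denominators in 'canberra') differently.

```r
# Hypothetical helpers, NOT part of the package: textbook distance definitions
euclidean_d <- function(x, y) sqrt(sum((x - y)^2))
manhattan_d <- function(x, y) sum(abs(x - y))
chebyshev_d <- function(x, y) max(abs(x - y))

# Canberra: terms whose denominator is zero are skipped (an assumption)
canberra_d <- function(x, y) {
  num <- abs(x - y)
  den <- abs(x) + abs(y)
  keep <- den > 0
  sum(num[keep] / den[keep])
}

# Bray-Curtis dissimilarity in its common form for non-negative data
braycurtis_d <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Minkowski distance of order p (the documentation states that p defaults to k)
minkowski_d <- function(x, y, p) sum(abs(x - y)^p)^(1 / p)

x <- c(1, 2, 3)
y <- c(4, 0, 3)
c(euclidean  = euclidean_d(x, y),
  manhattan  = manhattan_d(x, y),
  chebyshev  = chebyshev_d(x, y),
  canberra   = canberra_d(x, y),
  braycurtis = braycurtis_d(x, y),
  minkowski3 = minkowski_d(x, y, p = 3))
```

Note that Minkowski with p = 1 and p = 2 reduces to the Manhattan and Euclidean distances, which is why the default order p = k matters when k is small.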
A list of length 2: the first element contains the indices and the second the distances of the k nearest neighbors of each observation. If TEST_data is NULL, the number of rows of each element equals the number of rows in the train data; otherwise, it equals the number of rows in TEST_data.
This function computes the indices and distances of the k nearest neighbors of each observation. If TEST_data is NULL, the indices and distances are returned for the train data; otherwise, they are returned for the rows of TEST_data.
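The shape of the return value can be illustrated with a brute-force base-R sketch. `knn_index_dist_sketch` is a hypothetical helper, not the package's implementation: it supports only Euclidean distance, runs single-threaded, and assumes that, when `TEST_data` is NULL, an observation is excluded from its own neighbor list. Neighbors of `TEST_data` rows are searched among the rows of `data`.

```r
# Hypothetical helper, NOT part of the package: brute-force Euclidean kNN that
# mimics the documented return value, list(indices, distances).
knn_index_dist_sketch <- function(data, TEST_data = NULL, k = 5) {
  data <- as.matrix(data)
  query <- if (is.null(TEST_data)) data else as.matrix(TEST_data)
  n_q <- nrow(query)
  idx <- matrix(NA_integer_, nrow = n_q, ncol = k)
  dst <- matrix(NA_real_, nrow = n_q, ncol = k)
  for (i in seq_len(n_q)) {
    # Euclidean distance from query row i to every row of the train data
    d <- sqrt(colSums((t(data) - query[i, ])^2))
    # assumption: with TEST_data = NULL an observation is not its own neighbor
    if (is.null(TEST_data)) d[i] <- Inf
    ord <- order(d)[seq_len(k)]   # positions of the k smallest distances
    idx[i, ] <- ord
    dst[i, ] <- d[ord]
  }
  list(idx, dst)
}

# toy train data: four points in two dimensions
train <- matrix(c(0, 0,
                  1, 0,
                  0, 2,
                  5, 5), ncol = 2, byrow = TRUE)

res <- knn_index_dist_sketch(train, k = 2)       # one row per train row
new_pt <- matrix(c(0.9, 0.1), ncol = 2)
res_test <- knn_index_dist_sketch(train, TEST_data = new_pt, k = 2)  # one row per test row
```

In this sketch each element of the result has one row per query observation (train rows when TEST_data is NULL, test rows otherwise) and k columns; the indices always refer to rows of the train data.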
Lampros Mouselimis
    library(KernelKnn)
    data(Boston)
    X = Boston[, -ncol(Boston)]   # drop the last column (the response)
    out = knn.index.dist(X, TEST_data = NULL, k = 4, method = 'euclidean', threads = 1)