Returns the indices and distances of the k nearest neighbors of each observation

knn.index.dist(
  data,
  TEST_data = NULL,
  k = 5,
  method = "euclidean",
  transf_categ_cols = FALSE,
  threads = 1
)

Arguments

data

a data.frame or matrix

TEST_data

a data.frame or matrix, or NULL

k

an integer specifying the number of nearest neighbors

method

a string specifying the distance method. Valid methods are 'euclidean', 'manhattan', 'chebyshev', 'canberra', 'braycurtis', 'pearson_correlation', 'simple_matching_coefficient', 'minkowski' (by default the order 'p' of the Minkowski distance equals k), 'hamming', 'mahalanobis', 'jaccard_coefficient', 'Rao_coefficient'
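To make a few of the numeric metrics concrete, here is a small base-R sketch using `stats::dist()`. It is not this function's own implementation, and the toy matrix `x` and the variable names are illustrative assumptions. Note that `dist()` calls the Chebyshev distance "maximum".

```r
# Two toy observations (hypothetical data), chosen so results are easy to check by hand.
x <- rbind(a = c(1, 2, 3), b = c(4, 6, 3))

# Coordinate-wise absolute differences between a and b are 3, 4 and 0.
d_euclidean <- as.numeric(dist(x, method = "euclidean"))         # sqrt(3^2 + 4^2 + 0^2)
d_manhattan <- as.numeric(dist(x, method = "manhattan"))         # 3 + 4 + 0
d_chebyshev <- as.numeric(dist(x, method = "maximum"))           # max(3, 4, 0)
d_minkowski <- as.numeric(dist(x, method = "minkowski", p = 3))  # (3^3 + 4^3 + 0^3)^(1/3)
```

With order p = 2 the Minkowski distance reduces to the Euclidean distance, and with p = 1 to the Manhattan distance.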

transf_categ_cols

a boolean (TRUE, FALSE) specifying whether categorical columns should be converted to numeric or to dummy variables
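The two conversions mentioned above can be illustrated in base R. This is only a sketch of what "numeric" and "dummy variables" typically mean for a factor column; the data frame `df` and its column names are hypothetical, and the exact conversion the function performs internally is not shown here.

```r
# Hypothetical toy data with one numeric and one categorical column.
df <- data.frame(size   = c(1.5, 2.0, 3.5),
                 colour = factor(c("red", "blue", "red")))

# Conversion to numeric: replace the factor with its integer level codes.
# Levels are sorted alphabetically, so blue = 1 and red = 2.
as_numeric <- df
as_numeric$colour <- as.numeric(df$colour)

# Conversion to dummy variables: one 0/1 indicator column per level.
# Dropping the intercept (- 1) keeps an indicator for every level.
as_dummies <- model.matrix(~ . - 1, data = df)
```

Level codes impose an arbitrary ordering on the categories, whereas dummy variables treat every pair of distinct categories as equally far apart, so the choice can change which observations count as neighbors.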

threads

the number of cores to use in parallel (OpenMP will be employed)

Value

a list of length 2. The first sublist holds the indices and the second the distances of the k nearest neighbors for each observation. If TEST_data is NULL, each sublist has as many rows as the train data; otherwise, each sublist has as many rows as TEST_data.

Details

The function returns the indices and distances of the k nearest neighbors of each observation. If TEST_data is NULL, the indices and distances are returned for the train data; otherwise, they are returned for TEST_data.
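The behaviour described above can be sketched with a brute-force search in base R. This is an illustration under stated assumptions, not the package's implementation: `brute_knn` is a hypothetical helper, the element names `indices` and `distances` are made up for the sketch, only the Euclidean distance is covered, and excluding each training point from its own neighbor list when no test data is given is an assumption.

```r
# Brute-force k nearest neighbours with Euclidean distance (base R only).
# `brute_knn` is a hypothetical helper, not part of the package.
brute_knn <- function(train, test = NULL, k = 2) {
  train <- as.matrix(train)
  exclude_self <- is.null(test)  # assumption: a point is not its own neighbour
  query <- if (exclude_self) train else as.matrix(test)

  idx <- matrix(NA_integer_, nrow(query), k)
  dst <- matrix(NA_real_, nrow(query), k)
  for (i in seq_len(nrow(query))) {
    # Distance from query row i to every training row.
    d <- sqrt(colSums((t(train) - query[i, ])^2))
    if (exclude_self) d[i] <- Inf
    nearest <- order(d)[seq_len(k)]  # positions of the k smallest distances
    idx[i, ] <- nearest
    dst[i, ] <- d[nearest]
  }
  list(indices = idx, distances = dst)
}

# Hypothetical toy data.
train <- rbind(c(0, 0), c(1, 0), c(0, 3), c(5, 5))
test  <- rbind(c(0, 1), c(4, 4))

res_train <- brute_knn(train, k = 2)        # one row per training observation
res_test  <- brute_knn(train, test, k = 2)  # one row per test observation
```

In both cases the result has k columns; only the number of rows changes, following whichever data set the neighbors are reported for.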

Author

Lampros Mouselimis

Examples


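library(KernelKnn)  # assumed: the package that provides knn.index.dist

data(Boston)

# drop the last column and keep the remaining columns as predictors
X = Boston[, -ncol(Boston)]

# indices and distances of the 4 nearest neighbors of each row of X
out = knn.index.dist(X, TEST_data = NULL, k = 4, method = 'euclidean', threads = 1)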