kernel k-nearest-neighbors using a distance matrix

Usage

distMat.KernelKnn(
  DIST_mat,
  TEST_indices = NULL,
  y,
  k = 5,
  h = 1,
  weights_function = NULL,
  regression = F,
  threads = 1,
  extrema = F,
  Levels = NULL,
  minimize = T
)

Arguments

DIST_mat: a square distance matrix whose diagonal is filled with either zeros (0) or NAs (missing values)
TEST_indices: a numeric vector specifying the indices of the test data in the distance matrix (row-wise or column-wise). If NULL, the distance matrix is assumed to contain no test data
y: a numeric vector (in classification the labels must be numeric, starting from 1). If TEST_indices is not NULL, the length of y must equal the number of train rows ( nrow(DIST_mat) - length(TEST_indices) ); otherwise length(y) == nrow(DIST_mat).
k: an integer specifying the number of nearest neighbors
h: the bandwidth (applicable if the weights_function is not NULL, defaults to 1.0)
weights_function: there are various ways of specifying the kernel function. See the details section.
regression: a boolean (TRUE,FALSE) specifying if regression or classification should be performed
threads: the number of cores to be used in parallel (OpenMP is used)
extrema: if TRUE then the minimum and maximum values among the k nearest neighbors will be removed (can be thought of as outlier removal)
Levels: a numeric vector. For classification, the unique levels of the response variable must be supplied
minimize: either TRUE or FALSE. If TRUE then lower values in the matrix are treated as nearer in the k-nearest search (distances), otherwise higher values are.
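
To make the roles of k and minimize concrete, the following base R sketch runs an unweighted k-nearest-neighbor regression directly on a distance matrix. It is a conceptual illustration only and does not reproduce the package's implementation; the helper name knn_from_dist and the toy data are hypothetical.

```r
# Conceptual sketch only: knn_from_dist is a hypothetical helper, not part of KernelKnn.
knn_from_dist = function(dist_mat, y, k = 5, minimize = TRUE) {
  diag(dist_mat) = NA                       # an observation is never its own neighbor
  sapply(seq_len(nrow(dist_mat)), function(i) {
    # order() places NA last, so the diagonal entry is never selected
    idx = order(dist_mat[i, ], decreasing = !minimize)[seq_len(k)]
    mean(y[idx])                            # unweighted average of the k selected neighbors
  })
}

# Toy data: two well-separated groups on a line
x = c(1, 2, 3, 10, 11, 12)
y = c(1, 1, 1, 5, 5, 5)
d = as.matrix(dist(x))

preds_near = knn_from_dist(d, y, k = 2)                    # smallest distances win
preds_far  = knn_from_dist(d, y, k = 2, minimize = FALSE)  # largest values win
```

With minimize = TRUE each point is predicted from its own group, while minimize = FALSE picks the two largest entries per row, which is only meaningful when the matrix holds similarities rather than distances.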

Value

a vector (if regression is TRUE), or a data frame with class probabilities (if regression is FALSE)
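
When regression is FALSE, hard class labels can be derived from the returned class probabilities by picking the most probable column in each row. The sketch below uses a hand-made data frame in place of real output; the column names, and the assumption that columns follow the order of Levels, are illustrative and should be checked against the actual result.

```r
# Hypothetical class-probability output for 3 observations and 3 classes.
# Column names are illustrative; inspect names() of the real output.
probs = data.frame(class_1 = c(0.7, 0.2, 0.1),
                   class_2 = c(0.2, 0.5, 0.3),
                   class_3 = c(0.1, 0.3, 0.6))

# Column index of the largest probability in each row. Assuming labels are
# coded 1, 2, ... and columns follow that order, the index is the predicted label.
pred_labels = max.col(as.matrix(probs), ties.method = "first")
```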

Details

This function takes a distance matrix (a square matrix whose diagonal is filled with 0 or NA) as input. If the TEST_indices parameter is NULL then the predictions for the train data are returned, whereas if the TEST_indices parameter is not NULL then the predictions for the test data are returned.

There are three possible ways to specify the weights function:
1st option : if the weights_function is NULL then a simple k-nearest-neighbor is performed.
2nd option : the weights_function is one of 'uniform', 'triangular', 'epanechnikov', 'biweight', 'triweight', 'tricube', 'gaussian', 'cosine', 'logistic', 'gaussianSimple', 'silverman', 'inverse', 'exponential'. This option can be extended by combining two of the existing kernels (adding or multiplying). For instance, 'tricube_gaussian_MULT' multiplies the tricube with the gaussian kernel, and 'tricube_gaussian_ADD' adds the two.
3rd option : a user-defined kernel function.
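
As an illustration of the 2nd and 3rd options, the sketch below defines a simple Gaussian kernel and passes it, and then a combined kernel name, as weights_function. The contract assumed here (the function receives a matrix of neighbor distances and returns row-normalized weights) is an assumption not stated on this page, and gauss_kernel and the simulated data are hypothetical. The package calls run only if KernelKnn is installed.

```r
# Hypothetical user-defined kernel. ASSUMPTION: it receives a matrix W of
# neighbor distances (one row per observation) and returns weights of the same shape.
gauss_kernel = function(W) {
  W = dnorm(W, mean = 0, sd = 1)    # closer neighbors get larger weights
  W / rowSums(W)                    # weights in each row sum to 1
}

# Stand-alone check of the kernel on a small distance matrix
W = matrix(c(0.5, 1.0, 2.0,
             0.1, 0.2, 0.3), nrow = 2, byrow = TRUE)
w = gauss_kernel(W)

# Simulated regression data (illustrative only)
set.seed(1)
X = matrix(rnorm(200), ncol = 4)
y = rnorm(50)
dist_mat = as.matrix(dist(X))

if (requireNamespace("KernelKnn", quietly = TRUE)) {
  # 3rd option: user-defined kernel function
  out_user = KernelKnn::distMat.KernelKnn(dist_mat, TEST_indices = NULL, y = y, k = 5,
                                          weights_function = gauss_kernel,
                                          regression = TRUE)
  # 2nd option, extended: product of the tricube and gaussian kernels
  out_comb = KernelKnn::distMat.KernelKnn(dist_mat, TEST_indices = NULL, y = y, k = 5,
                                          weights_function = 'tricube_gaussian_MULT',
                                          regression = TRUE)
}
```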

Author

Lampros Mouselimis

Examples


library(KernelKnn)

# Boston housing data shipped with the KernelKnn package
data(Boston, package = 'KernelKnn')

X = Boston[, -ncol(Boston)]
y = Boston[, ncol(Boston)]

dist_obj = dist(X)

dist_mat = as.matrix(dist_obj)

out = distMat.KernelKnn(dist_mat, TEST_indices = NULL, y, k = 5, regression = TRUE)
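
A further sketch, this time with held-out test rows: the distance matrix is computed over train and test observations together, TEST_indices marks the test rows, and y covers the train rows only. The simulated data and the choice of the last 20 rows as test set are illustrative assumptions; the package call runs only if KernelKnn is installed.

```r
set.seed(42)
X = matrix(rnorm(100 * 3), ncol = 3)
y = rnorm(100)

test_idx = 81:100                      # positions of the held-out observations
dist_mat = as.matrix(dist(X))          # distances over train and test rows together
y_train  = y[-test_idx]                # the response is supplied for the train rows only

# length(y) must equal nrow(DIST_mat) - length(TEST_indices)
stopifnot(length(y_train) == nrow(dist_mat) - length(test_idx))

if (requireNamespace("KernelKnn", quietly = TRUE)) {
  preds_test = KernelKnn::distMat.KernelKnn(dist_mat, TEST_indices = test_idx,
                                            y = y_train, k = 5, regression = TRUE)
}
```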