This function uses kernel k-nearest-neighbors to predict new observations.
KernelKnn( data, TEST_data = NULL, y, k = 5, h = 1, method = "euclidean", weights_function = NULL, regression = F, transf_categ_cols = F, threads = 1, extrema = F, Levels = NULL )
Argument | Description |
---|---|
data | a data frame or matrix |
TEST_data | a data frame or matrix (can also be NULL) |
y | a numeric vector (in classification the labels must be numeric from 1:Inf) |
k | an integer specifying the number of nearest neighbors |
h | the bandwidth (applicable if the weights_function is not NULL, defaults to 1.0) |
method | a string specifying the method. Valid methods are 'euclidean', 'manhattan', 'chebyshev', 'canberra', 'braycurtis', 'pearson_correlation', 'simple_matching_coefficient', 'minkowski' (by default the order 'p' of the minkowski parameter equals k), 'hamming', 'mahalanobis', 'jaccard_coefficient', 'Rao_coefficient' |
weights_function | there are various ways of specifying the kernel function. See the details section. |
regression | a boolean (TRUE,FALSE) specifying if regression or classification should be performed |
transf_categ_cols | a boolean (TRUE, FALSE) specifying if the categorical columns should be converted to numeric or to dummy variables |
threads | the number of cores to be used in parallel (OpenMP will be employed) |
extrema | if TRUE then the minimum and maximum values from the k-nearest-neighbors will be removed (can be thought of as outlier removal) |
Levels | a numeric vector. For classification, the unique levels of the response variable are required |
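To make some of the `method` options concrete, the following base-R sketch computes four of the listed distances for two toy vectors from their textbook definitions. The vectors `a` and `b`, the helper `minkowski`, and the order `p` are illustrative assumptions; this is not the package's internal code, which may differ in details.

```r
# Toy vectors (hypothetical values, chosen only for illustration)
a <- c(1, 2, 3)
b <- c(4, 0, 3)
d <- abs(a - b)                 # element-wise absolute differences: 3, 2, 0

euclidean <- sqrt(sum(d^2))     # square root of the summed squared differences
manhattan <- sum(d)             # sum of the absolute differences
chebyshev <- max(d)             # largest absolute difference

# Minkowski distance of order p; p = 1 gives manhattan, p = 2 gives euclidean
minkowski <- function(d, p) sum(d^p)^(1 / p)
p <- 3                          # assumed order, for illustration only
mink3 <- minkowski(d, p)
```

Which metric suits a data set depends on the scale and type of its columns, so it is usually worth comparing a few of them.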
a vector (if regression is TRUE), or a data frame with class probabilities (if regression is FALSE)
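When classification is performed, the class probabilities usually need to be turned into predicted labels. A minimal sketch in base R, assuming a probability table with one column per class ordered by label; the data frame `probs` and its column names are made up for illustration and do not come from the package:

```r
# Hypothetical class-probability output for 3 observations and 3 classes
# (values and column names are invented for this sketch)
probs <- data.frame(class_1 = c(0.7, 0.2, 0.1),
                    class_2 = c(0.2, 0.5, 0.3),
                    class_3 = c(0.1, 0.3, 0.6))

# Column index of the most probable class in each row;
# ties are resolved in favour of the first column
pred_labels <- max.col(as.matrix(probs), ties.method = "first")
```

Because the labels in `y` are expected to be numeric starting from 1, the column index can serve directly as the predicted label under the ordering assumption above.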
This function takes a number of arguments and returns the predicted values. If TEST_data is NULL, the predictions for the train data are returned; if TEST_data is not NULL, the predictions for the TEST_data are returned.

There are three possible ways to specify the weights function:

1st option: if the weights_function is NULL, a simple k-nearest-neighbor is performed.

2nd option: the weights_function is one of 'uniform', 'triangular', 'epanechnikov', 'biweight', 'triweight', 'tricube', 'gaussian', 'cosine', 'logistic', 'gaussianSimple', 'silverman', 'inverse', 'exponential'. This option can be extended by combining existing kernels (adding or multiplying). For instance, 'tricube_gaussian_MULT' multiplies the tricube kernel with the gaussian kernel, and 'tricube_gaussian_ADD' adds the two.

3rd option: a user-defined kernel function.
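The effect of a kernel on a prediction can be sketched in base R for a single query point. Everything here is an assumption made for illustration: the helper `knn_predict_one`, the scaling of distances by their maximum, and the toy numbers. It shows the general idea of kernel-weighted averaging and of combining kernels, not the package's actual implementation.

```r
# Two textbook kernels, written as plain R functions
tricube  <- function(u) ifelse(abs(u) <= 1, (1 - abs(u)^3)^3, 0)
gaussian <- function(u) exp(-0.5 * u^2) / sqrt(2 * pi)

# Combined kernels, mirroring the idea behind the '_MULT' and '_ADD' suffixes
tricube_gaussian_mult <- function(u) tricube(u) * gaussian(u)
tricube_gaussian_add  <- function(u) tricube(u) + gaussian(u)

# Hypothetical helper: predict one query point from its k neighbours.
# dists  = distances to the neighbours, y_nb = their responses,
# kernel = NULL for plain k-NN, h = bandwidth.
knn_predict_one <- function(dists, y_nb, kernel = NULL, h = 1) {
  if (is.null(kernel)) return(mean(y_nb))  # unweighted average (1st option)
  u <- dists / max(dists)                  # assumed scaling to [0, 1]
  w <- kernel(u / h)                       # kernel weight per neighbour
  w <- w / sum(w)                          # normalise weights to sum to 1
  sum(w * y_nb)                            # weighted average of the responses
}

dists <- c(0.5, 1, 2)     # toy distances, nearest neighbour first
y_nb  <- c(10, 20, 30)    # toy responses of the 3 neighbours

plain    <- knn_predict_one(dists, y_nb)
weighted <- knn_predict_one(dists, y_nb, kernel = tricube_gaussian_mult)
added    <- knn_predict_one(dists, y_nb, kernel = tricube_gaussian_add)
```

In this toy setup the nearest neighbours have the smallest responses, so both weighted predictions fall below the plain average. A user-defined kernel would slot in the same way, as any function that maps scaled distances to non-negative weights.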
Lampros Mouselimis
data(Boston)
X = Boston[, -ncol(Boston)]
y = Boston[, ncol(Boston)]
out = KernelKnn(X, TEST_data = NULL, y, k = 5, method = 'euclidean', regression = TRUE)