Exclude highly correlated predictors

select_predictors(
  response_vector,
  predictors_matrix,
  response_lower_thresh = 0.1,
  predictors_upper_thresh = 0.75,
  threads = 1,
  verbose = FALSE
)

Arguments

response_vector

a numeric vector (its length should be equal to the number of rows of the predictors_matrix parameter)

predictors_matrix

a numeric matrix (its number of rows should be equal to the length of the response_vector parameter)

response_lower_thresh

a numeric value. Predictors whose correlation with the response is greater than the response_lower_thresh value are kept.

predictors_upper_thresh

a numeric value. Predictors whose correlation with the other predictors is less than the predictors_upper_thresh value are kept.

threads

a numeric value specifying the number of cores to use when running in parallel

verbose

either TRUE or FALSE. If TRUE then information will be printed out in the R session.

Value

a vector of column-indices

Details

The function works as follows. The correlation of each predictor with the response is calculated first, and the resulting correlations are sorted in decreasing order. Then, iteratively, predictors whose correlation with another predictor is higher than the predictors_upper_thresh value are removed, favoring the predictors that are more correlated with the response variable. If the response_lower_thresh value is greater than 0.0, only predictors whose correlation with the response is higher than or equal to the response_lower_thresh value are kept; the rest are excluded. The function returns the indices of the selected predictors and is useful in the presence of multicollinearity.
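
The steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the name select_predictors_sketch is hypothetical, Pearson correlation via stats::cor is assumed, signed (not absolute) correlations are compared, and a predictor is dropped only when its correlation with an already kept predictor is strictly higher than the threshold.

```r
# Hypothetical base-R illustration of the selection steps;
# NOT the code used by textTinyR::select_predictors.
select_predictors_sketch <- function(response_vector, predictors_matrix,
                                     response_lower_thresh = 0.1,
                                     predictors_upper_thresh = 0.75) {
  stopifnot(length(response_vector) == nrow(predictors_matrix))

  # Correlation of every predictor column with the response
  resp_cor <- suppressWarnings(
    as.vector(stats::cor(predictors_matrix, response_vector))
  )
  # Non-finite correlations (NA, NaN, +/- Inf) are replaced by 0.0
  resp_cor[!is.finite(resp_cor)] <- 0.0

  # Candidate columns, most correlated with the response first
  candidates <- order(resp_cor, decreasing = TRUE)
  if (response_lower_thresh > 0.0) {
    candidates <- candidates[resp_cor[candidates] >= response_lower_thresh]
  }

  kept <- integer(0)
  for (j in candidates) {
    if (length(kept) > 0) {
      # Correlation of the candidate with every predictor kept so far
      pair_cor <- suppressWarnings(as.vector(
        stats::cor(predictors_matrix[, j], predictors_matrix[, kept, drop = FALSE])
      ))
      pair_cor[!is.finite(pair_cor)] <- 0.0
      # Skip the candidate if it is too correlated with a kept predictor
      if (any(pair_cor > predictors_upper_thresh)) next
    }
    kept <- c(kept, j)
  }
  kept
}

# Small deterministic data set: x1 and x1^2 are strongly correlated,
# x3 is only weakly correlated with x1, and the last column is constant.
x1 <- c(1, 2, 3, 4, 5, 6, 7, 8)
x3 <- c(1, -1, 1, -1, 1, -1, 1, -1)
y  <- x1 + 2 * x3
m  <- cbind(x1, x1^2, x3, rep(1, 8))

res <- select_predictors_sketch(y, m)
```

Processing candidates in decreasing order of their correlation with the response is what makes the procedure favor the more informative predictor of a correlated pair: the stronger one is examined, and kept, first.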

If during computation the correlation between the response variable and a potential predictor is equal to NA or +/- Inf, then a correlation of 0.0 will be assigned to this particular pair.
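
A typical situation in which the correlation is NA is a predictor with zero variance. The following base R snippet is a minimal sketch of that case; it uses stats::cor and is.finite for illustration and does not reproduce the package internals.

```r
set.seed(1)
resp <- runif(100)

# A constant predictor has zero standard deviation
constant_col <- rep(1, 100)

# stats::cor() returns NA (with a warning) when a standard deviation is zero
r <- suppressWarnings(stats::cor(resp, constant_col))
was_na <- is.na(r)

# Replace a non-finite correlation by 0.0, as described above
if (!is.finite(r)) r <- 0.0
```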

Examples


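library(textTinyR)

# random response
set.seed(1)
resp = runif(100)

# base column from which strongly correlated predictors are derived
set.seed(2)
col = runif(100)

# five predictors: col and four of its even powers
matr = matrix(c(col, col^4, col^6, col^8, col^10), nrow = 100, ncol = 5)

# column indices of the predictors that are kept
out = select_predictors(resp, matr, predictors_upper_thresh = 0.75)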
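
The returned indices can be used to subset the predictor matrix. The snippet below is a minimal sketch that repeats the data from the example so that it runs on its own; it assumes the indices are 1-based column positions, as is usual in R, and the name reduced is only illustrative.

```r
library(textTinyR)

set.seed(1)
resp = runif(100)

set.seed(2)
col = runif(100)

matr = matrix(c(col, col^4, col^6, col^8, col^10), nrow = 100, ncol = 5)

out = select_predictors(resp, matr, predictors_upper_thresh = 0.75)

# keep only the selected columns; drop = FALSE preserves the matrix
# structure even if a single column (or none) is selected
reduced = matr[, out, drop = FALSE]
```

Because the response here is random, few or no predictors may pass the default response_lower_thresh of 0.1; setting response_lower_thresh = 0.0 disables that filter according to the Details section.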