trainNN: Train and assess accuracy of a k-NN model


View source: R/trainNN.R

Description

This function trains a k-NN model from response variables (Y) and predictors (X) at reference observations using the package yaImpute (see yai). By default, the distance between observations is obtained from the proximity matrix of random forest regression or classification trees. Optionally, training and testing sets can be provided to return the accuracy of the trained k-NN model.
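
To make the random forest distance more concrete, here is a minimal base R sketch of a proximity-based distance. It assumes the distance is one minus the share of trees in which two observations land in the same terminal node, which is how yaImpute describes its "randomForest" method. The node matrix and the helper rf_proximity are hypothetical and only illustrate the idea; they are not part of foster or yaImpute.

```r
# Hypothetical terminal-node IDs: rows are observations, columns are trees.
nodes <- matrix(c(1, 1, 2,
                  1, 2, 2,
                  3, 2, 1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("ref1", "ref2", "target"),
                                paste0("tree", 1:3)))

# Proximity: share of trees in which two observations fall in the
# same terminal node (hypothetical helper, not a package function).
rf_proximity <- function(nodes) {
  n <- nrow(nodes)
  prox <- matrix(0, n, n, dimnames = list(rownames(nodes), rownames(nodes)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      prox[i, j] <- mean(nodes[i, ] == nodes[j, ])
    }
  }
  prox
}

# Distance is one minus proximity.
rf_dist <- 1 - rf_proximity(nodes)

# Nearest reference observation to "target" (k = 1).
nearest <- names(which.min(rf_dist["target", c("ref1", "ref2")]))
print(nearest)
```

With k = 1, the response values of the nearest reference observation would then be imputed to the target.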

Usage

trainNN(
  x,
  y,
  inTrain = NULL,
  inTest = NULL,
  k = 1,
  method = "randomForest",
  impute.cont = NULL,
  impute.fac = NULL,
  ntree = 500,
  mtry = NULL,
  rfMode = "",
  ...
)

Arguments

x

A dataframe or SpatialPointsDataFrame of predictor variables X for the reference observations. Row names of X are used to identify the reference observations.

y

A dataframe or SpatialPointsDataFrame of response variables Y for the reference observations. Row names of Y are used to identify the reference observations.

inTrain

Optional. A list obtained from partition indicating which rows of x and y go to training.

inTest

Optional list indicating which rows of x and y go to validation. If left NULL, all rows that are not in inTrain are used for validation.

k

Integer. Number of nearest neighbors.

method

Character. The nearness metric used to find the nearest neighbors. Default is "randomForest". Other methods are listed in yai.

impute.cont

Character. The method used to compute the imputed continuous variables. Can be "closest", "mean", "median" or "dstWeighted". Default is "closest" if k = 1 and "dstWeighted" if k > 1. See impute.yai for more details.

impute.fac

Character. The method used to compute the imputed values for factors. Default value is the same as impute.cont. See impute.yai for more details.

ntree

Number of classification or regression trees grown for each response variable. Default is 500.

mtry

Number of X variables picked randomly to split each node. Default is sqrt(number of X variables)

rfMode

Character. The default, "", forces yai to build random forest regression trees instead of classification trees for continuous variables. Set to "buildClasses" to convert continuous variables to classes and force random forest to build classification trees (see yai).

...

Other arguments passed to yai (e.g. "rfXsubsets")

Details

If performing model validation, the function trains a k-NN model from the training set, finds the k nearest neighbors (NN) of each observation in the validation set, and imputes the response variables from those k NN. If k = 1, only the closest NN value is imputed. If k > 1, the imputed value can be the closest NN value, or the mean, median or distance-weighted mean of the k NN values. This is controlled by the arguments impute.cont and impute.fac.
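
The four imputation options for a continuous variable can be sketched in base R as follows. This is an illustration only: the neighbor values, the distances and the helper impute_value are hypothetical, and the distance weights of 1 / (1 + d) are an assumption based on the impute.yai documentation rather than something stated on this page.

```r
# Hypothetical response values and distances of the k = 3 nearest neighbors.
nn_values <- c(10, 14, 30)
nn_dist   <- c(0, 1, 3)

# Hypothetical helper mirroring the four impute.cont options.
impute_value <- function(values, dist,
                         method = c("closest", "mean", "median", "dstWeighted")) {
  method <- match.arg(method)
  switch(method,
    closest     = values[which.min(dist)],
    mean        = mean(values),
    median      = median(values),
    # Assumption: weights of 1 / (1 + d), so closer neighbors count more.
    dstWeighted = weighted.mean(values, 1 / (1 + dist))
  )
}

imputed <- sapply(c("closest", "mean", "median", "dstWeighted"),
                  function(m) impute_value(nn_values, nn_dist, m))
print(imputed)
```

In this toy case the single distant neighbor (30) pulls the plain mean well above the distance-weighted mean, which is one reason a distance-weighted default is reasonable when k > 1.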

If inTest = NULL, all rows that are not in inTrain will be used for model testing. If inTrain = NULL, all rows that are not in inTest will be used for model training. If both inTrain and inTest are NULL, all rows of x and y will be used for training and no testing is performed.
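
The defaulting rules above can be written out as a short base R sketch. The helper resolve_split is hypothetical and, for simplicity, assumes inTrain and inTest are plain vectors of row indices rather than the list returned by partition; it is not how trainNN is implemented internally.

```r
# Hypothetical helper: resolve training and testing rows for n observations.
resolve_split <- function(n, inTrain = NULL, inTest = NULL) {
  all_rows <- seq_len(n)
  # Neither set given: train on everything, no testing.
  if (is.null(inTrain) && is.null(inTest)) {
    return(list(train = all_rows, test = integer(0)))
  }
  # A missing set defaults to the complement of the one provided.
  if (is.null(inTrain)) inTrain <- setdiff(all_rows, inTest)
  if (is.null(inTest))  inTest  <- setdiff(all_rows, inTrain)
  list(train = inTrain, test = inTest)
}

split_a <- resolve_split(5, inTrain = c(1, 2, 3))  # test rows default to 4, 5
split_b <- resolve_split(5, inTest = c(4, 5))      # train rows default to 1, 2, 3
split_c <- resolve_split(5)                        # all rows train, none test
```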

The final model returned by trainNN is trained from all observations of x and y.

Value

A list containing the following objects:

model

A yai object, the trained k-NN model

preds

A data.frame with observed and predicted values of the testing set for each response variable
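
As a rough illustration of how observed and predicted values can be turned into an accuracy measure, the sketch below computes a per-variable RMSE in base R. The data frame preds_example and its column names (variable, obs, pred) are hypothetical stand-ins; check the actual column names of the returned preds object, or use the accuracy function listed under See Also, before adapting it.

```r
# Hypothetical stand-in for the preds element: one row per test
# observation and response variable (column names are assumptions).
preds_example <- data.frame(
  variable = c("v1", "v1", "v1", "v2", "v2", "v2"),
  obs      = c(10, 12, 14, 1.0, 2.0, 3.0),
  pred     = c(11, 12, 13, 1.5, 2.0, 2.5)
)

# Root mean square error for each response variable.
rmse_by_var <- sapply(
  split(preds_example, preds_example$variable),
  function(d) sqrt(mean((d$pred - d$obs)^2))
)
print(rmse_by_var)
```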

See Also

yai, newtargets, impute.yai, accuracy

Examples

# Load data in memory
# X_vars_sample: Predictor variables at sample (from getSample)
# Y_vars_sample: Response variables at sample (from getSample)
# train_idx: Rows of X_vars_sample and Y_vars_sample that are used for
# training (from partition)
load(system.file("extdata/examples/example_trainNN.RData", package = "foster"))

set.seed(1234) # for example reproducibility
kNN <- trainNN(x = X_vars_sample,
               y = Y_vars_sample,
               inTrain = train_idx,
               k = 1,
               method = "randomForest",
               ntree = 200)

mqueinnec/foster documentation built on March 28, 2021, 4:27 p.m.