trainNN: Train and assess accuracy of a k-NN model
In mqueinnec/foster: Forest Structure Extrapolation with R


This function trains a k-NN model from response variables (Y) and predictors (X) at reference observations using the package yaImpute (see yai). By default, the distance between observations is obtained from the proximity matrix of random forest regression or classification trees. Optionally, training and testing sets can be provided to return the accuracy of the trained k-NN model.

trainNN(
  x,
  y,
  inTrain = NULL,
  inTest = NULL,
  k = 1,
  method = "randomForest",
  impute.cont = NULL,
  impute.fac = NULL,
  ntree = 500,
  mtry = NULL,
  rfMode = "",
  ...
)

`x`	A dataframe or SpatialPointsDataFrame of predictor variables X for reference observations. Row names of X are used as identification of reference observations.
`y`	A dataframe or SpatialPointsDataFrame of response variables Y for the reference observations. Row names of Y are used as identification of reference observations.
`inTrain`	Optional. A list obtained from `partition` indicating which rows of x and y go to training.
`inTest`	Optional list indicating which rows of x and y go to validation. If left NULL, all rows that are not in `inTrain` are used for validation.
`k`	Integer. Number of nearest neighbors
`method`	Character. Which nearness metric is used to compute the nearest neighbors. Default is `"randomForest"`. Other methods are listed in `yai`.
`impute.cont`	Character. The method used to compute the imputed continuous variables. Can be `"closest"`, `"mean"`, `"median"` or `"dstWeighted"`. Default is `"closest"` if `k = 1` and `"dstWeighted"` if `k > 1`. See `impute.yai` for more details.
`impute.fac`	Character. The method used to compute the imputed values for factors. Default value is the same as `impute.cont`. See `impute.yai` for more details.
`ntree`	Number of classification or regression trees drawn for each response variable. Default is 500
`mtry`	Number of X variables picked randomly to split each node. Default is sqrt(number of X variables)
`rfMode`	By default, `rfMode` is set to `""`, which forces `yai` to create random forest regression trees instead of classification trees for continuous variables. Set it to `"buildClasses"` to convert continuous variables to classes and force random forest to build classification trees (see `yai`).
`...`	Other arguments passed to `yai` (e.g. `"rfXsubsets"`)

If performing model validation, the function trains a k-NN model from the training set, finds the k nearest neighbors (NN) of each observation in the validation set and imputes the response variables from those k NN. If k = 1, only the closest NN value is imputed. If k > 1, the imputed value can be the closest NN value or the mean, median or distance-weighted mean of the k NN values. This is controlled by the arguments impute.cont and impute.fac.
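
The four imputation options can be illustrated with a small base R sketch. The neighbor values and distances below are made up for illustration, and the distance weights 1 / (1 + d) are an assumption based on the description of `"dstWeighted"` in the impute.yai documentation; this is not the code used by trainNN.

```r
# Hypothetical values of one continuous response at the k = 3 nearest
# reference observations, ordered from closest to farthest.
nn_values <- c(12, 15, 30)
# Hypothetical distances from the target observation to those neighbors.
nn_dist <- c(0, 1, 3)

# "closest": value of the single nearest neighbor
impute_closest <- nn_values[which.min(nn_dist)]

# "mean" and "median": summary over the k neighbor values
impute_mean <- mean(nn_values)
impute_median <- median(nn_values)

# "dstWeighted": weighted mean, assuming weights of 1 / (1 + d)
w <- 1 / (1 + nn_dist)
impute_dstw <- sum(w * nn_values) / sum(w)

print(c(closest = impute_closest, mean = impute_mean,
        median = impute_median, dstWeighted = impute_dstw))
```

With k = 1 only the first option applies, which is why `"closest"` is the default in that case.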

If inTest = NULL, all rows that are not in inTrain will be used for model testing. If inTrain = NULL, all rows that are not in inTest will be used for model training. If both inTrain and inTest are NULL, all rows of x and y will be used for training and no testing is performed.
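
The rules above can be sketched in base R as follows. `resolve_split` is a hypothetical helper written for this illustration and is not part of foster; it also simplifies `inTrain` and `inTest` to plain vectors of row indices, whereas trainNN expects lists such as those returned by partition.

```r
# Hypothetical helper: decide which rows are used for training and testing,
# given n observations and optional index vectors.
resolve_split <- function(n, inTrain = NULL, inTest = NULL) {
  all_rows <- seq_len(n)
  if (is.null(inTrain) && is.null(inTest)) {
    # No split supplied: train on every row, no testing
    return(list(train = all_rows, test = integer(0)))
  }
  # Fill in whichever side is missing with the complementary rows
  if (is.null(inTest)) inTest <- setdiff(all_rows, inTrain)
  if (is.null(inTrain)) inTrain <- setdiff(all_rows, inTest)
  list(train = inTrain, test = inTest)
}

only_train <- resolve_split(5, inTrain = c(1, 2, 4))  # test rows: 3, 5
only_test <- resolve_split(5, inTest = c(2, 3))       # train rows: 1, 4, 5
no_split <- resolve_split(5)                          # train rows: 1 to 5
```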

The final model returned by trainNN is trained from all observations of x and y.

A list containing the following objects:

model: A yai object, the trained k-NN model
preds: A data.frame with observed and predicted values of the testing set for each response variable
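
As a rough illustration of how observed and predicted values can be compared, the sketch below computes a root mean square error per response variable in base R. The data frame is a mock stand-in, and its column names (`ID`, `variable`, `obs`, `preds`) are assumptions made for this example; check the actual layout of the returned `preds` with `str()` before reusing it. The accuracy function listed under See Also is the package's own tool for this step.

```r
# Mock stand-in for the preds element; column names and values are assumed.
preds <- data.frame(
  ID = rep(c("p1", "p2", "p3"), times = 2),
  variable = rep(c("v1", "v2"), each = 3),
  obs = c(10, 20, 30, 1, 2, 3),
  preds = c(12, 18, 30, 1, 4, 3)
)

# Squared error of each prediction
sq_err <- (preds$obs - preds$preds)^2

# Root mean square error for each response variable
rmse <- sqrt(tapply(sq_err, preds$variable, mean))
print(rmse)
```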

yai, newtargets, impute.yai, accuracy

# Load data in memory
# X_vars_sample: Predictor variables at sample (from getSample)
# Y_vars_sample: Response variables at sample (from getSample)
# train_idx: Rows of X_vars_sample and Y_vars_sample that are used for
# training (from partition)
load(system.file("extdata/examples/example_trainNN.RData", package = "foster"))

set.seed(1234) # for example reproducibility
kNN <- trainNN(x = X_vars_sample,
               y = Y_vars_sample,
               inTrain = train_idx,
               k = 1,
               method = "randomForest",
               ntree = 200)