rf_reg_1: Processing of a split object to get data ready to be used and...

rf_reg_1 R Documentation

Processing of a split object to get data ready to be used and fitted with a rf_reg_1 (random forest) regression model.

Description

The function processes a split object (training + test sets), according to the configuration set by the user. For instance, genomic information is incorporated according to the option set by the user. A list of specific environmental covariables to use can be provided.

A recipe is created using the package recipes to specify additional preprocessing steps, such as standardization based on the training set, with the same transformations applied to the test set. Variables with null variance are removed. If the year effect is included, it is converted to dummy variables.
The training set is subsequently fitted with a random forest model (see function fit_cv_split.rf_reg_1()).
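
The preprocessing steps described above can be illustrated with a minimal base R sketch. It does not use the recipes package or any learnMET function. The toy data and all object names are hypothetical, and the year dummy coding shown here (one column per level) may differ from the contrast the actual recipe applies. The point is only that the centering and scaling constants are estimated on the training set and then reused on the test set.

```r
# Toy training and test sets (hypothetical data, not from learnMET)
train <- data.frame(
  year  = factor(c("2019", "2020", "2019", "2020")),
  temp  = c(10, 12, 14, 16),
  const = c(1, 1, 1, 1)  # null variance in the training set
)
test <- data.frame(
  year  = factor(c("2019", "2020"), levels = levels(train$year)),
  temp  = c(11, 15),
  const = c(1, 1)
)

num_cols <- names(train)[vapply(train, is.numeric, logical(1))]

# 1. Drop numeric predictors with null variance in the training set
keep <- num_cols[vapply(train[num_cols], function(x) var(x) > 0, logical(1))]

# 2. Estimate means and standard deviations on the training set only,
#    then apply the same constants to both sets
mu    <- vapply(train[keep], mean, numeric(1))
sigma <- vapply(train[keep], sd, numeric(1))
scale_with <- function(df) {
  sweep(sweep(as.matrix(df[keep]), 2, mu, "-"), 2, sigma, "/")
}
train_scaled <- scale_with(train)
test_scaled  <- scale_with(test)

# 3. Convert the year factor to dummy variables (one column per level here)
year_train <- model.matrix(~ year - 1, data = train)
year_test  <- model.matrix(~ year - 1, data = test)

train_processed <- cbind(train_scaled, year_train)
test_processed  <- cbind(test_scaled, year_test)
```

Because the test set is scaled with the training constants, its columns are generally not exactly centered at zero, which is the expected behaviour.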

Usage

new_rf_reg_1(
  split = NULL,
  trait = NULL,
  geno = NULL,
  env_predictors = NULL,
  info_environments = NULL,
  use_selected_markers = F,
  SNPs = NULL,
  include_env_predictors = T,
  list_env_predictors = NULL,
  type_location_info = "location_factor",
  year_included = F,
  ...
)

rf_reg_1(
  split,
  trait,
  geno,
  env_predictors,
  info_environments,
  use_selected_markers,
  SNPs,
  list_env_predictors,
  include_env_predictors,
  type_location_info,
  year_included,
  ...
)

validate_rf_reg_1(x, ...)

Arguments

split

an object of class split. A split object contains training and test elements.

trait

character Name of the trait to predict. An ordinal trait should be encoded as integer.

geno

data.frame It corresponds to a geno element within an object of class METData.

env_predictors

data.frame It corresponds to the env_data element within an object of class METData.

info_environments

data.frame It corresponds to the info_environments element within an object of class METData.

use_selected_markers

A logical indicating whether to use a subset of markers, identified via single-environment GWAS or based on the table of marker effects obtained via Elastic Net, as predictor variables when main genetic effects are modeled with principal components.
If use_selected_markers is TRUE, the SNPs argument should be provided. For more details, see select_markers().

SNPs

A data.frame with the genotype matrix (individuals in rows and selected markers in columns) for SNPs selected via the select_markers() function. Optional argument, can remain as NULL if no single markers should be incorporated as predictor variables in analyses based on PCA decomposition.

include_env_predictors

A logical indicating whether environmental covariates characterizing each environment should be used in predictions.

list_env_predictors

A character vector containing the names of the environmental predictors which should be used in predictions. By default NULL: all environmental predictors included in the env_data table of the METData object will be used.

type_location_info

character Indicates how the location information is encoded. One of location_factor, lon_lat_numeric, no_location_information. Default is location_factor, meaning that the variable location is used as a categorical variable in the model (encoded as dummy variables).

year_included

logical Indicates whether the year factor should be used as a predictor variable. Default is FALSE.
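
As an illustration of the list_env_predictors and type_location_info arguments, the sketch below shows one way such options could be interpreted. It is written in base R and is not learnMET's implementation. The helpers select_env_predictors() and encode_location(), the identifier column IDenv, and the column names location, longitude and latitude are all hypothetical.

```r
# Hypothetical helper (not part of learnMET): keep the requested environmental
# predictors, or all of them when list_env_predictors is NULL.
select_env_predictors <- function(env_data, list_env_predictors = NULL,
                                  id_col = "IDenv") {
  candidates <- setdiff(names(env_data), id_col)
  if (is.null(list_env_predictors)) {
    list_env_predictors <- candidates
  }
  unknown <- setdiff(list_env_predictors, candidates)
  if (length(unknown) > 0) {
    stop("Unknown environmental predictors: ", paste(unknown, collapse = ", "))
  }
  env_data[, c(id_col, list_env_predictors), drop = FALSE]
}

# Hypothetical helper: three ways of encoding location information.
encode_location <- function(info,
                            type_location_info = c("location_factor",
                                                   "lon_lat_numeric",
                                                   "no_location_information")) {
  type_location_info <- match.arg(type_location_info)
  switch(type_location_info,
    # categorical location, one dummy column per level
    location_factor = model.matrix(~ location - 1, data = info),
    # numeric coordinates used directly as predictors
    lon_lat_numeric = as.matrix(info[, c("longitude", "latitude")]),
    # no location predictors at all
    no_location_information = NULL
  )
}

# Toy inputs (hypothetical values)
env_data <- data.frame(IDenv = c("A_2019", "B_2019"),
                       rain = c(300, 450),
                       temp = c(18, 21))
info <- data.frame(location = c("A", "B"),
                   longitude = c(8.5, 11.2),
                   latitude = c(47.4, 48.1))

all_env     <- select_env_predictors(env_data)          # NULL: keep everything
rain_only   <- select_env_predictors(env_data, "rain")  # explicit subset
loc_dummy   <- encode_location(info)                    # default: location_factor
loc_numeric <- encode_location(info, "lon_lat_numeric")
loc_none    <- encode_location(info, "no_location_information")
```

Using match.arg() restricts type_location_info to the three documented values and makes the first one the default.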

Value

A list object of class rf_reg_1 with the following items:

training

data.frame Training set after partial processing

test

data.frame Test set after partial processing

rec

A recipe object, specifying the remaining processing steps which are implemented when a model is fitted on the training set with a recipe.
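
The three functions listed under Usage (new_rf_reg_1(), rf_reg_1(), validate_rf_reg_1()) are named like the common S3 constructor, helper and validator pattern, and the returned value is a classed list. The sketch below shows that pattern on a hypothetical toy class, toy_processed, in base R. It is an assumption about the general design, not a copy of the package code, and the checks in the validator are invented for illustration.

```r
# Low-level constructor: builds the classed list without checks
# (hypothetical toy class, not the learnMET implementation)
new_toy_processed <- function(training = data.frame(),
                              test = data.frame(),
                              rec = NULL) {
  structure(
    list(training = training, test = test, rec = rec),
    class = c("toy_processed", "list")
  )
}

# Validator: checks that the object is internally consistent
validate_toy_processed <- function(x) {
  if (!is.data.frame(x$training) || !is.data.frame(x$test)) {
    stop("`training` and `test` must be data frames", call. = FALSE)
  }
  if (!identical(names(x$training), names(x$test))) {
    stop("`training` and `test` must have the same columns", call. = FALSE)
  }
  x
}

# User-facing helper: construct, then validate
toy_processed <- function(training, test, rec = NULL) {
  validate_toy_processed(new_toy_processed(training, test, rec))
}

obj <- toy_processed(
  training = data.frame(y = c(1, 2), x = c(0.1, 0.2)),
  test     = data.frame(y = 3, x = 0.3)
)

# Items are accessed like any list element
obj$training
names(obj)
```

Splitting construction from validation keeps the constructor cheap for internal use while the helper gives users early, readable error messages.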


cjubin/learnMET documentation built on Nov. 4, 2024, 6:23 p.m.