build_training_data: Build the training datasets

build_training_data    R Documentation

Build the training datasets

Description

These functions create subsets of the dataset data_rangers for training and prediction, and log-transform some of its columns after adding 1 (i.e., log(x + 1)).
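Here is a minimal base R sketch of the log(x + 1) transformation. It adds a log-transformed copy of selected columns. The helper add_log1p_columns() is hypothetical and not part of the package. The "_log" suffix is an assumption inferred from names such as staff_rangers_log and PA_area_log in the Examples section.

```r
# Hypothetical helper (not the package's API): for each column in `cols`,
# add a new column named "<col>_log" holding log(x + 1).
add_log1p_columns <- function(data, cols) {
  for (col in cols) {
    # log1p(x) computes log(1 + x) accurately, also for small x; NA stays NA
    data[[paste0(col, "_log")]] <- log1p(data[[col]])
  }
  data
}

toy <- data.frame(staff_rangers = c(0, 9, 99), PA_area = c(1, 10, NA))
toy <- add_log1p_columns(toy, c("staff_rangers", "PA_area"))
toy$staff_rangers_log  # equals log(c(1, 10, 100))
```

Adding 1 before taking the log keeps zero counts (e.g., no rangers) finite. Plain log(0) would give -Inf.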

Usage

build_initial_training_data(data, formula, survey, spatial = FALSE)

build_final_training_data(data, formula, survey, spatial = FALSE)

build_final_pred_data(data, formula, survey, spatial = FALSE, outliers = NULL)

build_data(data, formula, type, spatial = FALSE)

handle_PA_area(data, survey, formula = NULL, keep_details = FALSE)

handle_outliers(data, outliers = NULL)

handle_transform(data)

handle_order(data)

handle_na(data, response, NA_in_resp = NULL, NA_in_preds = NULL)

Arguments

data

the complete dataset

formula

the model formula for the linear mixed model (LMM) or the random forest (RF)
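One plausible way to use the formula when subsetting columns is to keep only the variables it mentions. The sketch below assumes this behaviour, so it may not match what the package does. select_formula_vars() is a hypothetical base R helper. all.vars() returns the names of every variable that appears in a formula, including the response.

```r
# Hypothetical helper (not the package's API): keep only the columns
# whose names appear in the model formula.
select_formula_vars <- function(data, formula) {
  vars <- all.vars(formula)  # response and predictor names, as character
  missing_vars <- setdiff(vars, names(data))
  if (length(missing_vars) > 0) {
    stop("Variables not found in data: ", paste(missing_vars, collapse = ", "))
  }
  data[, vars, drop = FALSE]
}

toy <- data.frame(staff_rangers = 1:3, lat = c(10, 20, 30),
                  GDP_growth = c(1.2, 0.5, 2), extra = letters[1:3])
sel <- select_formula_vars(toy, staff_rangers ~ lat + GDP_growth)
names(sel)  # "staff_rangers" "lat" "GDP_growth"; "extra" is dropped
```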

survey

the criterion used to select rows, depending on whether the focal number of personnel is:

  • completely unknown ("complete_unknown")

  • completely or partially unknown ("partial_unknown")

  • completely or partially known ("partial_known")

  • completely known ("complete")

Depending on the choice, the variable PA_area is also adjusted.
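The row selection described above can be sketched as follows. This is a hypothetical illustration, not the package's code. It assumes a column named staff_info that codes the personnel information as "unknown", "partial" or "known". It does not reproduce the adjustment of PA_area.

```r
# Hypothetical helper (not the package's API): select rows according to
# `survey`, assuming a column `staff_info` in {"unknown", "partial", "known"}.
select_by_survey <- function(data, survey = c("complete_unknown", "partial_unknown",
                                             "partial_known", "complete")) {
  survey <- match.arg(survey)  # errors on any value not listed above
  keep <- switch(survey,
    complete_unknown = "unknown",
    partial_unknown  = c("unknown", "partial"),
    partial_known    = c("partial", "known"),
    complete         = "known"
  )
  data[data$staff_info %in% keep, , drop = FALSE]
}

toy <- data.frame(country = c("A", "B", "C"),
                  staff_info = c("unknown", "partial", "known"))
select_by_survey(toy, "partial_known")  # keeps rows B and C
```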

spatial

whether or not to keep the predictor(s) used to fit spatial effects (default = FALSE)

outliers

a character vector with the names of the countries/territories to discard (default = NULL, i.e., none are discarded)
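Here is a minimal sketch of discarding outliers. It assumes the country or territory names are stored in a column called country. That name is hypothetical, and the actual column in data_rangers may differ. drop_outliers() is also hypothetical.

```r
# Hypothetical helper (not the package's API): drop rows whose country
# name is listed in `outliers`; NULL means nothing is discarded.
drop_outliers <- function(data, outliers = NULL, country_col = "country") {
  if (is.null(outliers)) return(data)
  # `%in%` binds tighter than `!`, so this is !(country %in% outliers)
  data[!data[[country_col]] %in% outliers, , drop = FALSE]
}

toy <- data.frame(country = c("A", "B", "C"), y = 1:3)
drop_outliers(toy, outliers = "B")  # keeps rows A and C
```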

type

either "prediction" or "training"

keep_details

whether or not to keep variables used for construction (default = FALSE)

response

the unquoted name of the response variable

NA_in_resp

whether to keep only the rows with a missing value in the response variable (TRUE) or to discard all such rows (FALSE) (default = NULL, i.e., do nothing)

NA_in_preds

whether to keep only the rows with missing values in the predictor variables (TRUE) or to discard all such rows (FALSE) (default = NULL, i.e., do nothing)
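The behaviour of NA_in_resp and NA_in_preds can be sketched in base R as below. The helper filter_na() is hypothetical. Unlike handle_na(), it takes the response name as a string rather than unquoted. It also assumes that "NA in predictor variables" means at least one missing value among the non-response columns.

```r
# Hypothetical helper (not the package's API). `response` is a string here.
filter_na <- function(data, response, NA_in_resp = NULL, NA_in_preds = NULL) {
  preds <- setdiff(names(data), response)
  if (!is.null(NA_in_resp)) {
    resp_na <- is.na(data[[response]])
    # TRUE keeps only rows with an NA response; FALSE drops them
    data <- data[if (NA_in_resp) resp_na else !resp_na, , drop = FALSE]
  }
  if (!is.null(NA_in_preds)) {
    # a row counts as "NA in predictors" if any predictor is missing
    pred_na <- !stats::complete.cases(data[, preds, drop = FALSE])
    data <- data[if (NA_in_preds) pred_na else !pred_na, , drop = FALSE]
  }
  data
}

toy <- data.frame(staff = c(1, NA, 3, NA), x = c(1, 2, NA, 4))
filter_na(toy, "staff", NA_in_resp = FALSE)                      # rows 1 and 3
filter_na(toy, "staff", NA_in_resp = FALSE, NA_in_preds = FALSE) # row 1 only
```

Under these assumptions, NA_in_resp = FALSE gives rows usable for training. NA_in_resp = TRUE gives rows whose response must be predicted.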

Value

a tibble

Functions

  • build_initial_training_data(): build the initial training datasets

  • build_final_training_data(): build the final training datasets

  • build_final_pred_data(): build the final prediction datasets

  • build_data(): internal function to build the training and prediction datasets

  • handle_PA_area(): internal function to handle PA_area while building the datasets

  • handle_outliers(): internal function to handle outliers while building the datasets

  • handle_transform(): internal function to handle variable transformation while building the datasets

  • handle_order(): internal function to handle order of variables while building the datasets

  • handle_na(): internal function to handle missing data while building the datasets

Examples

## Not run: 
## Here is how we created the data stored in this package:
data_test <- build_initial_training_data(data_rangers,
                                         formula = staff_rangers ~ pop_density_log +
                                                   lat + long + country_UN_subcontinent +
                                                   PA_area_log + area_country_log +
                                                   area_forest_pct + GDP_2019_log +
                                                   GDP_capita_log + GDP_growth +
                                                   unemployment_log + EVI + SPI + EPI_2020 +
                                                   IUCN_1_4_prop + IUCN_1_2_prop,
                                         survey = "partial_known")
data_test <- data_test[!is.na(data_test$staff_rangers_log), ]
if (requireNamespace("usethis", quietly = TRUE)) {
  usethis::use_data(data_test, overwrite = TRUE)
}

## End(Not run)


courtiol/rangeRinPA documentation built on Sept. 29, 2022, 9:54 a.m.