benchmark_sdm: Benchmark regular models
In boyanangelov/sdmbench: Benchmark Species Distribution Models


A function to benchmark a collection of regular machine learning models.

benchmark_sdm(benchmarking_data, learners, dataset_type = "default", sample = FALSE)

`benchmarking_data`	A data frame taken from the output of the `get_benchmarking_data` function (the `df_data` element in the example). This dataset contains species occurrence coordinates together with a set of environmental data points.
`learners`	A list of mlr learner objects that specify which models to use (e.g. Random Forests). The following learners are supported: "classif.logreg", "classif.gbm", "classif.multinom", "classif.naiveBayes", "classif.xgboost", "classif.ksvm".
`dataset_type`	A character string indicating the spatial partitioning method applied to the data. It is used to avoid spatial autocorrelation issues.
`sample`	Logical. Indicates whether benchmarking should be done on an undersampled dataset. This is useful for testing model efficiency with an imbalanced dataset, i.e. one with few observations and many background (pseudo-absence) points.
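
To illustrate what undersampling an imbalanced dataset means, here is a minimal base R sketch. It is not necessarily how `benchmark_sdm` undersamples internally: the helper `undersample` and the column names `label` and `bio1` are hypothetical and exist only for this example.

```r
set.seed(1)

# Synthetic imbalanced data: 20 presences (1) and 200 background points (0).
# The column names are hypothetical.
df <- data.frame(
  label = factor(c(rep(1, 20), rep(0, 200))),
  bio1  = rnorm(220)
)

# Hypothetical helper: randomly keep as many rows of each class as there
# are rows in the smallest class.
undersample <- function(data, target) {
  n_min <- min(table(data[[target]]))
  rows_by_class <- split(seq_len(nrow(data)), data[[target]])
  keep <- unlist(lapply(rows_by_class,
                        function(i) i[sample.int(length(i), n_min)]))
  data[sort(keep), , drop = FALSE]
}

balanced <- undersample(df, "label")
table(balanced$label)  # both classes now have 20 rows
```

Undersampling discards background points rather than duplicating presences, so the balanced dataset is smaller than the original.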

A benchmarking object (class bmr). Other functions can access this object to obtain the benchmark results.
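
As a rough illustration of how a benchmark result can be summarised, the sketch below runs a plain mlr benchmark on synthetic data and extracts the aggregated performance. It assumes that the object returned by `benchmark_sdm` behaves like a standard mlr benchmark result, which this page does not confirm. The data, the column names and the 5-fold cross-validation are assumptions made for the example, and "classif.naiveBayes" additionally requires the e1071 package.

```r
library(mlr)

set.seed(42)
n <- 200

# Synthetic presence/background data; the column names are hypothetical.
df <- data.frame(bio1 = rnorm(n), bio12 = rnorm(n))
df$label <- factor(ifelse(df$bio1 + rnorm(n, sd = 0.5) > 0, 1, 0))

task <- makeClassifTask(data = df, target = "label")

# predict.type = "prob" is needed for probability-based measures such as AUC
learners <- list(
  makeLearner("classif.logreg", predict.type = "prob"),
  makeLearner("classif.naiveBayes", predict.type = "prob")
)

# Assumed resampling scheme: 5-fold cross-validation
rdesc <- makeResampleDesc("CV", iters = 5)

bmr_sketch <- benchmark(learners, task, rdesc,
                        measures = auc, show.info = FALSE)

# One row per learner with the aggregated AUC
perf <- getBMRAggrPerformances(bmr_sketch, as.df = TRUE)
print(perf)
```

If the assumption holds, the same `getBMRAggrPerformances` call could be applied to the object returned by `benchmark_sdm`.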

## Not run: 
# download benchmarking data
benchmarking_data <- get_benchmarking_data("Lynx lynx",
                                           limit = 1500)

# create a list of algorithms to compare
# here it is important to specify predict.type as "prob"
learners <- list(mlr::makeLearner("classif.randomForest",
                                  predict.type = "prob"),
                 mlr::makeLearner("classif.logreg",
                                 predict.type = "prob"))

# run the model benchmarking process
# if you have previously used a partitioning method you should specify it here
bmr <- benchmark_sdm(benchmarking_data$df_data,
                    learners = learners,
                    dataset_type = "default")

# for benchmarking an imbalanced dataset you can undersample
bmr <- benchmark_sdm(benchmarking_data$df_data,
                    learners = learners,
                    dataset_type = "default",
                    sample = TRUE)

# inspect the benchmark results
bmr

## End(Not run)