outliers.detect.mass | R Documentation |
This function runs the outlier detection methods described in gecko::outliers.detect() on multi-species datasets, automatically adjusting to the amount of data available and the strategy chosen. A species must have at least 3 data points to be processed. Supplying a training dataset makes the function use method "svm", which additionally requires at least 5 training points. For now, species with insufficient data are accepted by default; future updates will allow users to choose a "lack of data" strategy.
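The data requirements above can be checked before calling the function. The following is a minimal base R sketch, not part of gecko: check_species_data() is a hypothetical helper, and it assumes that the species name is in the first column and that the 5-point minimum for "svm" applies per species.

```r
# Hypothetical helper (not part of gecko): counts records per species and
# flags which ones meet the minimums described above.
# Assumes the first column of each data.frame holds the species name.
check_species_data <- function(test, train = NULL, min_test = 3, min_train = 5) {
  species <- unique(as.character(test[[1]]))
  out <- data.frame(species = species, stringsAsFactors = FALSE)
  # Records per species in the test set, in the order given by `species`
  out$n_test <- as.integer(table(factor(as.character(test[[1]]), levels = species)))
  out$enough_test <- out$n_test >= min_test
  if (!is.null(train)) {
    # Species absent from the training set get a count of 0
    out$n_train <- as.integer(table(factor(as.character(train[[1]]), levels = species)))
    out$enough_train <- out$n_train >= min_train
  }
  out
}

# Toy data: "Sp. A" has enough points, "Sp. B" does not
test <- data.frame(
  species = c(rep("Sp. A", 4), rep("Sp. B", 2)),
  lat = c(32.70, 32.71, 32.72, 32.73, 39.50, 39.60),
  long = c(-17.0, -16.9, -16.8, -16.7, -8.0, -7.9)
)
train <- data.frame(
  species = rep("Sp. A", 5),
  lat = c(32.70, 32.72, 32.74, 32.75, 32.76),
  long = c(-17.1, -17.0, -16.9, -16.5, -16.1)
)
res <- check_species_data(test, train)
```

Species flagged FALSE in enough_test would, per the description above, currently be accepted by default rather than evaluated.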
outliers.detect.mass(
test,
train = NULL,
path = NULL,
strategy = "majority",
hi_res = FALSE,
crop = FALSE,
threshold = 0.05
)
test |
data.frame. With three columns containing species, latitude and longitude, describing the locations of a species, which may contain outliers. |
train |
data.frame. With the same formatting as |
path |
character. Path to a folder where plots scrutinizing decision making per species should be saved. |
strategy |
character. Strategy to use for combining the decisions of the
outlier detection methods used. Either |
hi_res |
logical. Specifies if 1 KM resolution environmental data should be used.
If |
crop |
logical. Indicates whether environmental data should be cropped to
an extent similar to what is given in |
threshold |
numeric. Value indicating the threshold for classifying
outliers in methods |
The environmental data used is WorldClim, which requires a long download; see gecko::gecko.setDir().
This function is a version of gecko::outliers.detect()
tailored for ease of handling datasets with multiple species. For details on
the methodology used to detect outliers please consult the documentation for that function.
list. The first element is a dataset containing all elements of the original test set except for those rejected. The second element is a table scrutinizing how many data points belonged to species not_in_common, those where a decision was not passed due to insufficient_data, and those that were accepted and rejected, with the latter accompanied by how much each group of methods was used as a basis, e.g. env;geo.
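As an illustration of the not_in_common tally, the sketch below counts test records whose species does not appear in the training set. It is a base R approximation under that interpretation, not the package's own code: count_not_in_common() is a hypothetical helper, and it assumes the species name is in the first column of both data.frames.

```r
# Hypothetical helper (not part of gecko): number of test records whose
# species is absent from the training set.
# Assumes the first column of each data.frame holds the species name.
count_not_in_common <- function(test, train) {
  test_species <- as.character(test[[1]])
  shared <- intersect(test_species, as.character(train[[1]]))
  sum(!(test_species %in% shared))
}

# Toy data: only "Sp. A" is present in both sets
test_set <- data.frame(
  species = c(rep("Sp. A", 3), rep("Sp. B", 2)),
  lat = c(32.70, 32.71, 32.72, 39.50, 39.60),
  long = c(-17.0, -16.9, -16.8, -8.0, -7.9)
)
train_set <- data.frame(
  species = rep("Sp. A", 5),
  lat = c(32.70, 32.72, 32.74, 32.75, 32.76),
  long = c(-17.1, -17.0, -16.9, -16.5, -16.1)
)
n_not_in_common <- count_not_in_common(test_set, train_set)
```

Here the two "Sp. B" records are the ones counted, since that species has no training data.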
## Not run:
# Known occurrences, used here as the training set
old_occurrences = gecko.data("records")
colnames(old_occurrences) = c("species", "long", "lat")
# Simulated new occurrences for three species, 50 points each
new_occurrences = data.frame(
  species = rep(c("Hogna maderiana", "Malthonica oceanica", "Agroeca inopina"), each = 50),
  long = c(runif(50, -17.1, -16.09), runif(50, -8.8, -7), runif(50, -6, -2)),
  lat = c(runif(50, 32.73, 32.76), runif(50, 39.5, 40), runif(50, 40, 42))
)
# Placeholder output folder for the per-species plots; replace with your own
path = tempdir()
outliers.detect.mass(new_occurrences, train = old_occurrences, path = path)
## End(Not run)