subset_dataset: Subset_dataset


View source: R/subset_dataset.R

Description

After creating a flagged dataset and the model variables, use this function to create a training dataset.

Usage

subset_dataset(data, zeros_to_one = 3, seed = 42069)

Arguments

data

Data frame returned by create_model_variables.

zeros_to_one

Number of '0' observations to keep for each '1' observation. The default is 3 (three 0's for every 1).

seed

Number used to set the random seed so that the sampling is reproducible.

Details

Since there are significantly more non-outliers than outliers (zeros than ones), it is necessary to under-sample the zeros when training models.

The default is zeros_to_one = 3, meaning the result contains 3 '0' observations for every '1' observation. Every '1' observation is kept, and the '0' observations are randomly sampled. The rows of the result are also shuffled.
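
The under-sampling described above can be sketched roughly as follows. This is a minimal illustration and not the package's actual source: the function name undersample_sketch, the flag column name "outlier", and the toy data are all assumptions made for the example.

```r
# Hypothetical sketch of the under-sampling idea; not the HTSoutliers source.
# Assumes the 0/1 flag is stored in the column named by `flag_col`
# (the name "outlier" is an assumption for this example).
undersample_sketch <- function(data, flag_col = "outlier",
                               zeros_to_one = 3, seed = 42069) {
  set.seed(seed)                              # make the sampling reproducible
  ones  <- which(data[[flag_col]] == 1)       # row indices of every '1'
  zeros <- which(data[[flag_col]] == 0)       # row indices of every '0'

  # Keep at most zeros_to_one '0' rows per '1' row, sampled without replacement.
  n_zeros <- min(length(zeros), zeros_to_one * length(ones))
  kept_zeros <- zeros[sample.int(length(zeros), n_zeros)]

  rows <- c(ones, kept_zeros)                 # every '1' plus the sampled '0's
  rows <- rows[sample.int(length(rows))]      # shuffle the combined rows
  data[rows, , drop = FALSE]
}

# Toy data: 100 rows, 10 of them flagged as outliers.
toy <- data.frame(value = seq_len(100),
                  outlier = rep(c(1, 0), times = c(10, 90)))
result <- undersample_sketch(toy)
table(result$outlier)                         # counts of 0's and 1's kept
```

Indexing with sample.int rather than calling sample on the index vector directly avoids R's special case where sample(x) on a single number samples from 1:x.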

Value

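A data frame containing every '1' observation and the randomly sampled '0' observations, with the rows shuffled.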

Examples

# UT <- get_weather_data("UT", "D:/Data/ghcnd_all/")
# data <- create_flagged_dataset(UT)
# data_1 <- create_model_variables(data)

# subset <- subset_dataset(data_1)

# Split the shuffled rows into non-overlapping training and test sets
# trainSize <- .5
# nTrain <- floor(trainSize * nrow(subset))
# train <- subset[1:nTrain, ]
# test <- subset[(nTrain + 1):nrow(subset), ]

scoutiii/HTSoutliers documentation built on April 4, 2021, 4:47 p.m.