sample_variables: Sample Observations from a Given Dataset.

Description

Samples observations from a given dataset. This function offers three types of sampling: "binary classifier", "random" and "stratified". The binary classifier sampling option acts as a wrapper for the ovun.sample() function from the ROSE package. Over sampling adds observations to balance the distribution of a specified variable. Under sampling removes observations to balance the distribution of a specified variable. Mixed sampling combines under sampling of the majority class with over sampling of the minority class to balance the distribution of a specified variable.
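The difference between over, under and mixed sampling can be illustrated with a minimal base R sketch. This is not the implementation used by sample_variables() or by ROSE::ovun.sample(); the toy data frame and the exact N / 2 split per class are assumptions made for illustration only.

```r
# Toy data (hypothetical): 90 observations of class 0 (majority), 10 of class 1 (minority).
set.seed(42)
toy <- data.frame(x = rnorm(100), y = rep(c(0, 1), times = c(90, 10)))

majority <- toy[toy$y == 0, ]
minority <- toy[toy$y == 1, ]

# Over sampling: keep the majority class and resample the minority class
# with replacement until both classes have the same size.
over <- rbind(majority,
              minority[sample(nrow(minority), nrow(majority), replace = TRUE), ])

# Under sampling: keep the minority class and draw an equally sized subset
# of the majority class without replacement.
under <- rbind(minority,
               majority[sample(nrow(majority), nrow(minority)), ])

# Mixed sampling: fix a total size N, then under sample the majority class
# and over sample the minority class to N / 2 observations each.
N <- 50
both <- rbind(majority[sample(nrow(majority), N / 2), ],
              minority[sample(nrow(minority), N / 2, replace = TRUE), ])

# Class counts after each strategy
table(over$y)
table(under$y)
table(both$y)
```

In each case the two classes end up with equal counts; only the total number of rows and whether observations are duplicated or discarded differ.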

Usage

sample_variables(y_index = NULL, y_name = NULL, dataset,
  type = c("binary classifier", "stratified", "random"), method = c("both",
  "over", "under"), N, na.action = na.pass, file_name = NULL,
  directory = NULL)

Arguments

y_index

A column index representing the variable whose distribution is to be sampled. The variable must be a binary classifier.

y_name

A character value, indicating the column name of the response variable, the default is NULL.

dataset

The dataset from which the samples are taken.

type

The type of sampling used; either "binary classifier", "stratified" or "random".

method

The method of sampling used; either "both", "over" or "under".

N

The desired sample size.

na.action

Specifies how NA values in the dataset should be handled. Four options are possible: na.pass, na.omit, na.exclude and na.fail.

file_name

A character object indicating the file name when saving the data frame. The default is NULL. The name must include the .csv suffix.

directory

A character object specifying the directory where the data frame is to be saved as a .csv file.
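The four na.action options listed above are standard functions from R's stats package. As a rough illustration of how they differ, the sketch below applies each one to a small hypothetical data frame; how sample_variables() applies them internally is not shown here.

```r
# Hypothetical data frame with one NA in each column
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

passed   <- na.pass(df)     # returns the data unchanged, NAs kept
omitted  <- na.omit(df)     # drops every row that contains an NA
excluded <- na.exclude(df)  # drops the same rows as na.omit; differs only in
                            # how later residuals and predictions are padded

# na.fail stops with an error if any NA is present, so it is wrapped here
failed <- tryCatch(na.fail(df), error = function(e) "error")

nrow(passed)
nrow(omitted)
```

With na.pass the missing values are carried into the sample, whereas na.omit and na.exclude shrink the data before sampling, which may matter when N is close to the number of complete rows.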

Value

Outputs the descriptive statistics as a data frame.

See Also

derive_variables, extract_variables, impute_variables, standardise_variables, transform_variables

Examples

# mix sample a binary classifier
sample_variables(y_index = 2, dataset = titanic, type = "binary classifier",
                 method = "both", N = 1000, na.action = na.pass)

# random under sample
sample_variables(dataset = iris, type = "random", method = "under", N = 100)

oislen/BuenaVista documentation built on May 16, 2019, 8:12 p.m.