sample_variables: Sample Observations from a given Dataset.
In oislen/BuenaVista: Functions for Everyday Data Science Tasks


Sample observations from a given dataset. This function offers three sampling methods: "binary classifier", "random" and "stratified". The binary classifier option acts as a wrapper for the ovun.sample() function from the ROSE package. Over sampling adds observations to balance the distribution of a specified variable. Under sampling removes observations to balance the distribution of a specified variable. Mix sampling uses both under sampling on the majority class and over sampling on the minority class to balance the distribution of a specified variable.
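The three balancing methods described above can be sketched in base R. This is a minimal illustration only: `balance_binary()` and the `toy` data frame are hypothetical names invented here, and the sketch does not reproduce the internals of `sample_variables()` or `ROSE::ovun.sample()`. In particular, the exact half-and-half split used for "both" is a simplifying assumption.

```r
# Hypothetical helper (not part of BuenaVista or ROSE): resample a data frame
# so that the two classes of a binary column end up the same size.
balance_binary <- function(data, y, method = c("both", "over", "under"), N = NULL) {
  method <- match.arg(method)
  classes <- split(seq_len(nrow(data)), data[[y]])
  stopifnot(length(classes) == 2)
  ord <- order(lengths(classes))
  minority <- classes[[ord[1]]]
  majority <- classes[[ord[2]]]

  # Draw n row indices from a vector of row indices.
  draw <- function(rows, n) rows[sample.int(length(rows), n, replace = n > length(rows))]

  if (method == "under") {
    # Remove majority rows until both classes have the minority size.
    majority <- draw(majority, length(minority))
  } else if (method == "over") {
    # Repeat minority rows (with replacement) up to the majority size.
    minority <- draw(minority, length(majority))
  } else {
    # Mix: shrink the majority and grow the minority to about N / 2 rows each.
    if (is.null(N)) N <- nrow(data)
    half <- N %/% 2
    majority <- draw(majority, half)
    minority <- draw(minority, N - half)
  }
  data[c(majority, minority), , drop = FALSE]
}

set.seed(42)
toy <- data.frame(x = rnorm(100), y = rep(c(0, 1), times = c(80, 20)))

table(balance_binary(toy, "y", "under")$y)          # 20 rows per class
table(balance_binary(toy, "y", "over")$y)           # 80 rows per class
table(balance_binary(toy, "y", "both", N = 60)$y)   # 30 rows per class
```

Under sampling discards information from the majority class, while over sampling duplicates minority rows, so the choice between them usually depends on how much data can be spared.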

sample_variables(y_index = NULL, y_name = NULL, dataset,
  type = c("binary classifier", "stratified", "random"), method = c("both",
  "over", "under"), N, na.action = na.pass, file_name = NULL,
  directory = NULL)

`y_index`	A column index representing the variable whose distribution is to be sampled. The variable must be a binary classifier.
`y_name`	A character value indicating the column name of the response variable. The default is NULL.
`dataset`	The dataset from which the samples are taken.
`type`	The type of sampling used; either "binary classifier", "stratified" or "random".
`method`	The method of sampling used; either "both", "over" or "under".
`N`	The desired sample size.
`na.action`	Specifies how NA values in the dataset should be handled. Four options are available: na.pass, na.omit, na.exclude and na.fail.
`file_name`	A character object indicating the file name when saving the data frame. The default is NULL. The name must include the .csv suffix.
`directory`	A character object specifying the directory where the data frame is to be saved as a .csv file.
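To illustrate the difference between the "random" and "stratified" values of `type`, the following base R sketch draws both kinds of sample from the built-in `iris` data. The helper `stratified_sample()` is a hypothetical name used only for this illustration, and proportional allocation across groups is an assumption; `sample_variables()` may choose strata and allocate rows differently.

```r
set.seed(1)

# Simple random sample: 30 rows drawn without replacement from the whole dataset.
random_rows <- iris[sample.int(nrow(iris), 30), ]

# Hypothetical helper: draw the same fraction of rows from every level of a
# grouping column, so each group keeps its share of the sample.
stratified_sample <- function(data, group, N) {
  frac <- N / nrow(data)
  groups <- split(seq_len(nrow(data)), data[[group]])
  idx <- unlist(lapply(groups, function(rows) {
    rows[sample.int(length(rows), round(length(rows) * frac))]
  }), use.names = FALSE)
  data[idx, , drop = FALSE]
}

strat_rows <- stratified_sample(iris, "Species", 30)

table(random_rows$Species)  # group counts vary from draw to draw
table(strat_rows$Species)   # 10 rows from each species
```

A random sample can over- or under-represent a group by chance, whereas the stratified sample fixes each group's share in advance.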

Outputs the descriptive statistics as a data frame.

derive_variables, extract_variables, impute_variables, standardise_variables, transform_variables

# mix sample a binary classifier
sample_variables(y_index = 2, dataset = titanic, type = "binary classifier", method = "both", N = 1000, na.action = na.pass)

# random under sample
sample_variables(dataset = iris, type = "random", method = "under", N = 100)