derive_variables: Derive Variables from a given Dataset

Description

This function derives power, interaction, or dummy variables for a given dataset. Power terms are derived by raising each numeric variable in the dataset to a specified power. Interaction terms are derived by multiplying the numeric variables with one another. Dummy terms are derived by generating a binary term for each level of each factor variable. The resulting data frame can be saved as a .csv file in a specified directory.
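
To make the three kinds of derived terms concrete, here is a minimal base R sketch on a small made-up data frame. It is not the package's implementation: the `toy` data, the output column names, and the reading of "interaction" as pairwise products are all assumptions made for illustration.

```r
# Toy data frame (hypothetical; not the lungcap data used in the Examples)
toy <- data.frame(
  height = c(1.5, 1.6, 1.7),
  age    = c(10, 12, 14),
  gender = factor(c("F", "M", "F"))
)

# Names of the numeric columns
num_cols <- names(toy)[sapply(toy, is.numeric)]

# Power terms: raise each numeric variable to a power (here 2)
power_terms <- toy[num_cols]^2
names(power_terms) <- paste0(num_cols, "_pow2")

# Interaction terms: product of each pair of numeric variables
# (assumed meaning of "interaction" for this sketch)
pairs <- combn(num_cols, 2, simplify = FALSE)
interaction_terms <- as.data.frame(setNames(
  lapply(pairs, function(p) toy[[p[1]]] * toy[[p[2]]]),
  sapply(pairs, paste, collapse = "_x_")
))

# Dummy terms: one binary integer column per level of the factor
dummy_terms <- as.data.frame(
  sapply(levels(toy$gender), function(l) as.integer(toy$gender == l))
)
names(dummy_terms) <- paste0("gender_", levels(toy$gender))

# Combine the derived terms into a single data frame
derived <- cbind(power_terms, interaction_terms, dummy_terms)
print(derived)
```

The column naming scheme (`_pow2`, `_x_`, `gender_F`) is chosen only for readability here; the names produced by derive_variables may differ.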

Usage

derive_variables(dataset, y_index = NULL, type = c("interaction", "power",
  "dummy"), power = NULL, integer = TRUE, return_dataset = FALSE,
  file_name = NULL, directory = NULL)

Arguments

dataset

The dataset that the variables are derived from.

y_index

A natural number representing the response variable of the dataset that will be used in the derivation of new variables. Default is NULL.

type

The type of variables to be derived: either "interaction", "power" or "dummy". Default is "interaction".

power

A numeric value indicating the desired power, used in conjunction with deriving power terms. Default is NULL.

integer

A logical object indicating whether the dummy variables should be stored as integers, used in conjunction with deriving dummy terms. If FALSE, the dummy variables are stored as factors. Default is TRUE.

return_dataset

A logical object indicating whether the newly derived terms should be returned together with the original terms. If FALSE, only the newly derived terms are returned. Default is FALSE.

file_name

A character object indicating the file name to use when saving the data frame. The name must include the .csv suffix. Default is NULL.

directory

A character object specifying the directory where the data frame is to be saved as a .csv file. Default is NULL.
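
As a rough illustration of how file_name and directory fit together, the sketch below builds the output path and writes a data frame to it with base R. How derive_variables performs the save internally is not documented here, so the `derived` data frame, the temporary directory, and the use of `write.csv` are assumptions for this example only.

```r
# Hypothetical inputs for illustration
directory <- tempdir()             # stand-in for a user-chosen directory
file_name <- "derived_terms.csv"   # must end in .csv
derived   <- data.frame(age_pow2 = c(100, 144, 196))

# Guard against a file name without the .csv suffix
stopifnot(grepl("\\.csv$", file_name))

# Build the full path and write the data frame without row names
out_path <- file.path(directory, file_name)
write.csv(derived, file = out_path, row.names = FALSE)

# Read the file back to confirm that it was written
read_back <- read.csv(out_path)
print(read_back)
```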

Value

Outputs the newly derived terms as a data frame.

See Also

remove_variables, extract_variables, impute_variables, standardise_variables, transform_variables

Examples

# Example - Lung Capacity Data

# Save the current working directory
dir <- getwd()

# Initial Data Profiling
descriptive_statistics(dataset = lungcap, type = "numeric")

# Derive Interaction Variables 
derive_variables(dataset = lungcap, type = "interaction")
derive_variables(dataset = lungcap, type = "interaction", y_index = 1)
derive_variables(dataset = lungcap, type = "interaction", y_index = 1, return_dataset = TRUE)

# Derive Power Variables
derive_variables(dataset = lungcap, type = "power", power = 2)
derive_variables(dataset = lungcap, type = "power", power = 3, y_index = 1)
derive_variables(dataset = lungcap, type = "power", power = 2, y_index = 1, return_dataset = TRUE)

# Derive Dummy Variables
derive_variables(dataset = lungcap, type = "dummy")
derive_variables(dataset = lungcap, type = "dummy", integer = FALSE)
derive_variables(dataset = lungcap, type = "dummy", y_index = 5, return_dataset = TRUE)

oislen/BuenaVista documentation built on May 16, 2019, 8:12 p.m.