data_modify: Create new variables in a data frame

View source: R/data_modify.R

data_modifyR Documentation

Create new variables in a data frame

Description

Create new variables or modify existing variables in a data frame. Unlike base::transform(), data_modify() can be used on grouped data frames, and newly created variables can be directly used.

Usage

data_modify(data, ...)

Arguments

data

A data frame

...

One or more expressions that define the new variable name and the values or recoding of those new variables. These expressions can be one of:

  • A sequence of named, literal expressions, where the left-hand side refers to the name of the new variable, while the right-hand side represent the values of the new variable. Example: Sepal.Width = center(Sepal.Width).

  • A sequence of string values, representing expressions.

  • A variable that contains a string representation of the expression. Example:

    a <- "2 * Sepal.Width"
    data_modify(iris, a)
    
  • A character vector of expressions. Example: c("SW_double = 2 * Sepal.Width", "SW_fraction = SW_double / 10"). This type of expression cannot be mixed with other expressions, i.e. if a character vector is provided, you may not add further elements to ....

  • Using NULL as right-hand side removes a variable from the data frame. Example: Petal.Width = NULL.

Note that newly created variables can be used in subsequent expressions. See also 'Examples'.

Note

data_modify() can also be used inside functions. However, it is recommended to pass the recode-expression as character vector or list of characters.

Examples

data(efc)
new_efc <- data_modify(
  efc,
  c12hour_c = center(c12hour),
  c12hour_z = c12hour_c / sd(c12hour, na.rm = TRUE),
  c12hour_z2 = standardize(c12hour)
)
head(new_efc)

# using strings instead of literal expressions
new_efc <- data_modify(
  efc,
  "c12hour_c = center(c12hour)",
  "c12hour_z = c12hour_c / sd(c12hour, na.rm = TRUE)",
  "c12hour_z2 = standardize(c12hour)"
)
head(new_efc)

# using character strings, provided as variable
stand <- "c12hour_c / sd(c12hour, na.rm = TRUE)"
new_efc <- data_modify(
  efc,
  c12hour_c = center(c12hour),
  c12hour_z = stand
)
head(new_efc)

# providing expressions as character vector
new_exp <- c(
  "c12hour_c = center(c12hour)",
  "c12hour_z = c12hour_c / sd(c12hour, na.rm = TRUE)"
)
new_efc <- data_modify(efc, new_exp)
head(new_efc)

# attributes - in this case, value and variable labels - are preserved
str(new_efc)

# overwrite existing variable, remove old variable
out <- data_modify(iris, Petal.Length = 1 / Sepal.Length, Sepal.Length = NULL)
head(out)

# works on grouped data
grouped_efc <- data_group(efc, "c172code")
new_efc <- data_modify(
  grouped_efc,
  c12hour_c = center(c12hour),
  c12hour_z = c12hour_c / sd(c12hour, na.rm = TRUE),
  c12hour_z2 = standardize(c12hour)
)
head(new_efc)

# works from inside functions
foo <- function(data, z) {
  head(data_modify(data, z))
}
foo(iris, "var_a = Sepal.Width / 10")

new_exp <- c("SW_double = 2 * Sepal.Width", "SW_fraction = SW_double / 10")
foo(iris, new_exp)

datawizard documentation built on Sept. 15, 2023, 9:06 a.m.