data_modify: Create new variables in a data frame


data_modify R Documentation

Create new variables in a data frame

Description

Create new variables or modify existing variables in a data frame. Unlike base::transform(), data_modify() can be used on grouped data frames, and newly created variables can be used directly in subsequent expressions of the same call.
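The difference from base::transform() can be illustrated with a minimal sketch. It assumes the datawizard package is installed; the column names SW_double and SW_fraction are made up for illustration.

```r
library(datawizard)

d <- head(iris, 3)

# base::transform() evaluates every expression against the original data,
# so SW_double is not yet visible when SW_fraction is computed
res <- try(
  transform(d, SW_double = 2 * Sepal.Width, SW_fraction = SW_double / 10),
  silent = TRUE
)
inherits(res, "try-error")

# data_modify() evaluates the expressions in order, so the second
# expression can refer to the variable created by the first
out <- data_modify(d, SW_double = 2 * Sepal.Width, SW_fraction = SW_double / 10)
out[, c("Sepal.Width", "SW_double", "SW_fraction")]
```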

Usage

data_modify(data, ...)

## S3 method for class 'data.frame'
data_modify(data, ..., .if = NULL, .at = NULL, .modify = NULL)

Arguments

data

A data frame

...

One or more expressions that define the names of the new variables and their values or recodings. Each expression can take one of the following forms:

  • A sequence of named, literal expressions, where the left-hand side refers to the name of the new variable, while the right-hand side represents the values of the new variable. Example: Sepal.Width = center(Sepal.Width).

  • A sequence of string values, representing expressions.

  • A variable that contains a string representation of the expression. Example:

    a <- "2 * Sepal.Width"
    data_modify(iris, a)
    
  • A character vector of expressions. Example: c("SW_double = 2 * Sepal.Width", "SW_fraction = SW_double / 10"). This type of expression cannot be mixed with other expressions, i.e. if a character vector is provided, you may not add further elements to ....

  • NULL as the right-hand side, which removes a variable from the data frame. Example: Petal.Width = NULL.

Note that newly created variables can be used in subsequent expressions, including .at or .if. See also 'Examples'.
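As a short sketch of two of the forms listed above (assuming datawizard is installed; Petal.Ratio is a made-up variable name), a character vector of expressions and a NULL right-hand side could be used like this:

```r
library(datawizard)

d <- head(iris, 3)

# character vector of expressions; the second one uses the variable
# created by the first
exprs <- c("SW_double = 2 * Sepal.Width", "SW_fraction = SW_double / 10")
out1 <- data_modify(d, exprs)
names(out1)

# named literal expression, followed by NULL to drop a column
out2 <- data_modify(
  d,
  Petal.Ratio = Petal.Length / Petal.Width,
  Petal.Width = NULL
)
names(out2)
```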

.if

A function that returns TRUE for each column of the data frame that should be modified. This argument is used in combination with the .modify argument. Only one of .at or .if can be provided, not both. Newly created variables in ... can also be selected; see 'Examples'.

.at

A character vector with the names of the variables that should be modified. This argument is used in combination with the .modify argument. Only one of .at or .if can be provided, not both. Newly created variables in ... can also be selected; see 'Examples'.

.modify

A function that modifies the variables selected by .at or .if. This argument is used in combination with either the .at or the .if argument. Note that the modified variable (i.e. the result from .modify) must be either of length 1 or of the same length as the input variable.
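The interplay of .if, .at and .modify might look as follows. This is a minimal sketch assuming datawizard is installed; round() and as.character() are arbitrary choices of functions that return a result of the same length as their input.

```r
library(datawizard)

d <- head(iris, 3)

# select columns with a predicate: round every numeric column
out_if <- data_modify(d, .if = is.numeric, .modify = round)
out_if

# select columns by name: convert two columns to character
out_at <- data_modify(
  d,
  .at = c("Sepal.Length", "Sepal.Width"),
  .modify = as.character
)
str(out_at)
```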

Note

data_modify() can also be used inside functions. In that case, however, it is recommended to pass the recode expression as a character vector or as a list of character strings.
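One way to follow this recommendation is to build the expression string programmatically and hand it to a wrapper. The sketch below assumes datawizard is installed; add_variables() and the "_tenth" suffix are hypothetical names used only for illustration.

```r
library(datawizard)

# hypothetical wrapper: `expr` is a string (or character vector) of
# expressions that is forwarded to data_modify() unchanged
add_variables <- function(data, expr) {
  data_modify(data, expr)
}

# build the expression from a column name
column <- "Sepal.Width"
expr <- paste0(column, "_tenth = ", column, " / 10")

out <- add_variables(head(iris, 3), expr)
names(out)
```

Passing the expression as a string keeps the wrapper independent of how the caller's arguments are captured, which is presumably why the note above recommends this form.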

Examples

library(datawizard)
data(efc)
new_efc <- data_modify(
  efc,
  c12hour_c = center(c12hour),
  c12hour_z = c12hour_c / sd(c12hour, na.rm = TRUE),
  c12hour_z2 = standardize(c12hour)
)
head(new_efc)

# using strings instead of literal expressions
new_efc <- data_modify(
  efc,
  "c12hour_c = center(c12hour)",
  "c12hour_z = c12hour_c / sd(c12hour, na.rm = TRUE)",
  "c12hour_z2 = standardize(c12hour)"
)
head(new_efc)

# using character strings, provided as variable
stand <- "c12hour_c / sd(c12hour, na.rm = TRUE)"
new_efc <- data_modify(
  efc,
  c12hour_c = center(c12hour),
  c12hour_z = stand
)
head(new_efc)

# providing expressions as character vector
new_exp <- c(
  "c12hour_c = center(c12hour)",
  "c12hour_z = c12hour_c / sd(c12hour, na.rm = TRUE)"
)
new_efc <- data_modify(efc, new_exp)
head(new_efc)

# attributes - in this case, value and variable labels - are preserved
str(new_efc)

# overwrite existing variable, remove old variable
out <- data_modify(iris, Petal.Length = 1 / Sepal.Length, Sepal.Length = NULL)
head(out)

# works on grouped data
grouped_efc <- data_group(efc, "c172code")
new_efc <- data_modify(
  grouped_efc,
  c12hour_c = center(c12hour),
  c12hour_z = c12hour_c / sd(c12hour, na.rm = TRUE),
  c12hour_z2 = standardize(c12hour)
)
head(new_efc)

# works from inside functions
foo <- function(data, z) {
  head(data_modify(data, z))
}
foo(iris, "var_a = Sepal.Width / 10")

new_exp <- c("SW_double = 2 * Sepal.Width", "SW_fraction = SW_double / 10")
foo(iris, new_exp)

# modify at specific positions or if condition is met
d <- iris[1:5, ]
data_modify(d, .at = "Species", .modify = as.numeric)
data_modify(d, .if = is.factor, .modify = as.numeric)

# can be combined with dots
data_modify(d, new_length = Petal.Length * 2, .at = "Species", .modify = as.numeric)

# new variables used in `.at` or `.if`
data_modify(
  d,
  new_length = Petal.Length * 2,
  .at = c("Petal.Length", "new_length"),
  .modify = round
)

# combine "extract_column_names()" and ".at" argument
out <- data_modify(
  d,
  .at = extract_column_names(d, select = starts_with("Sepal")),
  .modify = as.factor
)
# "Sepal.Length" and "Sepal.Width" are now factors
str(out)


datawizard documentation built on Oct. 6, 2024, 1:08 a.m.