The Design Philosophy of Functions in sjmisc"
In sjmisc: Data and Variable Transformation Functions

knitr::opts_chunk$set(
  collapse = TRUE, 
  comment = "#>"
)
if (!requireNamespace("dplyr", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
}
options(max.print = 1000)
suppressPackageStartupMessages(library(sjmisc))

Basically, this package complements the dplyr package in that sjmisc takes over data transformation tasks on variables, like recoding, dichotomizing or grouping variables, setting and replacing missing values, etc. The data transformation functions also support labelled data.

The design of data transformation functions

The design of data transformation functions in this package follows, where appropriate, the tidyverse-approach, with the first argument of a function always being the data (either a data frame or vector), followed by variable names that should be processed by the function. If no variables are specified as argument, the function applies to the complete data that was indicated as first function argument.

The data-argument

A major difference to dplyr-functions like select() or filter() is that the data-argument (the first argument of each function), may either be a data frame or a vector. The returned object for each function equals the type of the data-argument:

If the data-argument is a vector, the function returns a vector.
If the data-argument is a data frame, the function returns a data frame.

library(sjmisc)
data(efc)

# returns a vector
x <- rec(efc$e42dep, rec = "1,2=1; 3,4=2")
str(x)

# returns a data frame
rec(efc, e42dep, rec = "1,2=1; 3,4=2", append = FALSE) %>% head()

This design-choice is mainly due to compatibility- and convenience-reasons. It does not affect the usual "tidyverse-workflow" or when using pipe-chains.

The ...-ellipses-argument

The selection of variables specified in the ...-ellipses-argument is powered by dplyr's select() and tidyselect's select_helpers(). This means, you can use existing functions like : to select a range of variables, or also use tidyselect's select_helpers, like contains() or one_of().

library(dplyr)

# select all variables with "cop" in their names, and also
# the range from c161sex to c175empl
rec(
  efc, contains("cop"), c161sex:c175empl, 
  rec = "0,1=0; else=1", 
  append = FALSE
) %>% head()

# center all variables with "age" in name, variable c12hour
# and all variables from column 19 to 21
center(efc, c12hour, contains("age"), 19:21, append = FALSE) %>% head()

The function-types

There are two types of function designs:

coercing/converting functions

Functions like to_factor() or to_label(), which convert variables into other types or add additional information like variable or value labels as attribute, typically return the complete data frame that was given as first argument without any new variables. The variables specified in the ...-ellipses argument are converted (overwritten), all other variables remain unchanged.

x <- efc[, 3:5]

x %>% str()

to_factor(x, e42dep, e16sex) %>% str()

transformation/recoding functions

Functions like rec() or dicho(), which transform or recode variables, by default add the transformed or recoded variables to the data frame, so they return the new variables and the original data as combined data frame. To return only the transformed and recoded variables specified in the ...-ellipses argument, use argument append = FALSE.

# complete data, including new columns
rec(efc, c82cop1, c83cop2, rec = "1,2=0; 3:4=2", append = TRUE) %>% head()

# only new columns
rec(efc, c82cop1, c83cop2, rec = "1,2=0; 3:4=2", append = FALSE) %>% head()

These variables usually get a suffix, so you can bind these variables as new columns to a data frame, for instance with add_columns(). The function add_columns() is useful if you want to bind/add columns within a pipe-chain to the end of a data frame.

efc %>% 
  rec(c82cop1, c83cop2, rec = "1,2=0; 3:4=2", append = FALSE) %>% 
  add_columns(efc) %>% 
  head()

If append = TRUE and suffix = "", recoded variables will replace (overwrite) existing variables.

# complete data, existing columns c82cop1 and c83cop2 are replaced
rec(efc, c82cop1, c83cop2, rec = "1,2=0; 3:4=2", append = TRUE, suffix = "") %>% head()

sjmisc and dplyr

The functions of sjmisc are designed to work together seamlessly with other packages from the tidyverse, like dplyr. For instance, you can use the functions from sjmisc both within a pipe-worklflow to manipulate data frames, or to create new variables with mutate():

efc %>% 
  select(c82cop1, c83cop2) %>% 
  rec(rec = "1,2=0; 3:4=2") %>% 
  head()

efc %>% 
  select(c82cop1, c83cop2) %>% 
  mutate(
    c82cop1_dicho = rec(c82cop1, rec = "1,2=0; 3:4=2"),
    c83cop2_dicho = rec(c83cop2, rec = "1,2=0; 3:4=2")
  ) %>% 
  head()