Get started

knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  collapse = TRUE,
  comment = "#>"
)
fig_path <- function(file) {
  dir <- knitr::opts_chunk$get("fig.path")
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  paste0(dir, file)
}
pkgdown_link <- function(text, file) {
  url <- file.path(
    "https://a-maldet.github.io/labelmachine/articles/labelmachine_files",
    "figure-html", file
  )
  paste0("[", text, "](", url, ")")
}

labelmachine is an R package that helps assigning meaningful labels to R data sets. Manage your labels in yaml files, so called lama-dictionary files. This makes it very easy using the same label translations in multiple projects that share similar data structure.

Labeling your data can be easy!

Installation

# Install release version from CRAN
install.packages("labelmachine")

# Install development version from GitHub
devtools::install_github('a-maldet/labelmachine', build_vignettes = TRUE)

The example data frame

Let us assume, you want to label the following data frame df:

df <- data.frame(
  pupil_id = rep(1:4, each = 3),
  subject = rep(c("eng", "mat", "gym"), 4),
  result = c(1, 2, 2, NA, 2, NA, 1, 0, 1, 2, 3, NA),
  stringsAsFactors = FALSE
)
df

The column subject contains the subject codes the pupils were tested in and result contains the test results.

Your first lama-dictionary

In order to assign labels to the values in the columns of df, we need a lama-dictionary which holds the translations of the variables. With the command new_lama_dictionary() we can create such a lama-dictionary:

library(labelmachine)
dict <- new_lama_dictionary(
  sub = c(eng = "English", mat = "Mathematics", gym = "Gymnastics"),
  res = c(
    "1" = "Good",
    "2" = "Passed",
    "3" = "Not passed",
    "4" = "Not passed",
    NA_ = "Missed",
    "0" = NA
  )
)
dict

Each entry in dict is a translation for a variable (column) of the data frame df. The translation sub can be used to assign labels to the values given in column subject in df and translation res can be used to assign labels to the values in column result in df. The expression NA_ is used to escape the missing value symbol NA. Hence, the last assignment NA_ = "Missed" defines that missing values should be labeled with the string "Missed". For further details on creating lama-dictionaries see Creating lama-dictionaries and Altering lama-dictionaries.

Translate the data frame columns

With the command lama_translate, we can use the lama-dictionary dict in order to translate the variables given in data frame df:

df_new <- lama_translate(
    df,
    dict,
    subject_lab = sub(subject),
    result_lab = res(result)
  )
str(df_new)

As we can see, the resulting data frame df_new now holds two extra columns subject_lab and result_lab holding the factor variables with the right labels. The command lama_translate has multiple features, which are described in more detail in Translating variables.

Save your lama-dictionary

With the command lama_write it is possible to save the lama-dictionary object to a yaml file:

path_to_file <- file.path(tempdir(), "my_dictionary.yaml")
lama_write(dict, path_to_file)

The resulting yaml file is a plain text file with a special text structure, see r pkgdown_link("dictionary.yaml", "dictionary.yaml").

Lama-dictionary files make it easy to share lama-dictionaries with other projects holding similar data structures. With the command lama_read a lama-dictionary file can be read:

path_to_file <- system.file("extdata", "dictionary_exams.yaml", package = "labelmachine")
dict <- lama_read(path_to_file)

Further reading



Try the labelmachine package in your browser

Any scripts or data that you put into this service are public.

labelmachine documentation built on Oct. 11, 2019, 9:05 a.m.