knitr::opts_chunk$set( echo = TRUE, message = FALSE, warning = FALSE, collapse = TRUE, comment = "#>" ) fig_path <- function(file) { dir <- knitr::opts_chunk$get("fig.path") dir.create(dir, showWarnings = FALSE, recursive = TRUE) paste0(dir, file) } pkgdown_link <- function(text, file) { url <- file.path( "https://a-maldet.github.io/labelmachine/articles/labelmachine_files", "figure-html", file ) paste0("[", text, "](", url, ")") }
labelmachine
is an R package that helps assigning meaningful
labels to R data sets.
Manage your labels in yaml files, so called lama-dictionary files.
This makes it very easy using the same label translations in multiple
projects that share similar data structure.
Labeling your data can be easy!
# Install release version from CRAN install.packages("labelmachine") # Install development version from GitHub devtools::install_github('a-maldet/labelmachine', build_vignettes = TRUE)
Let us assume, you want to label the following data frame df
:
df <- data.frame( pupil_id = rep(1:4, each = 3), subject = rep(c("eng", "mat", "gym"), 4), result = c(1, 2, 2, NA, 2, NA, 1, 0, 1, 2, 3, NA), stringsAsFactors = FALSE ) df
The column subject
contains the subject codes the pupils were tested in and
result
contains the test results.
In order to assign labels to the values in the columns of df
, we need a
lama-dictionary which holds the translations of the variables.
With the command new_lama_dictionary()
we can create such a lama-dictionary:
library(labelmachine) dict <- new_lama_dictionary( sub = c(eng = "English", mat = "Mathematics", gym = "Gymnastics"), res = c( "1" = "Good", "2" = "Passed", "3" = "Not passed", "4" = "Not passed", NA_ = "Missed", "0" = NA ) ) dict
Each entry in dict
is a translation for a variable (column) of the
data frame df
.
The translation sub
can be used to assign labels to the values given in
column subject
in df
and translation res
can be used to assign labels
to the values in column result
in df
.
The expression NA_
is used to escape the missing value symbol NA
.
Hence, the last assignment NA_ = "Missed"
defines that missing values
should be labeled with the string "Missed"
.
For further details on creating lama-dictionaries
see Creating lama-dictionaries and Altering lama-dictionaries.
With the command lama_translate
, we can use the lama-dictionary dict
in order to translate the variables given in data frame df
:
df_new <- lama_translate( df, dict, subject_lab = sub(subject), result_lab = res(result) ) str(df_new)
As we can see, the resulting data frame df_new
now holds two extra columns
subject_lab
and result_lab
holding the factor variables with the right
labels.
The command lama_translate
has multiple features, which are described in
more detail in Translating variables.
With the command lama_write
it is possible to save the lama-dictionary object
to a yaml file:
path_to_file <- file.path(tempdir(), "my_dictionary.yaml") lama_write(dict, path_to_file)
The resulting yaml file is a plain text file with a special text structure,
see r pkgdown_link("dictionary.yaml", "dictionary.yaml")
.
Lama-dictionary files make it easy to share lama-dictionaries with
other projects holding similar data structures. With the command lama_read
a lama-dictionary file can be read:
path_to_file <- system.file("extdata", "dictionary_exams.yaml", package = "labelmachine") dict <- lama_read(path_to_file)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.