Labelled data is a type of dataset where variables or their values contain additional metadata. These datasets, common in tools like SPSS or Stata, allow for better understanding and documentation of variables.
The {haven} package in R is specifically designed for working with such datasets.
It enhances metadata handling by allowing you to embed variable and value labels directly into your data.
In this vignette, we'll explore how labelled data works, using key {haven} package functionality to extract, inspect, and manipulate labels.
library(nettskjemar)

# Replace this with your form ID
formid <- 123823
data <- ns_get_data(formid)
data
#>   formid $submission_id
#> 1 123823       27685292
#>                    $created  freetext radio
#> 1 2023-06-01T20:57:15+02:00 some text     1
#>   checkbox.questionnaires checkbox.events
#> 1                       1               1
#>   checkbox.logs dropdown radio_matrix.grants
#> 1             0        4                   1
#>   radio_matrix.lecture radio_matrix.email
#> 1                    2                  2
#>   checkbox_matrix.1.IT
#> 1                    1
#>   checkbox_matrix.1.colleague
#> 1                           1
#>   checkbox_matrix.1.admin
#> 1                       0
#>   checkbox_matrix.1.union
#> 1                       0
#>   checkbox_matrix.1.internet
#> 1                          0
#>   checkbox_matrix.2.IT
#> 1                    0
#>   checkbox_matrix.2.colleague
#> 1                           0
#>   checkbox_matrix.2.admin
#> 1                       1
#>   checkbox_matrix.2.union
#> 1                       0
#>   checkbox_matrix.2.internet       date  time
#> 1                          0 2023-06-01 12:00
#>           datetime number_decimal
#> 1 2023-06-12T13:33            4.5
#>   number_integer slider attachment_1
#> 1             77      3    sølvi.png
#>   attachment_2 $answer_time_ms
#> 1                        74630
#>  [ reached 'max' / getOption("max.print") -- omitted 2 rows ]
Here we have our example data, displayed as a standard data.frame. There is nothing particularly special about it. To continue, we also need to download the form codebook, which holds the question texts and answer options used to build the labels.
cb <- ns_get_codebook(formid)
cb
#>   element_no element_type element_code
#> 1          1      HEADING         <NA>
#> 2          2         TEXT         <NA>
#> 3          3        IMAGE         <NA>
#> 4          4   PAGE_BREAK         <NA>
#> 5          5     QUESTION     freetext
#>   element_text
#> 1 Let talk about Nettskjema! This is a subheading!
#> 2 <NA>
#> 3 <NA>
#> 4 <NA>
#> 5 This is a question about something super important, where the user can input free text.
#>   element_desc
#> 1 <NA>
#> 2 <p>This is some text in the form, not a question but a descriptive text.</p>\n\n<p> </p>\n
#> 3 <NA>
#> 4 <NA>
#> 5 <p>With a description field giving details on what should be answered.</p>\n\n<p> </p>\n
#>   subelement_seq answer_text answer_code
#> 1             NA        <NA>        <NA>
#> 2             NA        <NA>        <NA>
#> 3             NA        <NA>        <NA>
#> 4             NA        <NA>        <NA>
#> 5             NA        <NA>        <NA>
#>   answer_seq
#> 1         NA
#> 2         NA
#> 3         NA
#> 4         NA
#> 5         NA
#>  [ reached 'max' / getOption("max.print") -- omitted 34 rows ]
To add the labels, we use the ns_add_labels() function, passing it both the unlabelled data and the codebook.
lab_data <- data |>
  ns_add_labels(cb)
lab_data
#>   formid $submission_id
#> 1 123823       27685292
#>                    $created  freetext radio
#> 1 2023-06-01T20:57:15+02:00 some text     1
#>   checkbox.questionnaires checkbox.events
#> 1                       1               1
#>   checkbox.logs dropdown radio_matrix.grants
#> 1             0        4                   1
#>   radio_matrix.lecture radio_matrix.email
#> 1                    2                  2
#>   checkbox_matrix.1.IT
#> 1                    1
#>   checkbox_matrix.1.colleague
#> 1                           1
#>   checkbox_matrix.1.admin
#> 1                       0
#>   checkbox_matrix.1.union
#> 1                       0
#>   checkbox_matrix.1.internet
#> 1                          0
#>   checkbox_matrix.2.IT
#> 1                    0
#>   checkbox_matrix.2.colleague
#> 1                           0
#>   checkbox_matrix.2.admin
#> 1                       1
#>   checkbox_matrix.2.union
#> 1                       0
#>   checkbox_matrix.2.internet       date  time
#> 1                          0 2023-06-01 12:00
#>           datetime number_decimal
#> 1 2023-06-12T13:33            4.5
#>   number_integer slider attachment_1
#> 1             77      3    sølvi.png
#>   attachment_2 $answer_time_ms
#> 1                        74630
#>  [ reached 'max' / getOption("max.print") -- omitted 2 rows ]
You will notice that, on the surface, the printed result looks completely normal, with no added extras.
Inspecting the data with the str() function, however, exposes what lies beneath the surface.
str(data)
#> 'data.frame': 3 obs. of 31 variables:
#>  $ formid                     : num 123823 123823 123823
#>  $ $submission_id             : int 27685292 27685302 27685319
#>  $ $created                   : chr "2023-06-01T20:57:15+02:00" "2023-06-01T20:58:33+02:00" "2023-06-01T20:59:50+02:00"
#>  $ freetext                   : chr "some text" "another answer" ""
#>  $ radio                      : int 1 -1 -1
#>  $ checkbox.questionnaires    : int 1 0 1
#>  $ checkbox.events            : int 1 0 1
#>  $ checkbox.logs              : int 0 1 1
#>  $ dropdown                   : int 4 9 4
#>  $ radio_matrix.grants        : int 1 3 1
#>  $ radio_matrix.lecture       : int 2 3 1
#>  $ radio_matrix.email         : int 2 1 1
#>  $ checkbox_matrix.1.IT       : int 1 0 0
#>  $ checkbox_matrix.1.colleague: int 1 0 1
#>  $ checkbox_matrix.1.admin    : int 0 0 0
#>  $ checkbox_matrix.1.union    : int 0 0 0
#>  $ checkbox_matrix.1.internet : int 0 1 1
#>  $ checkbox_matrix.2.IT       : int 0 0 1
#>  $ checkbox_matrix.2.colleague: int 0 0 1
#>  $ checkbox_matrix.2.admin    : int 1 1 1
#>  $ checkbox_matrix.2.union    : int 0 1 1
#>  $ checkbox_matrix.2.internet : int 0 0 0
#>  $ date                       : chr "2023-06-01" "2023-02-07" "2022-09-28"
#>  $ time                       : chr "12:00" "14:45" "05:11"
#>  $ datetime                   : chr "2023-06-12T13:33" "2024-02-15T08:55" "2022-03-03T07:29"
#>  $ number_decimal             : chr "4.5" "2.2" "10"
#>  $ number_integer             : int 77 45 98
#>  $ slider                     : int 3 1 9
#>  $ attachment_1               : chr "sølvi.png" "" ""
#>  $ attachment_2               : chr "" "marius.jpeg" ""
#>  $ $answer_time_ms            : int 74630 71313 70230

str(lab_data)
#> Classes 'ns-data' and 'data.frame': 3 obs. of 31 variables:
#>  $ formid                     : num 123823 123823 123823
#>  $ $submission_id             : int 27685292 27685302 27685319
#>  $ $created                   : chr "2023-06-01T20:57:15+02:00" "2023-06-01T20:58:33+02:00" "2023-06-01T20:59:50+02:00"
#>  $ freetext                   : 'character' chr "some text" "another answer" ""
#>   ..- attr(*, "label")= chr "This is a question about something super important, where the user can input free text."
#>   ..- attr(*, "ns_type")= chr "QUESTION"
#>  $ radio                      : int+lbl [1:3] 1, -1, -1
#>    ..@ labels : Named int 1 -1
#>    .. ..- attr(*, "names")= chr [1:2] "Very happy!" "Very unhappy!"
#>    ..@ label  : chr "How happy are we with Nettskjema?"
#>    ..@ ns_type: chr "RADIO"
#>  $ checkbox.questionnaires    : int+lbl [1:3] 1, 0, 1
#>    ..@ labels : Named chr "questionnaires"
#>    .. ..- attr(*, "names")= chr "Questionnaires"
#>    ..@ label  : chr "What do we use it for?:: Questionnaires"
#>    ..@ ns_type: chr "CHECKBOX"
#>  $ checkbox.events            : int+lbl [1:3] 1, 0, 1
#>    ..@ labels : Named chr "events"
#>    .. ..- attr(*, "names")= chr "Event sign-ups"
#>    ..@ label  : chr "What do we use it for?:: Event sign-ups"
#>    ..@ ns_type: chr "CHECKBOX"
#>  $ checkbox.logs              : int+lbl [1:3] 0, 1, 1
#>    ..@ labels : Named chr "logs"
#>    .. ..- attr(*, "names")= chr "Data logging"
#>    ..@ label  : chr "What do we use it for?:: Data logging"
#>    ..@ ns_type: chr "CHECKBOX"
#>  $ dropdown                   : int+lbl [1:3] 4, 9, 4
#>    ..@ labels : Named int 4 9
#>    .. ..- attr(*, "names")= chr [1:2] "UiO" "OsloMet"
#>    ..@ label  : chr "Who is responsible with Nettskjema?"
#>    ..@ ns_type: chr "SELECT"
#>  $ radio_matrix.grants        : int+lbl [1:3] 1, 3, 1
#>    ..@ labels : Named int 1 2 3
#>    .. ..- attr(*, "names")= chr [1:3] "yes" "no" "not applicable"
#>    ..@ label  : chr "In the last month I have: written some grant applications"
#>    ..@ ns_type: chr "MATRIX_RADIO"
#>  $ radio_matrix.lecture       : int+lbl [1:3] 2, 3, 1
#>    ..@ labels : Named int 1 2 3
#>    .. ..- attr(*, "names")= chr [1:3] "yes" "no" "not applicable"
#>    ..@ label  : chr "In the last month I have: held a lecture"
#>    ..@ ns_type: chr "MATRIX_RADIO"
#>  $ radio_matrix.email         : int+lbl [1:3] 2, 1, 1
#>    ..@ labels : Named int 1 2 3
#>    .. ..- attr(*, "names")= chr [1:3] "yes" "no" "not applicable"
#>    ..@ label  : chr "In the last month I have: sent some e-mails"
#>    ..@ ns_type: chr "MATRIX_RADIO"
#>  $ checkbox_matrix.1.IT       : int+lbl [1:3] 1, 0, 0
#>    ..@ labels : Named chr "IT"
#>    .. ..- attr(*, "names")= chr "IT"
#>    ..@ label  : chr "In the last year, I have :: sought help from :: IT"
#>    ..@ ns_type: chr "MATRIX_CHECKBOX"
#>  $ checkbox_matrix.1.colleague: int+lbl [1:3] 1, 0, 1
#>    ..@ labels : Named chr "colleague"
#>    .. ..- attr(*, "names")= chr "A colleague"
#>    ..@ label  : chr "In the last year, I have :: sought help from :: A colleague"
#>    ..@ ns_type: chr "MATRIX_CHECKBOX"
#>  $ checkbox_matrix.1.admin    : int+lbl [1:3] 0, 0, 0
#>    ..@ labels : Named chr "admin"
#>    .. ..- attr(*, "names")= chr "Administration"
#>    ..@ label  : chr "In the last year, I have :: sought help from :: Administration"
#>    ..@ ns_type: chr "MATRIX_CHECKBOX"
#>  $ checkbox_matrix.1.union    : int+lbl [1:3] 0, 0, 0
#>    ..@ labels : Named chr "union"
#>    .. ..- attr(*, "names")= chr "Union"
#>    ..@ label  : chr "In the last year, I have :: sought help from :: Union"
#>    ..@ ns_type: chr "MATRIX_CHECKBOX"
#>  $ checkbox_matrix.1.internet : int+lbl [1:3] 0, 1, 1
#>    ..@ labels : Named chr "internet"
#>    .. ..- attr(*, "names")= chr "Internet"
#>    ..@ label  : chr "In the last year, I have :: sought help from :: Internet"
#>    ..@ ns_type: chr "MATRIX_CHECKBOX"
#>  $ checkbox_matrix.2.IT       : int+lbl [1:3] 0, 0, 1
#>    ..@ labels : Named chr "IT"
#>    .. ..- attr(*, "names")= chr "IT"
#>    ..@ label  : chr "In the last year, I have :: received e-mails from :: IT"
#>    ..@ ns_type: chr "MATRIX_CHECKBOX"
#>  $ checkbox_matrix.2.colleague: int+lbl [1:3] 0, 0, 1
#>    ..@ labels : Named chr "colleague"
#>    .. ..- attr(*, "names")= chr "A colleague"
#>    ..@ label  : chr "In the last year, I have :: received e-mails from :: A colleague"
#>    ..@ ns_type: chr "MATRIX_CHECKBOX"
#>  $ checkbox_matrix.2.admin    : int+lbl [1:3] 1, 1, 1
#>    ..@ labels : Named chr "admin"
#>    .. ..- attr(*, "names")= chr "Administration"
#>    ..@ label  : chr "In the last year, I have :: received e-mails from :: Administration"
#>    ..@ ns_type: chr "MATRIX_CHECKBOX"
#>  $ checkbox_matrix.2.union    : int+lbl [1:3] 0, 1, 1
#>    ..@ labels : Named chr "union"
#>    .. ..- attr(*, "names")= chr "Union"
#>    ..@ label  : chr "In the last year, I have :: received e-mails from :: Union"
#>    ..@ ns_type: chr "MATRIX_CHECKBOX"
#>  $ checkbox_matrix.2.internet : int+lbl [1:3] 0, 0, 0
#>    ..@ labels : Named chr "internet"
#>    .. ..- attr(*, "names")= chr "Internet"
#>    ..@ label  : chr "In the last year, I have :: received e-mails from :: Internet"
#>    ..@ ns_type: chr "MATRIX_CHECKBOX"
#>  $ date                       : 'character' chr "2023-06-01" "2023-02-07" "2022-09-28"
#>   ..- attr(*, "label")= chr "Choose a random date"
#>   ..- attr(*, "ns_type")= chr "DATE"
#>  $ time                       : 'character' chr "12:00" "14:45" "05:11"
#>   ..- attr(*, "label")= chr "now choose a random time!"
#>   ..- attr(*, "ns_type")= chr "DATE"
#>  $ datetime                   : 'character' chr "2023-06-12T13:33" "2024-02-15T08:55" "2022-03-03T07:29"
#>   ..- attr(*, "label")= chr "Lastly choose a date AND time!"
#>   ..- attr(*, "ns_type")= chr "DATE"
#>  $ number_decimal             : 'numeric' chr "4.5" "2.2" "10"
#>   ..- attr(*, "label")= chr "Pick a number between 0 and 10!"
#>   ..- attr(*, "ns_type")= chr "NUMBER"
#>  $ number_integer             : int 77 45 98
#>   ..- attr(*, "label")= chr "Choose an integer between 0 and 100"
#>   ..- attr(*, "ns_type")= chr "NUMBER"
#>  $ slider                     : int 3 1 9
#>   ..- attr(*, "label")= chr "Choose a point on the slider!"
#>   ..- attr(*, "ns_type")= chr "LINEAR_SCALE"
#>  $ attachment_1               : 'character' chr "sølvi.png" "" ""
#>   ..- attr(*, "label")= chr "Upload a fun image!"
#>   ..- attr(*, "ns_type")= chr "ATTACHMENT"
#>  $ attachment_2               : 'character' chr "" "marius.jpeg" ""
#>   ..- attr(*, "label")= chr "This is an attachment2"
#>   ..- attr(*, "ns_type")= chr "ATTACHMENT"
#>  $ $answer_time_ms            : int 74630 71313 70230
You can see that lab_data carries many label attributes that are not present in the data object.
These labels come from the codebook and provide important context about what each variable and coded value actually represents.
These hidden features of the data are unlocked when working with functions from the {haven} package.
Notice how the metadata (variable and value labels) are now embedded in the dataset.
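To make the idea concrete, here is a minimal sketch of how codebook entries can end up as attributes on a column. It uses toy stand-ins for the data and codebook (toy_data and toy_cb are hypothetical and only borrow column names from the codebook shown above), and it is not how ns_add_labels() is implemented internally. It only assumes that the {haven} package is installed.

```r
library(haven)

# Toy stand-ins for the form data and codebook (hypothetical values,
# not real Nettskjema output)
toy_data <- data.frame(radio = c(1L, -1L, -1L))
toy_cb <- data.frame(
  element_code = "radio",
  element_text = "How happy are we with Nettskjema?",
  answer_text  = c("Very happy!", "Very unhappy!"),
  answer_code  = c(1L, -1L)
)

# Attach the question text as the variable label and the answer
# options as value labels
toy_data$radio <- labelled(
  toy_data$radio,
  labels = setNames(toy_cb$answer_code, toy_cb$answer_text),
  label  = unique(toy_cb$element_text)
)

# The labels are stored as ordinary R attributes on the column
attributes(toy_data$radio)

# haven can use the value labels to turn the codes into a factor
radio_factor <- as_factor(toy_data$radio)
radio_factor
```

Because the labels are plain attributes, regular printing of the data.frame does not show them, while str(), attributes(), and label-aware functions such as as_factor() do.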
{haven}
Once your data has been labelled, the {labelled} package, which builds on the labelled vector class from {haven}, provides functions to inspect and manipulate labels with ease.
Use var_label() to extract variable-level labels and val_labels() to extract value-level labels:
library(labelled)
# Variable labels
var_label(lab_data$freetext)
#> [1] "This is a question about something super important, where the user can input free text."

# Value labels for 'radio'
val_labels(lab_data$radio)
#>   Very happy! Very unhappy!
#>             1            -1
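When you want an overview of every variable label at once, a small data dictionary can be built by reading the "label" attribute of each column. The sketch below uses a toy data.frame (toy and its contents are hypothetical, modelled on the example form) and base R's attr(), so it only assumes {haven} for constructing the labelled column.

```r
library(haven)

# Toy data: one unlabelled column, one labelled vector, and one plain
# vector that only carries a variable label
toy <- data.frame(id = 1:3)
toy$radio <- labelled(
  c(1L, -1L, -1L),
  labels = c("Very happy!" = 1L, "Very unhappy!" = -1L),
  label  = "How happy are we with Nettskjema?"
)
toy$slider <- c(3L, 1L, 9L)
attr(toy$slider, "label") <- "Choose a point on the slider!"

# One row per column; columns without a variable label get NA
dictionary <- data.frame(
  variable = names(toy),
  label = unname(vapply(toy, function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else lab
  }, character(1)))
)
dictionary
```

The same pattern should work on lab_data, since the str() output above shows the variable labels stored under the "label" attribute.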
If you need to modify labels, we suggest you do this directly in the Nettskjema codebook setup.
However, if you are working on a form that is no longer available in Nettskjema, and you have downloaded and saved both the data and the codebook (or the labelled data), labels can be modified using the replacement functions from {labelled}:
lab_data$freetext
#> [1] "some text"      "another answer"
#> [3] ""
#> attr(,"label")
#> [1] "This is a question about something super important, where the user can input free text."
#> attr(,"ns_type")
#> [1] "QUESTION"
#> attr(,"class")
#> [1] "character"

# Update variable-level label for 'freetext'
var_label(lab_data$freetext) <- "Important freetext comment"

lab_data$radio
#> <labelled<integer>[3]>: How happy are we with Nettskjema?
#> [1]  1 -1 -1
#>
#> Labels:
#>  value         label
#>      1   Very happy!
#>     -1 Very unhappy!

# Update value labels for 'radio'
val_labels(lab_data$radio) <- c(Unhappy = -1, Happy = 1)

# Check updated labels
var_label(lab_data$freetext)
#> [1] "Important freetext comment"
val_labels(lab_data$radio)
#> Unhappy   Happy
#>      -1       1
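After replacing value labels by hand, it can be worth checking that every value present in the data still has a label. The following is a minimal sketch on a toy vector (radio here is a hypothetical stand-in for lab_data$radio), assuming {haven} and {labelled} are installed.

```r
library(haven)
library(labelled)

# Toy labelled vector modelled on the 'radio' question
radio <- labelled(
  c(1L, -1L, -1L),
  labels = c("Very happy!" = 1L, "Very unhappy!" = -1L),
  label  = "How happy are we with Nettskjema?"
)

# Replace the value labels, as in the example above
val_labels(radio) <- c(Unhappy = -1L, Happy = 1L)

# Values that occur in the data but have no matching value label
unlabelled <- setdiff(unique(unclass(radio)), val_labels(radio))
unlabelled
```

An empty result means all observed codes are covered by the new labels; any value listed would need a label added or a correction to the codes.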
Some key benefits include the following:
1. Enhanced Documentation: Embedding metadata directly in the dataset improves clarity.
2. Consistency: Reduces ambiguity when working across different teams or systems.
3. Compatibility: Facilitates interoperability with SPSS, Stata, and other statistical software.
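As an illustration of the compatibility point, the sketch below writes a small labelled data.frame to an SPSS file with haven::write_sav() and reads it back with haven::read_sav(). The data is a toy example (toy is hypothetical), and the file is written to a temporary location.

```r
library(haven)

# Toy labelled data modelled on the 'radio' question
toy <- data.frame(id = 1:3)
toy$radio <- labelled(
  c(1L, -1L, -1L),
  labels = c("Very happy!" = 1L, "Very unhappy!" = -1L),
  label  = "How happy are we with Nettskjema?"
)

# Write to a temporary SPSS file and read it back
sav_path <- tempfile(fileext = ".sav")
write_sav(toy, sav_path)
roundtrip <- read_sav(sav_path)

# Inspect the labels that came back from the file
attr(roundtrip$radio, "label")
attr(roundtrip$radio, "labels")
```

Note that real Nettskjema column names such as $submission_id may need renaming before export, since SPSS and Stata restrict which characters a variable name can contain.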
For more information, check out the labelled package documentation.