Introduction to labelled

The purpose of the labelled package is to provide functions to manipulate metadata such as variable labels, value labels and defined missing values, using the haven_labelled and haven_labelled_spss classes introduced in the haven package.

Variable labels

A variable label can be specified for any vector using var_label.


var_label(iris$Sepal.Length) <- "Length of sepal"

It's possible to add a variable label to several columns of a data frame using a named list.

var_label(iris) <- list(Petal.Length = "Length of petal", Petal.Width = "Width of Petal")

To get the variable label, simply call var_label.
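
As a minimal sketch of retrieving labels, the example below uses a small hypothetical data frame d (not part of the original text). It assumes that var_label returns a single label for a vector and a list with one entry per column for a data frame.

```r
library(labelled)

# Hypothetical toy data frame, used only for illustration
d <- data.frame(age = c(23, 41, 35), sex = c("f", "m", "f"))
var_label(d$age) <- "Age in years"

# Label of a single column
age_label <- var_label(d$age)

# Called on a data frame, var_label() returns a list with one entry
# per column; columns without a label have a NULL entry
all_labels <- var_label(d)
```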


To remove a variable label, use NULL.

var_label(iris$Sepal.Length) <- NULL

In RStudio, variable labels will be displayed in the data viewer.


Value labels

The first way to create a labelled vector is to use the labelled function. It's not mandatory to provide a label for each value observed in your vector. You can also provide a label for values not observed.

v <- labelled(c(1,2,2,2,3,9,1,3,2,NA), c(yes = 1, no = 3, "don't know" = 8, refused = 9))

Use val_labels to get all value labels and val_label to get the value label associated with a specific value.

val_label(v, 8)
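
To make the difference between the two getters concrete, here is a short sketch on a hypothetical vector x (chosen for illustration only). It assumes that val_label returns NULL when the requested value has no label.

```r
library(labelled)

# Hypothetical vector: the value 2 is observed but has no label
x <- labelled(c(1, 2, 3), c(yes = 1, no = 3))

all_labels <- val_labels(x)    # named vector of all value labels
label_for_1 <- val_label(x, 1) # label attached to the value 1
label_for_2 <- val_label(x, 2) # no label is attached to the value 2
```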

val_labels can also be used to modify all the value labels attached to a vector, while val_label will update only one specific value label.

val_labels(v) <- c(yes = 1, nno = 3, bug = 5)
val_label(v, 3) <- "no"

With val_label, you can also add or remove specific value labels.

val_label(v, 2) <- "maybe"
val_label(v, 5) <- NULL

To remove all value labels, use val_labels and NULL. The labelled class will also be removed.

val_labels(v) <- NULL

Adding a value label to an unlabelled vector will apply the labelled class to it.

val_label(v, 1) <- "yes"

Note that applying val_labels to a factor will have no effect!

f <- factor(1:3)
val_labels(f) <- c(yes = 1, no = 3)

You can also apply value labels to several columns of a data frame.

df <- data.frame(v1 = 1:3, v2 = c(2, 3, 1), v3 = 3:1)

val_label(df, 1) <- "yes"
val_label(df[, c("v1", "v3")], 2) <- "maybe"
val_label(df[, c("v2", "v3")], 3) <- "no"

val_labels(df[, c("v1", "v3")]) <- c(YES = 1, MAYBE = 2, NO = 3)
val_labels(df) <- NULL
val_labels(df) <- list(v1 = c(yes = 1, no = 3), v2 = c(a = 1, b = 2, c = 3))

Sorting value labels

Value labels are sorted by default in the order they have been created.

v <- c(1,2,2,2,3,9,1,3,2,NA)
val_label(v, 1) <- "yes"
val_label(v, 3) <- "no"
val_label(v, 9) <- "refused"
val_label(v, 2) <- "maybe"
val_label(v, 8) <- "don't know"

It can be useful to reorder the value labels according to their attached values.

sort_val_labels(v, decreasing = TRUE)

If you prefer, you can also sort them according to the labels.

sort_val_labels(v, according_to = "l")
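
The effect of sorting is easiest to see by looking at the order of the label names before and after. The sketch below uses a hypothetical vector x whose labels are deliberately declared out of order. It assumes that sort_val_labels sorts by value in increasing order by default.

```r
library(labelled)

# Hypothetical vector with labels declared in an arbitrary order
x <- labelled(c(1, 2, 3), c(no = 3, yes = 1, maybe = 2))

# Order of creation
created_order <- names(val_labels(x))

# Sorted by attached value (assumed default), then by label text
by_value <- names(val_labels(sort_val_labels(x)))
by_label <- names(val_labels(sort_val_labels(x, according_to = "labels")))
```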

User-defined missing values (SPSS style)

haven (>= 2.0.0) introduced an additional haven_labelled_spss class to deal with user-defined missing values. In such cases, additional attributes are used to indicate which values should be considered missing, but these values are not stored as internal NA values. Note that most R functions do not take this information into account, so you will have to convert these values to NA before analysis if required. User-defined missing values can coexist with internal NA values.

It is possible to manipulate these missing values with na_values() and na_range().

v <- labelled(c(1,2,2,2,3,9,1,3,2,NA), c(yes = 1, no = 3, "don't know" = 9))
na_values(v) <- 9
na_values(v) <- NULL
na_range(v) <- c(5, Inf)

It is mandatory to define at least one value label before defining missing values. Therefore, the following example will fail.

x <- c(1, 2, 2, 9)
na_values(x) <- 9

You should try:

x <- c(1, 2, 2, 9)
val_labels(x) <- c(yes = 1, no = 2)
na_values(x) <- 9

To convert user defined missing values into NA, simply use user_na_to_na().

v <- labelled_spss(1:10, c(Good = 1, Bad = 8), na_values = c(9, 10))
v2 <- user_na_to_na(v)
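
The following sketch illustrates the difference between a user-defined missing value and an internal NA, using a hypothetical vector w. It relies on the assumption that the underlying data still holds the original code (here 9) until user_na_to_na is applied.

```r
library(labelled)

# Hypothetical vector: 9 is declared as user-defined missing,
# and the last element is a genuine internal NA
w <- labelled_spss(c(1, 2, 9, NA), c(yes = 1, no = 2), na_values = 9)

# In the raw underlying data, only the internal NA is missing
n_raw_missing <- sum(is.na(unclass(w)))

# After conversion, the user-defined missing value is an NA as well
w2 <- user_na_to_na(w)
n_converted_missing <- sum(is.na(w2))
```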

You can also remove the definition of user-defined missing values without converting these values to NA, either with remove_user_na() or by setting na_values to NULL.

v <- labelled_spss(1:10, c(Good = 1, Bad = 8), na_values = c(9, 10))
v2 <- remove_user_na(v)


v <- labelled_spss(1:10, c(Good = 1, Bad = 8), na_values = c(9, 10))
na_values(v) <- NULL

Other conversion to NA

In some cases, values that have no attached value label can be considered missing. nolabel_to_na will convert them to NA.

v <- labelled(c(1,2,2,2,3,9,1,3,2,NA), c(yes = 1, maybe = 2, no = 3))
nolabel_to_na(v)

In other cases, a value label is attached only to specific values that correspond to a missing value. For example:

size <- labelled(c(1.88, 1.62, 1.78, 99, 1.91), c("not measured" = 99))

In such cases, val_labels_to_na could be appropriate.
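
A minimal sketch of this conversion, rebuilding the size vector from the example above so that the block is self-contained. It assumes that val_labels_to_na replaces every labelled value with NA and drops the value labels.

```r
library(labelled)

# 99 is a code meaning "not measured", not a real measurement
size <- labelled(c(1.88, 1.62, 1.78, 99, 1.91), c("not measured" = 99))

# Every value carrying a label becomes NA; labels are removed
size_clean <- val_labels_to_na(size)

# The code 99 no longer distorts summary statistics
mean_size <- mean(as.numeric(size_clean), na.rm = TRUE)
```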


These two functions can also be applied to an entire data frame. Only labelled vectors will be affected.

Converting to factor

A labelled vector can easily be converted to a factor with to_factor.

v <- labelled(c(1,2,2,2,3,9,1,3,2,NA), c(yes = 1, no = 3, "don't know" = 8, refused = 9))
to_factor(v)

The levels argument allows you to specify what should be used as the factor levels, i.e. the labels (default), the values or the labels prefixed with values.

to_factor(v, levels = "v")
to_factor(v, levels = "p")

The ordered argument will create an ordinal factor.

to_factor(v, ordered = TRUE)

The nolabel_to_na argument specifies whether the corresponding function should be applied before converting to a factor. Therefore, the two following commands are equivalent.

to_factor(v, nolabel_to_na = TRUE)
to_factor(nolabel_to_na(v))

sort_levels specifies how the levels should be sorted: "none" to keep the order in which value labels have been defined, "values" to order the levels according to the values and "labels" according to the labels. "auto" (default) will be equivalent to "none" except if some values with no attached labels are found and are not dropped. In that case, "values" will be used.

to_factor(v, sort_levels = "n")
to_factor(v, sort_levels = "v")
to_factor(v, sort_levels = "l")

The function to_labelled can be used to turn a factor into a labelled numeric vector.

f <- factor(1:3, labels = c("a", "b", "c"))
to_labelled(f)

Note that to_labelled(to_factor(v)) will not be equal to v due to the way factors are stored internally by R.
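
The sketch below illustrates this on a hypothetical vector u. A factor only stores the position of each level, so the original codes are expected to be lost on the way back. The exact codes produced by to_labelled are an assumption here and may depend on the package version.

```r
library(labelled)

# Hypothetical vector: the value 9 is labelled "refused"
u <- labelled(c(1, 2, 3, 9), c(yes = 1, no = 3, refused = 9))

# Convert to a factor and back to a labelled vector
round_trip <- to_labelled(to_factor(u))

# The round trip does not restore the original vector: codes are
# expected to be rebuilt from the positions of the factor levels
same_as_original <- identical(u, round_trip)
```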


Importing labelled data

In the haven package, read_spss, read_stata and read_sas natively import data using the labelled class and the label attribute for variable labels.

Functions from the foreign package can also import some metadata from SPSS and Stata files. to_labelled can convert data imported with foreign into a labelled data frame. However, this approach has some limitations compared with using haven.

The memisc package provides functions to import variable metadata and store them in a specific object of class data.set. The to_labelled method can convert a data.set into a labelled data frame.

# from foreign
library(foreign)
df <- to_labelled(read.spss(
  "file.sav",
  to.data.frame = FALSE,
  use.value.labels = FALSE,
  use.missings = FALSE
))
df <- to_labelled(read.dta(
  "file.dta",  # placeholder path
  convert.factors = FALSE
))

# from memisc
library(memisc)
nes1948.por <- UnZip("anes/NES1948.ZIP", "NES1948.POR", package = "memisc")
nes1948 <- spss.portable.file(nes1948.por)
df <- to_labelled(nes1948)
# assumed: convert the importer object to a data.set first
ds <- as.data.set(nes1948)
df <- to_labelled(ds)

Using labelled with dplyr/magrittr

If you are using the %>% operator, you can use the functions set_variable_labels, set_value_labels, add_value_labels and remove_value_labels.


df <- tibble(s1 = c("M", "M", "F"), s2 = c(1, 1, 2)) %>%
  set_variable_labels(s1 = "Sex", s2 = "Question") %>%
  set_value_labels(s1 = c(Male = "M", Female = "F"), s2 = c(Yes = 1, No = 2))

set_value_labels will replace the list of value labels while add_value_labels will update it.

df <- df %>%
  set_value_labels(s2 = c(Yes = 1, "Don't know" = 8, Unknown = 9))

df <- df %>%
  add_value_labels(s2 = c(No = 2))

You can also remove some variable and/or value labels.

df <- df %>%
  set_variable_labels(s1 = NULL)

# removing one value label
df <- df %>%
  remove_value_labels(s2 = 2)

# removing several value labels
df <- df %>%
  remove_value_labels(s2 = 8:9)

# removing all value labels
df <- df %>%
  set_value_labels(s2 = NULL)

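To convert variables, you can use functions such as mutate_if or mutate_at. In the example below, women is assumed to be a data frame with labelled columns, not the women data set shipped with base R.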

glimpse(women %>% mutate_if(is.labelled, to_factor))
glimpse(women %>% mutate_at(vars(employed:tv), to_factor))


labelled documentation built on Nov. 25, 2018, 5:04 p.m.