In base R, missing values are indicated using the specific value NA. Regular NAs can be used with any type of vector (double, integer, character, factor, Date, etc.).
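As a brief illustration using base R only, the sketch below shows that NA takes the type of the vector it is placed in, and that is.na() detects it in every case. The object names are only illustrative.

```r
# NA is coerced to the type of the vector that contains it
x_dbl <- c(1.5, NA)                      # stored as NA_real_
x_int <- c(1L, NA)                       # stored as NA_integer_
x_chr <- c("a", NA)                      # stored as NA_character_
x_fct <- factor(c("a", NA))              # missing factor value
x_date <- as.Date(c("2024-01-01", NA))   # missing Date

typeof(x_dbl)
typeof(x_int)
typeof(x_chr)

# is.na() flags the second element whatever the vector type
sapply(list(x_dbl, x_int, x_chr, x_fct, x_date), function(v) is.na(v)[2])
```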

Other statistical software packages have implemented ways to differentiate several types of missing values.

Stata and SAS have a system of tagged NAs, where NA values are tagged with a letter (from a to z). SPSS allows users to indicate that certain non-missing values should be treated as missing in some analyses (user NAs). The haven package implements tagged NAs and user NAs in order to keep this information when importing files from Stata, SAS or SPSS.

library(labelled)

Tagged NAs

Creation and tests

Tagged NAs are proper NA values with a tag attached to them. They can be created with tagged_na(). The attached tag should be a single letter, lowercase (a-z) or uppercase (A-Z).

x <- c(1:5, tagged_na("a"), tagged_na("z"), NA)

Most R functions treat tagged NAs as regular NAs. By default, they are also printed like any other regular NA.

x
is.na(x)

To show/print their tags, you need to use na_tag(), print_tagged_na() or format_tagged_na().

na_tag(x)
print_tagged_na(x)
format_tagged_na(x)
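
Because na_tag() returns a character vector (with NA for every value that carries no tag), its result can be passed to ordinary base R tools. As a small sketch, the following counts how often each tag occurs; the names w, tags and tag_counts are only illustrative.

```r
library(labelled)

w <- c(1:5, tagged_na("a"), tagged_na("z"), tagged_na("a"), NA)

# Tag of each tagged NA, NA for regular values and regular NAs
tags <- na_tag(w)

# Frequency of each tag (table() drops the untagged entries by default)
tag_counts <- table(tags)
tag_counts
```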

To test if a certain NA is a regular NA or a tagged NA, you should use is_regular_na() or is_tagged_na().

is.na(x)
is_tagged_na(x)
# You can test for specific tagged NAs with the second argument
is_tagged_na(x, "a")
is_regular_na(x)

Tagged NAs can be defined only for double vectors. If you add a tagged NA to a character vector, it will be converted into a regular NA. If you add a tagged NA to an integer vector, the vector will be converted into a double vector.

y <- c("a", "b", tagged_na("z"))
y
is_tagged_na(y)
format_tagged_na(y)

z <- c(1L, 2L, tagged_na("a"))
typeof(z)
format_tagged_na(z)

Unique values, duplicates and sorting with tagged NAs

By default, functions such as base::unique(), base::duplicated(), base::order() or base::sort() will treat tagged NAs as the same thing as a regular NA. You can use unique_tagged_na(), duplicated_tagged_na(), order_tagged_na() and sort_tagged_na() as alternatives that will treat two tagged NAs with different tags as separate values.

x <- c(1, 2, tagged_na("a"), 1, tagged_na("z"), 2, tagged_na("a"), NA)
x %>% print_tagged_na()

unique(x) %>% print_tagged_na()
unique_tagged_na(x) %>% print_tagged_na()

duplicated(x)
duplicated_tagged_na(x)

sort(x, na.last = TRUE) %>% print_tagged_na()
sort_tagged_na(x) %>% print_tagged_na()
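
order_tagged_na() is mentioned above but not demonstrated. A minimal sketch, assuming its default arguments: it returns a permutation of indices that can be used to reorder the vector itself (or any other object of the same length). The names s and idx are only illustrative.

```r
library(labelled)

s <- c(1, 2, tagged_na("a"), 1, tagged_na("z"), 2, tagged_na("a"), NA)

# Indices that sort s while keeping differently tagged NAs apart
idx <- order_tagged_na(s)
idx

# Apply the ordering and display the tags
print_tagged_na(s[idx])
```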

Tagged NAs and value labels

It is possible to define value labels for tagged NAs.

x <- c(1, 0, 1, tagged_na("r"), 0, tagged_na("d"), tagged_na("z"), NA)
val_labels(x) <- c(
  no = 0, yes = 1,
  "don't know" = tagged_na("d"),
  refusal = tagged_na("r")
)
x

When converting such a labelled vector into a factor, tagged NAs are, by default, converted into regular NAs (it is not possible to define tagged NAs with factors).

to_factor(x)

However, the explicit_tagged_na option of to_factor() allows you to transform tagged NAs into explicit factor levels.

to_factor(x, explicit_tagged_na = TRUE)
to_factor(x, levels = "prefixed", explicit_tagged_na = TRUE)

Conversion into user NAs

Tagged NAs can be converted into user NAs with tagged_na_to_user_na().

tagged_na_to_user_na(x)
tagged_na_to_user_na(x, user_na_start = 10)

Use tagged_na_to_regular_na() to convert tagged NAs into regular NAs.

tagged_na_to_regular_na(x)
tagged_na_to_regular_na(x) %>% is_tagged_na()

User NAs

haven introduced a haven_labelled_spss class to deal with user-defined missing values in a similar way to SPSS. In this case, additional attributes indicate which values should be considered as missing, but these values are not stored as internal NA values. Note that most R functions will not take this information into account. Therefore, you will have to convert these missing values into NA, if required, before analysis. User-defined missing values can co-exist with internal NA values.

Creation

User NAs can be created directly with labelled_spss(). You can also manipulate them with na_values() and na_range().

v <- labelled(c(1, 2, 3, 9, 1, 3, 2, NA), c(yes = 1, no = 3, "don't know" = 9))
v
na_values(v) <- 9
v

na_values(v) <- NULL
v

na_range(v) <- c(5, Inf)
na_range(v)
v
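
The example above adds user NAs to an existing labelled vector. As a sketch of the direct route, a comparable vector can be built in a single call to labelled_spss(); the object name v2 is only illustrative.

```r
library(labelled)

# Declare the value labels and the user NA (9) at creation time
v2 <- labelled_spss(
  c(1, 2, 3, 9, 1, 3, 2, NA),
  labels = c(yes = 1, no = 3, "don't know" = 9),
  na_values = 9
)
v2
na_values(v2)
```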

NB: you can also use set_na_range() and set_na_values() for a dplyr-like syntax.

library(dplyr)
# setting value labels and user NAs
df <- tibble(s1 = c("M", "M", "F", "F"), s2 = c(1, 1, 2, 9)) %>%
  set_value_labels(s2 = c(yes = 1, no = 2)) %>%
  set_na_values(s2 = 9)
df$s2

# removing user NAs
df <- df %>% set_na_values(s2 = NULL)
df$s2

Tests

Note that is.na() will return TRUE for user NAs. Use is_user_na() to test if a specific value is a user NA and is_regular_na() to test if it is a regular NA.

v
is.na(v)
is_user_na(v)
is_regular_na(v)

Conversion

For most R functions, user NA values are still regular values.

x <- c(1:5, 11:15)
na_range(x) <- c(10, Inf)
val_labels(x) <- c("dk" = 11, "refused" = 15)
x
mean(x)

You can convert user NAs into regular NAs with user_na_to_na() or user_na_to_regular_na() (both functions are identical).

user_na_to_na(x)
mean(user_na_to_na(x), na.rm = TRUE)

Alternatively, if the vector is numeric, you can convert user NAs into tagged NAs with user_na_to_tagged_na().

user_na_to_tagged_na(x)
mean(user_na_to_tagged_na(x), na.rm = TRUE)

Finally, you can also remove the user NA definitions without converting these values to NA, using remove_user_na().

remove_user_na(x)
mean(remove_user_na(x))


larmarange/labelled documentation built on Oct. 11, 2024, 6:25 p.m.