Generate a data dictionnary and search for variables with `look_for()`"

Showing a summary of a data frame

Default printing of tibbles

It is a common need to easily get a description of all variables in a data frame.

When a data frame is converted into a tibble (e.g. with dplyr::as_tibble()), it as a nice printing showing the first rows of the data frame as well as the type of column.

library(dplyr)
iris %>% as_tibble()

However, when you have too many variables, all of them cannot be printed and their are just listed.

data(fertility, package = "questionr")
women

Note: in R console, value labels (if defined) are usually printed but they do not appear in a R markdown document like this vignette.

dplyr::glimpse()

The function dplyr::glimpse() allows you to have a quick look at all the variables in a data frame.

glimpse(iris)
glimpse(women)

It will show you the first values of each variable as well as the type of each variable. However, some important informations are not displayed:

labelled::look_for()

look_for() provided by the labelled package will print in the console a data dictionary of all variables, showing variable labels when available, the type of variable and a list of values corresponding to:

library(labelled)
look_for(iris)
look_for(women)

Note that lookfor() and generate_dictionary() are synonyms of look_for() and works exactly in the same way.

If there is not enough space to print full labels in the console, they will be truncated (truncation is indicated by a ~).

Searching variables by key

When a data frame has dozens or even hundreds of variables, it could become difficult to find a specific variable. In such case, you can provide an optional list of keywords, which can be simple character strings or regular expression, to search for specific variables.

# Look for a single keyword.
look_for(iris, "petal")
look_for(iris, "s")

# Look for with a regular expression
look_for(iris, "petal|species")
look_for(iris, "s$")

# Look for with several keywords
look_for(iris, "pet", "sp")

# Look_for will take variable labels into account
look_for(women, "read", "level")

By default, look_for() will look through both variable names and variables labels. Use labels = FALSE to look only through variable names.

look_for(women, "read")
look_for(women, "read", labels = FALSE)

Similarly, the search is by default case insensitive. To make the search case sensitive, use ignore.case = FALSE.

look_for(iris, "sepal")
look_for(iris, "sepal", ignore.case = FALSE)

Level of details

If you just want to use the search feature of look_for() without computing the details of each variable, simply indicate details = "none" or details = FALSE.

look_for(women, "id", details = "none")

If you want more details (but can be time consuming for big data frames), indicate details = "full" or details = TRUE.

look_for(women, details = "full")
look_for(women, details = "full") %>%
  dplyr::glimpse()

Advanced usages of look_for()

look_for() returns a detailed tibble which is summarized before printing. To deactivate default printing and see full results, simply use dplyr::as_tibble(), dplyr::glimpse() or even utils::View().

look_for(women) %>% View()
look_for(women) %>% as_tibble()
glimpse(look_for(women))

The tibble returned by look_for() could be easily manipulated for advanced programming.

When a column has several values for one variable (e.g. levels or value_labels), results as stored with nested named list. You can convert named lists into simpler character vectors, you can use convert_list_columns_to_character().

look_for(women) %>% convert_list_columns_to_character()

Alternatively, you can use lookfor_to_long_format() to transform results into a long format with one row per factor level and per value label.

look_for(women) %>% lookfor_to_long_format()

Both can be combined:

look_for(women) %>%
  lookfor_to_long_format() %>%
  convert_list_columns_to_character()


Try the labelled package in your browser

Any scripts or data that you put into this service are public.

labelled documentation built on July 9, 2023, 7:53 p.m.