It is a common need to easily get a description of all variables in a data frame.
When a data frame is converted into a tibble (e.g. with dplyr::as_tibble()), it gets a nice print method showing the first rows of the data frame as well as the type of each column.
library(dplyr)
iris %>% as_tibble()
However, when you have too many variables, not all of them can be printed, and the remaining ones are just listed.
data(fertility, package = "questionr")
women
Note: in the R console, value labels (if defined) are usually printed, but they do not appear in an R Markdown document like this vignette.
dplyr::glimpse()
The function dplyr::glimpse() allows you to have a quick look at all the variables in a data frame.
glimpse(iris)
glimpse(women)
It will show you the first values of each variable as well as the type of each variable. However, some important information, such as variable labels, is not displayed.
labelled::look_for()
look_for(), provided by the labelled package, prints a data dictionary of all variables in the console, showing variable labels when available, the type of each variable and a list of values such as factor levels or value labels. Further details are available with details = "full".
library(labelled)
look_for(iris)
look_for(women)
Note that lookfor() and generate_dictionary() are synonyms of look_for() and work in exactly the same way.
If there is not enough space to print full labels in the console, they will be truncated (truncation is indicated by a ~).
When a data frame has dozens or even hundreds of variables, it can become difficult to find a specific variable. In such cases, you can provide an optional list of keywords, which can be simple character strings or regular expressions, to search for specific variables.
# Look for a single keyword
look_for(iris, "petal")
look_for(iris, "s")
# Look for a regular expression
look_for(iris, "petal|species")
look_for(iris, "s$")
# Look for several keywords
look_for(iris, "pet", "sp")
# look_for() takes variable labels into account
look_for(women, "read", "level")
By default, look_for() will look through both variable names and variable labels. Use labels = FALSE to look only through variable names.
look_for(women, "read")
look_for(women, "read", labels = FALSE)
Similarly, the search is case-insensitive by default. To make the search case-sensitive, use ignore.case = FALSE.
look_for(iris, "sepal")
look_for(iris, "sepal", ignore.case = FALSE)
If you just want to use the search feature of look_for() without computing the details of each variable, simply use details = "none" or details = FALSE.
look_for(women, "id", details = "none")
If you want more details (which can be time-consuming to compute for big data frames), use details = "full" or details = TRUE.
look_for(women, details = "full")
look_for(women, details = "full") %>% dplyr::glimpse()
look_for()
look_for() returns a detailed tibble which is summarized before printing. To bypass the default printing and see the full results, simply use dplyr::as_tibble(), dplyr::glimpse() or even utils::View().
look_for(women) %>% View()
look_for(women) %>% as_tibble()
glimpse(look_for(women))
The tibble returned by look_for() can easily be manipulated for advanced programming.
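As a minimal sketch of such manipulation, the dictionary can be filtered like any other tibble, for example to list the names of all factor columns. The column names variable and col_type, and the "fct" abbreviation, are assumptions based on the printed output of recent versions of labelled; check names(look_for(iris)) on your installation before relying on them.

```r
library(labelled)
library(dplyr)

# Build the dictionary and convert it to a plain tibble
# (this drops the summarized look_for print method)
dictionary <- look_for(iris) %>% as_tibble()

# Keep only the names of factor columns
# (assumes col_type uses the abbreviation "fct" for factors)
factor_vars <- dictionary %>%
  filter(col_type == "fct") %>%
  pull(variable)

factor_vars
```

The same pattern works with any dplyr verb, so the dictionary can feed into column selection or reporting code.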
When a column has several values for one variable (e.g. levels or value_labels), results are stored as nested named lists. To convert these named lists into simpler character vectors, you can use convert_list_columns_to_character().
look_for(women) %>% convert_list_columns_to_character()
Alternatively, you can use lookfor_to_long_format() to transform results into a long format with one row per factor level and per value label.
look_for(women) %>% lookfor_to_long_format()
Both can be combined:
look_for(women) %>% lookfor_to_long_format() %>% convert_list_columns_to_character()