Introduction to labelled data

knitr::opts_chunk$set(
  collapse = TRUE,
  message = FALSE, 
  warning = FALSE,
  comment = "#>",
  fig.path = "man/figures/",
  out.width = "100%")

options(tibble.print_min = 5, tibble.print_max = 5)

options(rmarkdown.html_vignette.check_title = FALSE)

What is labelled data in R?

Labelled data in SPSS and Stata refers to datasets where each variable (or column) and its values are assigned meaningful labels. These labels provide context, such as descriptions or categories, making the data easier to understand and analyze. For instance, a variable representing gender might have numerical codes (1, 2) with labels ("Male", "Female"). This feature enhances data analysis by allowing researchers to work with descriptive labels instead of deciphering codes or numeric values, facilitating clearer interpretation and communication of statistical results.

Packages such as foreign and haven import labelled data from SPSS and Stata into R. The bulkreadr package builds on haven to streamline the process further: it automatically converts labelled variables into R's factor data type, so no manual recoding is needed.
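To make the idea of labelled data and its conversion to factors concrete, here is a minimal sketch that uses haven directly. The `gender` vector and its labels are hypothetical illustration data, and this is not necessarily how bulkreadr performs the conversion internally.

```r
library(haven)

# Hypothetical variable: numeric codes with value labels and a variable label
gender <- labelled(
  c(1, 2, 2, 1),
  labels = c(Male = 1, Female = 2),
  label = "Gender of respondent"
)

# The variable label is stored as an attribute on the vector
attr(gender, "label")

# Convert the labelled vector to a factor; the value labels become the levels
gender_factor <- as_factor(gender)
levels(gender_factor)
```

After the conversion, the codes 1 and 2 no longer need to be looked up by hand, because the factor carries the descriptive labels as its levels.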

Note

Most examples in this vignette use the sample data shipped with bulkreadr, which can be located with the system.file() function. To use your own data stored in a local directory, set the appropriate file path before calling any function from the bulkreadr package.

read_spss_data()

read_spss_data() imports data from SPSS data files (.sav or .zsav). It converts labelled variables into factors, which makes the data easier to manipulate and analyze in R.

Read the SPSS data file without using variable labels as column names

library(bulkreadr)

file_path <- system.file("extdata", "Wages.sav", package = "bulkreadr")

data <- read_spss_data(file = file_path)

data

Read the SPSS data file and use variable labels as column names

data <- read_spss_data(file = file_path, label = TRUE)

data

read_stata_data()

read_stata_data() reads a Stata data file (.dta) into an R data frame, converting labelled variables into factors.

Read the Stata data file without using variable labels as column names

file_path <- system.file("extdata", "Wages.dta", package = "bulkreadr")

data <- read_stata_data(file = file_path)

data

Read the Stata data file and use variable labels as column names

data <- read_stata_data(file = file_path, label = TRUE)

data

generate_dictionary()

generate_dictionary() creates a data dictionary from a specified data frame. This function is particularly useful for understanding and documenting the structure of your dataset, similar to data dictionaries in Stata or SPSS.

# Creating a data dictionary from an SPSS file

file_path <- system.file("extdata", "Wages.sav", package = "bulkreadr")

wage_data <- read_spss_data(file = file_path)

generate_dictionary(wage_data)

look_for()

The look_for() function emulates the Stata lookfor command in R. It searches variable names, variable label descriptions, factor levels, and value labels, so users working with large and complex datasets can quickly locate the variables of interest.

# Look for a single keyword.

look_for(wage_data, "south")

# Look for terms matching a regular expression (here, terms starting with "s").

look_for(wage_data, "^s")
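The kind of search described above can be sketched in base R. This is only an illustration of the idea, not bulkreadr's implementation: `search_vars()` and the small data frame `df` are hypothetical, and the sketch matches only variable names and variable labels (not factor levels or value labels).

```r
# Hypothetical data frame with variable labels stored in the "label" attribute
df <- data.frame(south = c(0, 1), sex = c(1, 2), wage = c(5.1, 4.9))
attr(df$south, "label") <- "Lives in the South"
attr(df$sex, "label") <- "Sex of respondent"
attr(df$wage, "label") <- "Hourly wage"

# Hypothetical helper: names of variables whose name or label matches `pattern`
search_vars <- function(data, pattern) {
  # Collect each column's variable label, using "" when none is set
  labels <- vapply(
    data,
    function(col) {
      lab <- attr(col, "label", exact = TRUE)
      if (is.null(lab)) "" else as.character(lab)
    },
    character(1)
  )
  # A variable is a hit if the pattern matches its name or its label
  hit <- grepl(pattern, names(data), ignore.case = TRUE) |
    grepl(pattern, labels, ignore.case = TRUE)
  names(data)[hit]
}

search_vars(df, "south")
search_vars(df, "^s")
```

Matching against labels as well as names is what makes this style of search useful on imported survey data, where column names are often short codes and the meaning lives in the labels.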



bulkreadr documentation built on May 29, 2024, 1:35 a.m.