knitr::opts_chunk$set(
  collapse = TRUE,
  error = TRUE,
  comment = "#>"
)
library(rubix)
data("mtcars")

Introduction

The summarize_ functions rapidly survey the depth and breadth of a dataframe. The general use case for this function family is to analyze data for variability from multiple angles in order to bucket values into broader categories.

summarize_variables(mtcars,
                    incl_num_calc = FALSE)

Summary functions include:
1. Total and distinct counts
2. Counts for true NA values (i.e., NA), NA strings (i.e., "NA"), and blank values that, when combined, provide an overview of missingness in the dataframe
3. Unique values found within each variable, returned as a pipe-separated string
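
The three kinds of metrics listed above can be approximated in base R for a single column. This is only a sketch to make the definitions concrete: survey_column() and its output names are hypothetical and are not part of rubix, and the choice to count NA as a distinct value is an assumption that may differ from what summarize_variables() reports.

```r
# Hypothetical helper (not part of rubix): a base-R approximation of the
# per-variable metrics described in the list above.
survey_column <- function(x) {
  x_chr <- as.character(x)
  list(
    total_count    = length(x),
    # Assumption: NA is counted as one distinct value
    distinct_count = length(unique(x)),
    # True missing values
    na_count       = sum(is.na(x)),
    # The literal string "NA", which is.na() does not catch
    na_str_count   = sum(x_chr == "NA", na.rm = TRUE),
    # Empty strings
    blank_count    = sum(x_chr == "", na.rm = TRUE),
    # Non-missing unique values, in order of appearance, joined with "|"
    unique_values  = paste(unique(x_chr[!is.na(x_chr)]), collapse = "|")
  )
}

# Toy vector with one true NA, one "NA" string, and one blank value
toy <- c("a", "b", NA, "NA", "", "a")
result <- survey_column(toy)
str(result)
```

Adding na_count, na_str_count, and blank_count gives one overall missingness figure for the column.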

Summarize Numeric Variables

Additional metrics can be derived from variables that contain numeric data. The summarize_variables() function either takes variables as arguments or selects variables of the numeric, integer, or double R classes. It then calculates each summary statistic twice: once with na.rm = FALSE (the outputs with the _NA suffix) and once with na.rm = TRUE.

summarize_variables(data = mtcars,
                    incl_num_calc = TRUE)
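
To illustrate why each statistic is computed under both na.rm settings, here is a minimal base-R sketch. The numeric_summary() helper and the toy data are hypothetical and not part of rubix. Only the _NA suffix convention is taken from the description above, and the mean stands in for whichever statistics the package actually calculates.

```r
# Hypothetical helper (not part of rubix): compute the mean of every
# numeric column with and without NA removal.
numeric_summary <- function(data) {
  # is.numeric() is TRUE for both integer and double columns
  num_cols <- data[vapply(data, is.numeric, logical(1))]
  data.frame(
    variable = names(num_cols),
    # na.rm = FALSE: a single NA makes the result NA
    mean_NA  = vapply(num_cols, mean, numeric(1), na.rm = FALSE),
    # na.rm = TRUE: NA values are dropped before averaging
    mean     = vapply(num_cols, mean, numeric(1), na.rm = TRUE),
    row.names = NULL
  )
}

# Toy dataframe: x has a missing value, y does not, id is not numeric
toy <- data.frame(
  id = c("a", "b", "c"),
  x  = c(1, 2, NA),
  y  = c(4, 5, 6)
)
out <- numeric_summary(toy)
print(out)
```

A column whose two versions of a statistic differ, or whose _NA version is NA, contains missing values that affect the calculation.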

Summarizing Variable Values

The value_count() function returns the count of each unique value in each variable.

value_count(data = mtcars)
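
For comparison, a similar tally can be sketched in base R with table(). The count_values() helper and its long-format output (variable, value, n) are assumptions made for illustration and do not describe the actual return value of value_count().

```r
# Hypothetical helper (not part of rubix): count each unique value in
# every column and stack the results into one long dataframe.
count_values <- function(data) {
  per_col <- lapply(names(data), function(col) {
    # useNA = "ifany" adds a row for NA only when the column has any
    counts <- table(data[[col]], useNA = "ifany")
    data.frame(
      variable = col,
      value    = names(counts),
      n        = as.integer(counts),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, per_col)
}

data("mtcars")
counts <- count_values(mtcars[c("cyl", "gear")])
print(counts)
```

Restricting the call to a few low-cardinality columns, as done here with cyl and gear, keeps the output short. Continuous columns such as mpg produce close to one row per observation.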


meerapatelmd/rubix documentation built on Sept. 5, 2021, 8:30 p.m.