missing_summary: Compile missing data summary of data set

View source: R/missing_summary.R

missing_summary    R Documentation

Compile missing data summary of data set

Description

Generates missing data summaries. Adapted from tgsify's missingness_info.

Usage

missing_summary(data, upper_limit, max_vars, include_vars, type = "both")

Arguments

data

A data.frame object for which a missing data report will be generated.

upper_limit

A number. The right tail of the frequency distribution reported in the type="row" summary is truncated to "upper_limit +".

max_vars

A number. If not specified, only the variables with missing data are reported. If specified, limits the variables reported in the type="col" summary to the max_vars most frequently missing variables. If several variables are tied on the number of missing values, all of them are reported, so more than max_vars variables can appear in the output. Specify Inf to report all variables, including those without missing data.

include_vars

A vector of variable names, or a regular expression that selects the variables matching a pattern. To drop variables instead, precede the regular expression with ! (for example, include_vars = "!qol" drops the variables matched by "qol").

type

Type of output. One of "both", "row", "col", or "complete". The default, "both", returns both the "row"-wise and the "col"-wise missing data summaries. type="complete" reports the proportion of records with complete data.
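
The selection rules described for include_vars and max_vars can be illustrated in base R. This is only a sketch of the documented behaviour, not the package's implementation: select_vars is a hypothetical helper, toy is made-up illustration data, and the tie rule is assumed to keep every variable whose count is at least that of the max_vars-th ranked variable.

```r
# Hypothetical helper (not part of mollr): resolve an include_vars-style
# pattern to variable names. A leading "!" drops the matches instead.
select_vars <- function(vars, pattern) {
  negate <- startsWith(pattern, "!")
  if (negate) pattern <- substring(pattern, 2)
  hit <- grepl(pattern, vars)
  vars[if (negate) !hit else hit]
}

vars <- names(datasets::airquality)
kept <- select_vars(vars, "Oz|Solar")   # "Ozone" "Solar.R"
dropped <- select_vars(vars, "!Ozone")  # every variable except Ozone

# Made-up data in which a, b and c are tied with two missing values each.
toy <- data.frame(
  a = c(NA, NA, 1),
  b = c(NA, 1, NA),
  c = c(1, NA, NA),
  d = c(1, 2, NA)
)

# Assumed tie rule: keep every variable at or above the count of the
# max_vars-th most frequently missing variable.
n_missing <- sort(colSums(is.na(toy)), decreasing = TRUE)
max_vars <- 2
cutoff <- n_missing[min(max_vars, length(n_missing))]
top <- names(n_missing)[n_missing >= cutoff]  # a, b and c: three variables
```

Here max_vars = 2 yields three variables because a, b and c are tied, which is the situation the max_vars description warns about.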

Details

The output is a list with the "by row" summary and the "by column" summary.
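
As a rough guide to what the two summaries contain, the underlying counts can be reproduced in base R. This sketch does not call missing_summary and does not reproduce its output format. It assumes that rows with upper_limit or more missing values are pooled into a single "upper_limit +" category; the object names are illustrative only.

```r
dat <- datasets::airquality

# "By column": number of missing values in each variable.
col_missing <- colSums(is.na(dat))

# "By row": number of missing values in each record.
row_missing <- rowSums(is.na(dat))

# Assumed truncation rule: pool counts at or above upper_limit into "N+".
upper_limit <- 1
row_label <- ifelse(
  row_missing >= upper_limit,
  paste0(upper_limit, "+"),
  as.character(row_missing)
)
row_table <- table(row_label)  # "0": 111 records, "1+": 42 records

# type = "complete": proportion of records with no missing values.
prop_complete <- mean(complete.cases(dat))  # 111 / 153
```

In airquality only Ozone (37) and Solar.R (7) have missing values, and no record is missing more than two variables, so any upper_limit above 2 leaves the row-wise frequency table untruncated.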

Examples

missing_summary(airquality)
missing_summary(airquality, type = "complete")
missing_summary(airquality, type = "row")
missing_summary(airquality, type = "col")
## Include only Ozone and Solar.R variables
missing_summary(airquality, type = "row", include_vars = c("Ozone", "Solar.R"))
missing_summary(airquality, type = "row", include_vars = "Oz|Solar")
missing_summary(airquality, type = "both", include_vars = "Oz|Solar")
missing_summary(airquality, type = "row", upper_limit = 1)
## Below, the upper_limit will provide the same results
missing_summary(airquality, type = "row", upper_limit = 3)
## upper_limit = 6 will not return "N+" like the above example because `airquality` has 6 total variables
missing_summary(airquality, type = "row", upper_limit = 6)
## drop Ozone only
missing_summary(airquality, type = "col", include_vars = "!Ozone")

olsonma/mollr documentation built on Aug. 2, 2022, 9:17 p.m.