describe_all: Describe your data cleanly and effectively


View source: R/describe_all.R

Description

Describe data sets with multiple variable types effectively.

Usage

describe_all(data, digits = 2, include_NAcat = TRUE, max_levels = 10,
  include_numeric = FALSE, sort_by_freq = FALSE, NAcat_include = NULL,
  ...)

describe_all_num(data, digits = 2, ...)

describe_all_cat(data, digits = 2, include_NAcat = TRUE, max_levels = 10,
  include_numeric = FALSE, sort_by_freq = FALSE)

describeAll(data, digits = 2, include_NAcat = TRUE, max_levels = 10,
  include_numeric = FALSE, sort_by_freq = FALSE, NAcat_include = NULL,
  ...)

Arguments

data

The dataset, of class data.frame.

digits

Number of digits used for rounding; see round. Default is 2. For categorical variables, rounding is applied to the proportion, i.e. before converting to a percentage.

include_NAcat

Include NA values as categorical levels? Default is TRUE.

max_levels

The maximum number of levels you want to display for categorical variables. Default is 10.

include_numeric

For the categorical summary, also include numeric variables with max_levels or fewer distinct values? Default is FALSE.

sort_by_freq

Sort categorical result by frequency? Default is FALSE.

NAcat_include

Deprecated; use include_NAcat instead.

...

Additional arguments passed to num_summary.
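The note on digits for categorical variables can be illustrated in base R. The following is only a sketch of the rule as stated above (round the proportion, then convert to a percentage), not the package's own code; the object names are made up for illustration.

```r
# Illustrative only: rounding happens on the proportion, then it is scaled to a percentage.
x <- factor(1:7)                      # seven levels, each with proportion 1/7
prop <- prop.table(table(x))          # about 0.1428571 for every level

pct_digits2 <- round(prop, 2) * 100   # proportion 0.14    -> 14 percent
pct_digits5 <- round(prop, 5) * 100   # proportion 0.14286 -> 14.286 percent
```

With the default digits = 2, a percentage therefore carries no decimal places; a larger value for digits is needed to see finer distinctions between levels.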

Details

This function grew out of my frustration with data set summaries that were inadequate for my needs, too 'busy' with output, or unable to deal well with mixed data types.

Numeric data is treated separately from categorical data, and its summary provides the same information as num_summary. Categorical variables are those with max_levels or fewer distinct values, subject to the include_numeric setting. They are summarized with frequencies and percentages. An empty categorical variable (e.g. after subsetting) triggers a warning.

The functions describe_all_num and describe_all_cat provide only the numeric or only the categorical summaries, respectively. describeAll is a deprecated alias.
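As a rough illustration of the categorical handling described above, here is a base R sketch. It reflects one reading of the rule (a variable counts as categorical when it has max_levels or fewer distinct values, with numeric variables considered only if include_numeric is TRUE) and is not the package's implementation. The helpers is_categorical_sketch and freq_table_sketch are hypothetical names, and treating NA via useNA = "ifany" is an assumption.

```r
# Hypothetical helper: one reading of the categorical rule, not lazerhawk's code.
is_categorical_sketch <- function(x, max_levels = 10, include_numeric = FALSE) {
  few_levels <- length(unique(x)) <= max_levels
  if (is.numeric(x)) include_numeric && few_levels else few_levels
}

# Hypothetical helper: frequencies and percentages for a single variable.
freq_table_sketch <- function(x, digits = 2, include_NAcat = TRUE,
                              sort_by_freq = FALSE) {
  # Assumption: NA is shown as its own level only when it actually occurs
  counts <- table(x, useNA = if (include_NAcat) "ifany" else "no")
  if (sort_by_freq) counts <- sort(counts, decreasing = TRUE)
  data.frame(
    level     = names(counts),
    frequency = as.integer(counts),
    # Round the proportion first, then convert to a percentage
    percent   = round(as.numeric(prop.table(counts)), digits) * 100
  )
}

df <- data.frame(
  grp = factor(c("A", "B", "A", "A", NA, "B")),
  bin = c(0, 1, 1, 0, 1, 1),
  num = c(0.5, 1.2, 3.3, 4.1, 5.9, 6.7)
)

# Which columns would be summarized as categorical under this reading?
keep <- vapply(df, is_categorical_sketch, logical(1),
               max_levels = 3, include_numeric = TRUE)

grp_summary <- freq_table_sketch(df$grp)
```

Here grp and bin qualify as categorical while num does not, and grp_summary holds one row per level, including a row for NA.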

Value

A list with two elements: summaries for the numeric variables and for the other variables, respectively.

See Also

summary, num_by, num_summary

Examples

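library(lazerhawk)
library(dplyr)

# Two factors with repeating level patterns
X = data.frame(f1 = gl(2, 1, 20, labels = c('A', 'B')),
               f2 = gl(2, 2, 20, labels = c('X', 'Q')))

# Add binary, logical, continuous, count, and character columns
X = X %>% mutate(bin1 = rbinom(20, 1, prob = .5),
                 logic1 = sample(c(TRUE, FALSE), 20, replace = TRUE),
                 num1 = rnorm(20),
                 num2 = rpois(20, 5),
                 char1 = sample(letters, 20, replace = TRUE))
describe_all(X)

# A single factor with seven levels, keeping five digits in the proportions
describe_all(data.frame(x = factor(1:7)), digits = 5)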

mclark--/lazerhawk documentation built on July 17, 2018, 3:11 a.m.