bummary: Data summary


bummary R Documentation

Data summary

Description

An improved summary function for data tables, giving one row of summary per variable.

Usage

bummary(data)

Arguments

data

A data table (data.frame or tibble).

Details

This is a small function for summarizing data, intended especially for large data sets with many variables (columns). The output is a data.frame with one row for each column of the argument data, so it stays readable even when there are many columns. The summary also reports the data type of each variable and its number of missing values. For numerical variables the basic summary statistics are computed; for factors the count of each factor level is given, in the order of the factor levels.

The columns of the argument data are expected to be of type character, integer, numeric, factor or logical. List columns are tolerated, but are not summarized.
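The one-row-per-column layout described above can be illustrated with a minimal base R sketch. The helper sketch_summary() and the toy data are hypothetical and are not the package's implementation of bummary(); the sketch only covers a subset of the output columns, and it assumes that cases counts all rows, including those with missing values.

```r
# Hypothetical helper, NOT the package's bummary(): one output row per input column.
sketch_summary <- function(data) {
  rows <- lapply(names(data), function(nm) {
    x <- data[[nm]]
    is_num <- is.numeric(x)  # TRUE for both integer and double columns
    data.frame(
      variable = nm,
      type     = class(x)[1],
      cases    = length(x),         # assumption: all rows, missing included
      missing  = sum(is.na(x)),
      minimum  = if (is_num) as.numeric(min(x, na.rm = TRUE)) else NA_real_,
      median   = if (is_num) as.numeric(median(x, na.rm = TRUE)) else NA_real_,
      maximum  = if (is_num) as.numeric(max(x, na.rm = TRUE)) else NA_real_,
      mean     = if (is_num) mean(x, na.rm = TRUE) else NA_real_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)  # stack the one-row data.frames
}

# Toy data with an integer, a numeric and a factor column (made up for illustration).
toy <- data.frame(
  age   = c(23L, 35L, NA, 41L),
  score = c(1.5, 2.5, 3.5, 4.5),
  group = factor(c("a", "b", "a", NA), levels = c("b", "a"))
)
res <- sketch_summary(toy)
print(res)
```

Non-numeric columns get NA in the statistics columns, so the result remains a rectangular table regardless of the mix of column types.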

Value

A data.frame with one row for each column in the argument data and with the columns:

  • variable. Column name.

  • type. The data type.

  • cases. The number of observations.

  • missing. The number of missing values.

  • minimum. Minimum value (numeric and integer only).

  • quantile_0.25. First quartile (numeric and integer only).

  • median. Median value (numeric and integer only).

  • quantile_0.75. Third quartile (numeric and integer only).

  • maximum. Maximum value (numeric and integer only).

  • mean. Mean value (numeric and integer only).

  • sd. Standard deviation (numeric and integer only).

  • n_categories. The number of categories (factor and logical only).

  • levels. A list-column with factor level counts (factor and logical only).
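As a rough illustration of what a list-column of level counts can look like, the sketch below uses only base R. The factor grp and the data.frame out are made up for the example, and the exact structure that bummary() stores in levels is an assumption here, not something stated in this documentation.

```r
# Level counts follow the order of levels(), not alphabetical or frequency order.
grp <- factor(c("low", "high", "low", "mid", NA),
              levels = c("low", "mid", "high"))
counts <- table(grp)  # missing values are excluded by default
print(counts)

# A list-column holds one object per row of a data.frame.
# Column names mirror the Value list; the stored structure is an assumption.
out <- data.frame(variable = "grp", n_categories = nlevels(grp))
out$levels <- I(list(counts))

# Extract the count for a single level from the first row.
high_count <- out$levels[[1]][["high"]]
print(high_count)
```

Because each cell of a list-column can hold a vector of any length, variables with different numbers of levels fit in the same table.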

Author(s)

Lars Snipen.

Examples

# Load the example data package and the colon data set
library(BIAS.data)
data(colon)
# One row of summary per column of colon
summary.tbl <- bummary(colon)


thoree/stat340 documentation built on June 30, 2024, 4:04 p.m.