create_summary_statistics: Create Summary Statistics for Specified Variables
In tidyfinance: Tidy Finance Helper Functions

View source: R/create_summary_statistics.R

Description

Computes a set of summary statistics for numeric, integer, and logical variables in a data frame. Users select the variables to summarize, and statistics can be computed for the whole dataset or within groups defined by the `by` argument. Optionally, additional quantiles can be included (see `detail`).

Usage

create_summary_statistics(
  data,
  ...,
  by = NULL,
  detail = FALSE,
  drop_na = FALSE
)

Arguments

`data`	A data frame containing the variables to be summarized.
`...`	Comma-separated list of unquoted names of the variables in the data frame to summarize. These variables must be numeric, integer, or logical.
`by`	An optional unquoted variable name to group the data before summarizing. If `NULL` (the default), summary statistics are computed across all observations.
`detail`	A logical flag indicating whether to compute detailed summary statistics, including additional quantiles. Defaults to `FALSE`, which computes basic statistics (n, mean, sd, min, median, max). When `TRUE`, additional quantiles (1%, 5%, 10%, 25%, 75%, 90%, 95%, 99%) are computed.
`drop_na`	A logical flag indicating whether to drop missing values for each variable (default is `FALSE`).

Details

The function first checks that all specified variables are numeric, integer, or logical. If any variable does not meet this criterion, the function stops with an error message naming the non-conforming variables.
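A minimal base-R sketch of such a type check follows. The helper `check_summary_types()` and its error text are hypothetical illustrations, not tidyfinance's internal code. It assumes a column conforms when `is.numeric()` (double or integer) or `is.logical()` is true, and it takes variable names as strings rather than unquoted names.

```r
# Hypothetical helper sketching the type check described above;
# this is NOT the tidyfinance implementation.
check_summary_types <- function(data, vars) {
  # A column conforms if it is numeric (double or integer) or logical
  ok <- vapply(
    data[vars],
    function(x) is.numeric(x) || is.logical(x),
    logical(1)
  )
  if (!all(ok)) {
    # Name every non-conforming variable in the error message
    stop(
      "Non-numeric variables: ", paste(vars[!ok], collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

df <- data.frame(
  ret = c(0.01, -0.02),
  flag = c(TRUE, FALSE),
  group = c("A", "B")
)
check_summary_types(df, c("ret", "flag"))     # passes silently
try(check_summary_types(df, c("ret", "group")))  # stops with an error naming `group`
```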

The basic set of summary statistics includes the count of non-NA values (n), mean, standard deviation (sd), minimum (min), median (q50), and maximum (max). If `detail` is `TRUE`, the function also computes the 1st, 5th, 10th, 25th, 75th, 90th, 95th, and 99th percentiles.
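The statistics listed above can be sketched for a single vector in base R. This sketch assumes, without confirmation from the documentation, that missing values are skipped via `na.rm = TRUE` and that quantiles use R's default `quantile()` method (type 7). The helper `summarize_vector()` and the quantile names `q01` through `q99` are hypothetical; only `q50` for the median comes from the text above.

```r
# Hypothetical sketch of the basic and detailed statistics for one vector.
# Assumptions: NAs are skipped (na.rm = TRUE) and quantile() uses type 7.
summarize_vector <- function(x, detail = FALSE) {
  x <- as.numeric(x)  # logical values become 0/1
  stats <- c(
    n    = sum(!is.na(x)),  # count of non-NA values
    mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    min  = min(x, na.rm = TRUE),
    q50  = median(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE)
  )
  if (detail) {
    # The eight additional percentiles named in the documentation
    probs <- c(0.01, 0.05, 0.10, 0.25, 0.75, 0.90, 0.95, 0.99)
    q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
    names(q) <- paste0("q", c("01", "05", "10", "25", "75", "90", "95", "99"))
    stats <- c(stats, q)
  }
  stats
}

ret <- c(0.01, -0.02, 0.03, NA, 0.005)
summarize_vector(ret)
summarize_vector(ret, detail = TRUE)
```

With `ret` above, the four non-missing values give `n = 4` and a median of 0.0075, the average of the two middle values 0.005 and 0.01.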

Summary statistics are computed separately for each variable specified in `...`. If a `by` variable is provided, statistics are computed within each level of the `by` variable.
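Grouped computation can be sketched in base R by splitting each variable on the grouping column and summarizing every piece. `group_summary()` is a hypothetical helper that reports only `n` and `mean` to stay short. Unlike `create_summary_statistics()`, it takes variable and group names as strings. Its long layout (one row per variable and group) is an assumption modeled on the description under Value.

```r
# Hypothetical helper: per-group n and mean for each variable, in base R.
# Not the tidyfinance implementation; names are passed as strings.
group_summary <- function(data, vars, by) {
  rows <- list()
  for (v in vars) {
    pieces <- split(data[[v]], data[[by]])  # one vector per group level
    for (g in names(pieces)) {
      x <- pieces[[g]]
      rows[[length(rows) + 1]] <- data.frame(
        variable = v,
        group = g,
        n = sum(!is.na(x)),
        mean = mean(x, na.rm = TRUE)
      )
    }
  }
  out <- do.call(rbind, rows)
  names(out)[names(out) == "group"] <- by  # name the column after `by`
  out
}

data <- data.frame(
  ret = c(0.01, -0.02, 0.03, NA, 0.005),
  size = c(100, 200, 150, 300, 250),
  group = c("A", "A", "B", "B", "A")
)
group_summary(data, c("ret", "size"), by = "group")
```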

Value

A tibble with summary statistics for each selected variable. If `by` is specified, the output also includes the grouping variable. Each row corresponds to one variable (or to one variable-group combination when `by` is used), and each column holds one computed statistic.

Examples

library(tidyfinance)

data <- data.frame(
  ret = c(0.01, -0.02, 0.03, NA, 0.005),
  size = c(100, 200, 150, 300, 250),
  group = c("A", "A", "B", "B", "A")
)

# Basic summary across all observations
create_summary_statistics(data, ret, size)

# Grouped summary
create_summary_statistics(data, ret, size, by = group)

# Detailed quantiles
create_summary_statistics(data, ret, detail = TRUE)

tidyfinance documentation built on July 3, 2026, 1:09 a.m.