knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(qacr)

Overview

Getting summary statistics for a quantitative variable is one of the most common tasks in data analysis. Unfortunately, R makes it surprisingly difficult, particularly when the statistics need to be broken down by group.

The qstats function is an attempt to rectify the situation by making it simple to get any number of descriptive statistics for a numeric variable and to break these statistics down by the levels of one or more categorical variables (groups).

The general format is

qstats(data, variable, grouping variables, statistics, other options)

Note that variable names do not have to be quoted.

Using default statistics

By default, qstats reports the sample size, mean, and standard deviation. Let's take a look at fuel efficiencies for 11,914 automobiles in the cardata data frame.

# simple summary statistics 
qstats(cardata, highway_mpg)

# summary statistics by vehicle_size
qstats(cardata, highway_mpg, vehicle_size)

# summary statistics by vehicle_size and drive type
qstats(cardata, highway_mpg, vehicle_size, driven_wheels)
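For comparison, here is a rough base R sketch of the same kind of grouped summary (sample size, mean, and standard deviation). It is not how qstats is implemented. To keep it runnable without qacr, it uses the built-in mtcars data frame, with mpg and cyl standing in for highway_mpg and vehicle_size.

```r
# Base R sketch of a grouped n / mean / sd summary.
# Assumption: mtcars (ships with R) stands in for cardata.
grouped <- aggregate(
  mpg ~ cyl,
  data = mtcars,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)

# aggregate() stores the three statistics in a single matrix column;
# flatten it so each statistic gets its own column.
result <- data.frame(cyl = grouped$cyl, grouped$mpg)
print(result)
```

The extra flattening step and the anonymous function are the kind of boilerplate that a single qstats call avoids.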

Specifying other statistics

Use the stats argument to request other statistics. You can pass the name of a single statistic, or several names as a character vector.

# single statistic
qstats(cardata, highway_mpg, vehicle_size, stats = "median")

# multiple statistics
qstats(cardata, highway_mpg, vehicle_size, 
       stats = c("median", "min", "max"))

User-defined functions can also be used as statistics. The only requirement is that the function returns a single number.

# custom statistics
p25 <- function(x) quantile(x, probs = .25)
p75 <- function(x) quantile(x, probs = .75)

# calling the built-in and custom statistics
qstats(cardata, highway_mpg, vehicle_size, 
       stats = c("min", "p25", "p75", "max"))
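The single-number requirement can be checked on a plain vector before a function is handed to qstats. The sketch below uses a hypothetical helper, cv (coefficient of variation), which is not part of qacr, and contrasts it with range, which returns two values and so would not qualify.

```r
# Hypothetical custom statistic: coefficient of variation (sd / mean).
cv <- function(x) sd(x) / mean(x)

x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# A usable statistic returns exactly one number.
cv_ok <- length(cv(x)) == 1

# range() returns two numbers (min and max), so it does not qualify.
range_ok <- length(range(x)) == 1

print(c(cv = cv_ok, range = range_ok))
```

Once a function passes this check, it would be requested by name in the same way as p25 and p75 above, for example stats = c("mean", "cv").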

Other options

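Other options include na.rm, which controls how missing values are handled, and digits, which sets the number of decimal places in the output.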

qstats(cardata, highway_mpg, vehicle_size,  
       stats=c("n", "mean","median","sd"),  
       na.rm=FALSE, digits=2)
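To see why these two options matter, the following base R sketch shows how a missing value affects a statistic and how rounding changes what is displayed. It illustrates the standard behavior of mean, sd, and round only. Whether qstats handles the options in exactly this way internally is an assumption, not something this document states.

```r
# A small vector with one missing value.
x <- c(10, NA, 30)

# Without removing missing values, the result is NA.
mean_with_na <- mean(x)

# With na.rm = TRUE, the statistic is computed on the observed values.
mean_no_na <- mean(x, na.rm = TRUE)

# Rounding to two decimal places, as digits = 2 would suggest.
sd_rounded <- round(sd(x, na.rm = TRUE), digits = 2)

print(c(mean_with_na = mean_with_na,
        mean_no_na = mean_no_na,
        sd_rounded = sd_rounded))
```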


Rkabacoff/qacr documentation built on March 20, 2021, 3:03 p.m.