summaryStats: Summary Statistics
In s20x: Functions for University of Auckland Course STATS 201/208 Data Analysis

summaryStats

R Documentation

Summary Statistics

Description

Produces a table of summary statistics for the data. If the argument group is missing, calculates a matrix of summary statistics for the data in x. If group is present, the elements of group are interpreted as group labels and the summary statistics are displayed for each group separately.

Usage

summaryStats(x, ...)

## Default S3 method:
summaryStats(
  x,
  group = rep("Data", length(x)),
  data.order = TRUE,
  digits = 2,
  ...
)

## S3 method for class 'formula'
summaryStats(x, data = NULL, data.order = TRUE, digits = 2, ...)

## S3 method for class 'matrix'
summaryStats(x, data.order = TRUE, digits = 2, ...)

Arguments

`x`	either a single vector of values, or a formula of the form data~group, or a matrix.
`...`	Optional arguments which are passed to the summary statistic functions. For example `na.rm = TRUE` will help if there are missing values in the (response) variable.
`group`	a vector of group labels.
`data.order`	if `TRUE`, the groups are ordered as they are first encountered in the vector `group`. If `FALSE`, the order is alphabetical.
`digits`	the number of decimal places to display.
`data`	an optional data frame containing the variables in the model.
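
The two orderings described for `data.order` can be illustrated in base R. This is only a sketch of the resulting group order, not the package's internal code; the `group` vector below is made-up example data.

```r
# Hypothetical group labels, in the order they occur in the data
group <- c("BSc", "BA", "BSc", "BCom", "BA")

# data.order = TRUE: groups in order of first appearance
first_seen <- unique(group)

# data.order = FALSE: groups in alphabetical order
alphabetical <- sort(unique(group))

print(first_seen)
print(alphabetical)
```

With these labels, `BSc` leads the first-appearance ordering because it occurs first in the data, whereas `BA` leads the alphabetical one.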

Value

If x is a single variable, i.e. there are no groups, then a single list is invisibly returned with the following named items:

`min`	Minimum value.
`max`	Maximum value.
`mean`	Mean value.
`var`	Variance – the average of the squares of the deviations of the data values from the sample mean.
`sd`	Standard deviation – the square root of the variance.
`n`	Number of data values – size of the data set.
`nMissing`	If there are missing values and `na.rm` has been set to `TRUE`, this item contains the number of missing values.
`iqr`	Midspread (IQR) – the range spanned by the central half of the data; the interquartile range.
`skewness`	Skewness statistic – indicates how skewed the data set is. Positive values indicate right-skewed data. Negative values indicate left-skewed data.
`lq`	Lower quartile.
`median`	Median – the middle value when the batch is ordered.
`uq`	Upper quartile.
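
As a rough guide to what these items mean, the following base R sketch computes comparable quantities by hand. It is not the package's implementation: the data are invented, the quartiles use the default method of `quantile()`, the skewness uses one common moment-based formula, and `n` is assumed to count only the non-missing values. The package may differ on any of these points.

```r
# Hypothetical data with one missing value
x <- c(2, 4, 4, 5, 7, 9, NA)

n_missing <- sum(is.na(x))
x_obs <- x[!is.na(x)]  # the values that na.rm = TRUE would keep
q <- quantile(x_obs, c(0.25, 0.5, 0.75), names = FALSE)

stats <- list(
  min = min(x_obs),
  max = max(x_obs),
  mean = mean(x_obs),
  var = var(x_obs),  # base R's var() divides by n - 1
  sd = sd(x_obs),
  n = length(x_obs),
  nMissing = n_missing,
  iqr = q[3] - q[1],
  # Assumption: moment-based skewness; the package may use another formula
  skewness = mean((x_obs - mean(x_obs))^3) / sd(x_obs)^3,
  lq = q[1],
  median = q[2],
  uq = q[3]
)

str(stats)
```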

If grouping is provided, either by using the group argument, by providing a factor in a formula, or by passing a matrix whose columns represent the groups, then the function returns a data.frame with one row per group, each row containing all the statistics above.
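
The shape of the grouped result can be sketched in base R. This is an illustration under assumptions, not the package's code: `score`, `degree`, and the helper `one_row` are hypothetical, and only a few of the statistics are included to keep the example short.

```r
# Hypothetical data: a numeric response and a grouping vector
score <- c(55, 62, 71, 48, 90, 66)
degree <- c("BSc", "BA", "BSc", "BA", "BSc", "BA")

# Summarise one group's values as a one-row data frame
one_row <- function(v) {
  data.frame(min = min(v), max = max(v), mean = mean(v),
             sd = sd(v), n = length(v))
}

# Split by group, summarise each piece, then stack the rows;
# the list names ("BA", "BSc") become the row names
by_group <- do.call(rbind, lapply(split(score, degree), one_row))

print(by_group)
by_group["BA", ]  # all statistics for one group
by_group$mean     # one statistic across all groups
```

Because each group occupies a named row and each statistic a named column, a result of this shape can be indexed by group name or by statistic.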

Methods (by class)

summaryStats(default): Summary Statistics
summaryStats(formula): Summary Statistics
summaryStats(matrix): Summary Statistics

Examples


## STATS20x data:
data(course.df)

## Single variable summary
with(course.df, summaryStats(Exam))

## Using a formula
summaryStats(Exam ~ Stage1, course.df)

## Using a matrix
X = cbind(rnorm(50), rnorm(50))
summaryStats(X)

## Saving and extracting the information
sumStats = summaryStats(Exam ~ Degree, course.df)
sumStats

## Just the BAs
sumStats['BA', ]

## Just the means
sumStats$mean

s20x documentation built on Aug. 21, 2023, 5:07 p.m.