summaryStats: Summary Statistics

summaryStats    R Documentation

Summary Statistics

Description

Produces a table of summary statistics for the data. If the argument group is missing, calculates a matrix of summary statistics for the data in x. If group is present, the elements of group are interpreted as group labels and the summary statistics are displayed for each group separately.

Usage

summaryStats(x, ...)

## Default S3 method:
summaryStats(
  x,
  group = rep("Data", length(x)),
  data.order = TRUE,
  digits = 2,
  ...
)

## S3 method for class 'formula'
summaryStats(x, data = NULL, data.order = TRUE, digits = 2, ...)

## S3 method for class 'matrix'
summaryStats(x, data.order = TRUE, digits = 2, ...)

Arguments

x

either a single vector of values, or a formula of the form data~group, or a matrix.

...

Optional arguments which are passed to the summary statistic functions. For example, na.rm = TRUE removes missing values in the (response) variable before the statistics are computed.

group

a vector of group labels.

data.order

if TRUE, the group order is the order in which the groups are first encountered in the vector 'group'. If FALSE, the order is alphabetical.

digits

the number of decimal places to display.

data

an optional data frame containing the variables in the model.
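
The two group orderings described for data.order can be illustrated with a small base R sketch. This is not the package's code; the labels and the variable names first.seen and alphabetical are made up for illustration.

```r
## Hypothetical group labels (not taken from any s20x data set)
group <- c("BSc", "BA", "BSc", "BCom", "BA")

## Order of first appearance, analogous to data.order = TRUE
first.seen <- unique(group)

## Alphabetical order, analogous to data.order = FALSE
alphabetical <- sort(unique(group))

print(first.seen)
print(alphabetical)
```

With data.order = TRUE the rows of the result would be expected to follow first.seen; with data.order = FALSE they would follow alphabetical.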

Value

If x is a single variable, i.e. there are no groups, then a single list is invisibly returned with the following named items:

min

Minimum value.

max

Maximum value.

mean

Mean value.

var

Variance – the average of the squares of the deviations of the data values from the sample mean.

sd

Standard deviation – the square root of the variance.

n

Number of data values – size of the data set.

nMissing

If there are missing values and na.rm has been set to TRUE, then this item will contain the number of missing values.

iqr

Midspread (IQR) – the range spanned by the central half of the data; the interquartile range.

skewness

Skewness statistic – indicates how skewed the data set is. Positive values indicate right-skewed data. Negative values indicate left-skewed data.

lq

Lower quartile

median

Median – the middle value when the batch is ordered.

uq

Upper quartile

If grouping is provided, either by using the group argument, by providing a factor in a formula, or by passing a matrix whose columns represent the groups, then the function will return a data.frame with one row per group, each row containing all the statistics above.
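
As a rough illustration of the returned items, the following base R sketch builds a comparable list for a single variable and a one-row-per-group data frame. It is an assumption-laden sketch, not the s20x implementation: the data, the names stats, basic and by.group are hypothetical, the quartiles use the default rule of quantile(), which may differ from the package's, and skewness is omitted because its formula is not stated here.

```r
## Hypothetical data (not from the s20x package)
x <- c(2, 4, 4, 5, 7, 9, 11)

## Quartiles from quantile(); the package may use a different quartile rule
q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)

## Single variable: a named list similar in shape to the items listed above
stats <- list(
  min = min(x),
  max = max(x),
  mean = mean(x),
  var = var(x),       # R's var() divides by n - 1
  sd = sd(x),
  n = length(x),
  iqr = q[3] - q[1],
  lq = q[1],
  median = q[2],
  uq = q[3]
)

## Grouped case: one row of (a few) statistics per group
g <- c("a", "a", "a", "b", "b", "b", "b")
basic <- function(v) {
  data.frame(min = min(v), max = max(v), mean = mean(v), n = length(v))
}
by.group <- do.call(rbind, lapply(split(x, g), basic))

print(stats$mean)
print(by.group)
```

Rows and columns of by.group can then be extracted by name, for example by.group["b", ] or by.group$mean, in the same way as the data.frame returned for grouped data.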

Methods (by class)

  • summaryStats(default): Summary Statistics

  • summaryStats(formula): Summary Statistics

  • summaryStats(matrix): Summary Statistics

Examples


## STATS20x data:
data(course.df)

## Single variable summary
with(course.df, summaryStats(Exam))

## Using a formula
summaryStats(Exam ~ Stage1, course.df)

## Using a matrix
X = cbind(rnorm(50), rnorm(50))
summaryStats(X)

## Saving and extracting the information
sumStats = summaryStats(Exam ~ Degree, course.df)
sumStats

## Just the BAs
sumStats['BA', ]

## Just the means
sumStats$mean


s20x documentation built on Aug. 21, 2023, 5:07 p.m.