# summaryStats: Summary Statistics

In s20x: Functions for University of Auckland Course STATS 201/208 Data Analysis

## Description

Produces a table of summary statistics for the data. If the argument `group` is missing, calculates a matrix of summary statistics for the data in `x`. If `group` is present, the elements of `group` are interpreted as group labels and the summary statistics are displayed for each group separately.

## Usage

```r
summaryStats(x, ...)

## Default S3 method:
summaryStats(
  x,
  group = rep("Data", length(x)),
  data.order = TRUE,
  digits = 2,
  ...
)

## S3 method for class 'formula'
summaryStats(x, data = NULL, data.order = TRUE, digits = 2, ...)

## S3 method for class 'matrix'
summaryStats(x, data.order = TRUE, digits = 2, ...)
```

## Arguments

• `x`: either a single vector of values, a formula of the form `data ~ group`, or a matrix.

• `...`: optional arguments which are passed to the summary statistic functions. For example, `na.rm = TRUE` will help if there are missing values in the (response) variable.

• `group`: a vector of group labels.

• `data.order`: if `TRUE`, the group order is the order in which the groups are first encountered in the vector `group`. If `FALSE`, the order is alphabetical.

• `digits`: the number of decimal places to display.

• `data`: an optional data frame containing the variables in the model.
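The two group orderings described for `data.order`, and the effect of passing `na.rm = TRUE` through `...`, can be illustrated with base R alone. This is a minimal sketch that does not call `summaryStats` itself; the vectors `group` and `x` are hypothetical data, and `unique()` and `sort()` are used here only to mimic the documented behaviour, not as a statement of how the package implements it.

```r
# Hypothetical group labels, in the order they appear in the data
group <- c("C", "A", "B", "A", "C")

# Order of first encounter (the behaviour described for data.order = TRUE)
first_seen <- unique(group)

# Alphabetical order (the behaviour described for data.order = FALSE)
alphabetical <- sort(unique(group))

# Hypothetical response with a missing value
x <- c(2, 4, NA, 6)

# Without na.rm, a base R summary function propagates the missing value
mean_with_na <- mean(x)

# With na.rm = TRUE, the missing value is dropped before summarising
mean_without_na <- mean(x, na.rm = TRUE)
```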

## Value

If `x` is a single variable, i.e. there are no groups, then a single list is invisibly returned with the following named items:

• `min`: Minimum value.

• `max`: Maximum value.

• `mean`: Mean value.

• `var`: Variance: the average of the squares of the deviations of the data values from the sample mean.

• `sd`: Standard deviation: the square root of the variance.

• `n`: Number of data values, i.e. the size of the data set.

• `nMissing`: If there are missing values and `na.rm` has been set to `TRUE`, this item contains the number of missing values.

• `iqr`: Midspread (IQR): the range spanned by the central half of the data; the interquartile range.

• `skewness`: Skewness statistic: indicates how skewed the data set is. Positive values indicate right-skewed data. Negative values indicate left-skewed data.

• `lq`: Lower quartile.

• `median`: Median: the middle value when the batch is ordered.

• `uq`: Upper quartile.
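For orientation, the quantities listed above can be approximated with base R functions. This is a sketch under stated assumptions, not the package's own code: the sample `x` is hypothetical, the quartiles use the default `quantile()` method (type 7), and the skewness line uses a simple moment-based estimate, which may differ from the formula `summaryStats` applies. Note that base R's `var()` divides by `n - 1`.

```r
# Hypothetical sample, not taken from the package
x <- c(1, 2, 3, 4, 10)

n <- length(x)   # number of data values
m <- mean(x)     # mean
v <- var(x)      # variance; base R divides by n - 1
s <- sd(x)       # standard deviation, the square root of var(x)

# Lower quartile, median, upper quartile (default quantile type 7)
q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]   # interquartile range

# Assumption: a moment-based skewness estimate; the package may use another formula
skew <- mean((x - m)^3) / mean((x - m)^2)^1.5
```

A positive `skew` here reflects the long right tail produced by the value 10.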

If grouping is provided, either by using the `group` argument, by providing a factor in a formula, or by passing a matrix whose columns represent the groups, then the function returns a `data.frame` with one row per group, each row containing all the statistics above.
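The shape of the grouped result can be sketched in base R. This is an illustration only: `values`, `group`, `labels`, and `stats` are hypothetical names, only four of the statistics are computed, and labelling the rows by group is an assumption inferred from the row indexing used in the Examples section.

```r
# Hypothetical data, not taken from the package
values <- c(10, 12, 14, 20, 22, 30)
group  <- c("B", "B", "B", "A", "A", "C")

# Group labels in order of first appearance
labels <- unique(group)

# Build one row of statistics per group
rows <- lapply(labels, function(g) {
  v <- values[group == g]
  data.frame(min = min(v), max = max(v), mean = mean(v), n = length(v))
})
stats <- do.call(rbind, rows)
rownames(stats) <- labels   # assumption: rows are labelled by group

# Extract the statistics for a single group
stats_a <- stats["A", ]

# Extract one statistic across all groups
group_means <- stats$mean
```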

## Methods (by class)

• `default`: Summary Statistics

• `formula`: Summary Statistics

• `matrix`: Summary Statistics

## Examples

```r
## STATS20x data:
data(course.df)

## Single variable summary
with(course.df, summaryStats(Exam))

## Using a formula
summaryStats(Exam ~ Stage1, course.df)

## Using a matrix
X = cbind(rnorm(50), rnorm(50))
summaryStats(X)

## Saving and extracting the information
sumStats = summaryStats(Exam ~ Degree, course.df)
sumStats

## Just the BAs
sumStats['BA', ]

## Just the means
sumStats$mean
```

