get_stats.coin: Statistics of indicators

get_stats.coin    R Documentation

Statistics of indicators

Description

Given a coin and a specified data set (dset), returns a table of statistics with one entry per column (indicator) of that data set.

Usage

## S3 method for class 'coin'
get_stats(
  x,
  dset,
  t_skew = 2,
  t_kurt = 3.5,
  t_avail = 0.65,
  t_zero = 0.5,
  t_unq = 0.5,
  nsignif = 3,
  out2 = "df",
  ...
)

Arguments

x

A coin

dset

A data set present in .$Data

t_skew

Absolute skewness threshold. See details.

t_kurt

Kurtosis threshold. See details.

t_avail

Data availability threshold. See details.

t_zero

A threshold between 0 and 1 for flagging indicators with high proportion of zeroes. See details.

t_unq

A threshold between 0 and 1 for flagging indicators with low proportion of unique values. See details.

nsignif

Number of significant figures to round the output table to.

out2

Either "df" (default) to output a data frame of indicator statistics, or "coin" to output an updated coin with the data frame attached under .$Analysis.

...

Arguments passed to or from other methods.

Details

The statistics (columns in the output table) are as follows; each entry refers to one column (indicator) of the data set:

  • Min: the minimum

  • Max: the maximum

  • Mean: the (arithmetic) mean

  • Median: the median

  • Std: the standard deviation

  • Skew: the skew

  • Kurt: the kurtosis

  • N.Avail: the number of non-NA values

  • N.NonZero: the number of non-zero values

  • N.Unique: the number of unique values

  • Frc.Avail: the fraction of non-NA values

  • Frc.NonZero: the fraction of non-zero values

  • Frc.Unique: the fraction of unique values

  • Flag.Avail: a data availability flag - columns with Frc.Avail < t_avail will be flagged as "LOW", else "ok".

  • Flag.NonZero: a flag for columns with a high proportion of zeros. Any columns with Frc.NonZero < t_zero are flagged as "LOW", otherwise "ok".

  • Flag.Unique: a unique value flag - any columns with Frc.Unique < t_unq are flagged as "LOW", otherwise "ok".

  • Flag.SkewKurt: a skew and kurtosis flag which is an indication of possible outliers. Any columns with abs(Skew) > t_skew AND Kurt > t_kurt are flagged as "OUT", otherwise "ok".
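
The flag rules listed above can be sketched in base R on a toy data frame. This is an illustration only, not the COINr implementation: the indicator names, the helper functions skew_m(), kurt_m() and flag_indicator(), the moment-based skewness and excess kurtosis estimators, and the choice of denominators for the fractions are all assumptions made for the example.

```r
# Toy data: four hypothetical indicators (names are illustrative only)
dat <- data.frame(
  ind_a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  ind_b = c(0, 0, 0, 0, 0, 0, 1, 2, NA, NA),
  ind_c = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 100),
  ind_d = c(1, 2, 3, 4, NA, NA, NA, NA, NA, NA)
)

# Moment-based estimators (an assumption; COINr's own estimators may differ)
skew_m <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}
kurt_m <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3  # excess kurtosis
}

# Apply the four flag rules to a single column, using the documented defaults
flag_indicator <- function(x, t_skew = 2, t_kurt = 3.5,
                           t_avail = 0.65, t_zero = 0.5, t_unq = 0.5) {
  x_av <- x[!is.na(x)]
  # Assumed denominators: all rows for availability,
  # available values for the non-zero and unique fractions
  frc_avail   <- length(x_av) / length(x)
  frc_nonzero <- sum(x_av != 0) / length(x_av)
  frc_unique  <- length(unique(x_av)) / length(x_av)
  is_out <- abs(skew_m(x_av)) > t_skew && kurt_m(x_av) > t_kurt
  data.frame(
    Flag.Avail    = if (frc_avail < t_avail) "LOW" else "ok",
    Flag.NonZero  = if (frc_nonzero < t_zero) "LOW" else "ok",
    Flag.Unique   = if (frc_unique < t_unq) "LOW" else "ok",
    Flag.SkewKurt = if (is_out) "OUT" else "ok"
  )
}

# One row of flags per indicator
flags <- do.call(rbind, lapply(dat, flag_indicator))
flags
```

In this toy data, ind_c (nine identical values and one extreme value) is the only column that exceeds both the skewness and the kurtosis threshold, so it is the only one flagged "OUT". A column with high skewness but kurtosis below t_kurt would remain "ok", because both conditions must hold.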

The aim of this table is to check the basic statistics of each column/indicator and to identify possible issues with it, such as low data availability, a high proportion of zeros, or a low proportion of unique values. Further, the combination of skew and kurtosis (i.e. the Flag.SkewKurt column) is a simple test for possible outliers, which may require treatment using Treat().

The table can be returned either to the coin or as a standalone data frame - see out2.

See also vignette("analysis").

Value

Either a data frame or updated coin - see out2.

Examples

# build example coin
coin <- build_example_coin(up_to = "new_coin", quietly = TRUE)

# get table of indicator statistics for raw data set
get_stats(coin, dset = "Raw", out2 = "df")
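
A possible follow-up, sketched under the assumption that the returned data frame carries the four Flag.* columns named in Details: keep only the indicators for which at least one flag is raised. The variable names stats, flag_cols and is_flagged are illustrative, and the block is guarded so that it only runs when COINr is installed.

```r
if (requireNamespace("COINr", quietly = TRUE)) {
  # build example coin and get the statistics table for the raw data set
  coin <- COINr::build_example_coin(up_to = "new_coin", quietly = TRUE)
  stats <- COINr::get_stats(coin, dset = "Raw", out2 = "df")

  # flag columns as listed in Details
  flag_cols <- c("Flag.Avail", "Flag.NonZero", "Flag.Unique", "Flag.SkewKurt")

  # TRUE for rows where any flag differs from "ok"
  is_flagged <- rowSums(stats[flag_cols] != "ok", na.rm = TRUE) > 0

  # indicators that may need attention, e.g. outlier treatment
  stats[is_flagged, ]
}
```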


COINr documentation built on Oct. 9, 2023, 5:07 p.m.