desc_stat: Descriptive statistics
In TiagoOlivoto/WAASB: Multi Environment Trials Analysis

Description

desc_stat() computes the most commonly used measures of central tendency, position, and dispersion.
desc_wider() reshapes the result so that variables are in columns and grouping variables are in rows. The table is filled with the statistic chosen with the argument `which`.

Usage

desc_stat(
  .data = NULL,
  ...,
  by = NULL,
  stats = "main",
  hist = FALSE,
  level = 0.95,
  digits = 4,
  na.rm = FALSE,
  verbose = TRUE,
  plot_theme = theme_metan()
)

desc_wider(.data, which)

Arguments

`.data`	The data to be analyzed. It can be a data frame (possibly with grouped data passed from `dplyr::group_by()`) or a numeric vector. For `desc_wider()`, `.data` is an object of class `desc_stat`.
`...`	A single variable name or a comma-separated list of unquoted variable names. If no variable is supplied, all the numeric variables from `.data` will be used. Select helpers are allowed.
`by`	One variable (factor) to compute the function by. It is a shortcut to `dplyr::group_by()`. To compute the statistics by more than one grouping variable use that function.
`stats`	The descriptive statistics to show. This is used to filter the output after computation. Defaults to `"main"` (cv, max, mean, median, min, sd.amo, se, ci). Other allowed values are `"all"` to show all the statistics, `"robust"` to show robust statistics, `"quantile"` to show quantile statistics, or choose one (or more) of the following:
`"av.dev"`: average deviation.
`"ci.t"`: t-interval (95% confidence interval) of the mean.
`"ci.z"`: z-interval (95% confidence interval) of the mean.
`"cv"`: coefficient of variation.
`"iqr"`: interquartile range.
`"gmean"`: geometric mean.
`"hmean"`: harmonic mean.
`"Kurt"`: kurtosis.
`"mad"`: median absolute deviation.
`"max"`: maximum value.
`"mean"`: arithmetic mean.
`"median"`: median.
`"min"`: minimum value.
`"n"`: the length of the data.
`"n.valid"`: the number of valid (not `NA`) elements.
`"n.missing"`: the number of missing values.
`"n.unique"`: the number of unique elements.
`"ps"`: the pseudo-sigma (iqr / 1.35).
`"q2.5", "q25", "q75", "q97.5"`: the 2.5th percentile, first quartile, third quartile, and 97.5th percentile.
`"range"`: the range of the data.
`"sd.amo", "sd.pop"`: the sample and population standard deviation.
`"se"`: the standard error of the mean.
`"skew"`: skewness.
`"sum"`: the sum of the values.
`"sum.dev"`: the sum of the absolute deviations.
`"ave.sq.dev"`: the average of the squared deviations.
`"sum.sq.dev"`: the sum of the squared deviations.
`"var.amo", "var.pop"`: the sample and population variance.
Use a character vector of names to select the statistics. For example, `stats = c("median, mean, cv, n")`. Note that the statistic names are not case-sensitive. Either commas or spaces can be used as separators.
`hist`	Logical. Defaults to `FALSE`. If `hist = TRUE`, a histogram is created for each selected variable.
`level`	The confidence level to compute the confidence interval of mean. Defaults to 0.95.
`digits`	The number of significant digits.
`na.rm`	Logical. Should missing values be removed? Defaults to `FALSE`.
`verbose`	Logical argument. If `verbose = FALSE` the code is run silently.
`plot_theme`	The graphical theme of the plot. Default is `plot_theme = theme_metan()`. For more details, see `ggplot2::theme()`.
`which`	A statistic to fill the table.
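
Several of the statistic names accepted by `stats` can be illustrated with base R alone. The block below is a minimal sketch using textbook definitions, not the package's implementation: `sketch_stats()` is a hypothetical helper, the coefficient of variation is assumed to be a percentage, the confidence intervals are assumed to be reported as half-widths, and the interquartile range uses the default quantile type of `IQR()`. Any of these may differ from what desc_stat() actually computes.

```r
# Hypothetical helper: textbook versions of some statistics named in `stats`.
# It is NOT part of the package and may not match desc_stat() exactly.
sketch_stats <- function(x, level = 0.95) {
  n_total <- length(x)                   # "n": length of the data, NAs included
  v <- x[!is.na(x)]                      # valid (non-missing) values
  n <- length(v)
  dev <- v - mean(v)                     # deviations from the arithmetic mean
  sd_amo <- sqrt(sum(dev^2) / (n - 1))   # sample standard deviation
  sd_pop <- sqrt(sum(dev^2) / n)         # population standard deviation
  se <- sd_amo / sqrt(n)                 # standard error of the mean
  alpha <- 1 - level
  c(
    n          = n_total,
    n.valid    = n,
    n.missing  = n_total - n,
    mean       = mean(v),
    av.dev     = mean(abs(dev)),         # average absolute deviation
    sum.dev    = sum(abs(dev)),          # sum of the absolute deviations
    sum.sq.dev = sum(dev^2),             # sum of the squared deviations
    var.amo    = sd_amo^2,
    var.pop    = sd_pop^2,
    se         = se,
    cv         = 100 * sd_amo / mean(v), # assumption: expressed in percent
    iqr        = IQR(v),
    ps         = IQR(v) / 1.35,          # pseudo-sigma, as defined above
    ci.t       = qt(1 - alpha / 2, df = n - 1) * se,  # assumption: half-width
    ci.z       = qnorm(1 - alpha / 2) * se            # assumption: half-width
  )
}

# Same vector as in Example 3 of this page
res <- sketch_stats(c(12, 13, 19, 21, 8, NA, 23, NA))
res
```

Writing the definitions out this way shows why `"n"` and `"n.valid"` differ when the input contains missing values, and why the t-interval is wider than the z-interval for small samples.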

Value

desc_stat() returns a tibble with the statistics in the columns and variables (with possible grouping factors) in rows.
desc_wider() returns a tibble with variables in columns and grouping factors in rows.
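
The difference between the two return shapes can be pictured with base R. The sketch below is only an illustration of the long-to-wide idea: the data frame `long`, its column names (`GEN`, `variable`, `max`), and its values are invented for the example and are not package output, and `reshape()` stands in for whatever desc_wider() does internally.

```r
# Invented long-format table: one row per (group, variable) pair and
# one column holding the chosen statistic (here, a maximum).
long <- data.frame(
  GEN      = rep(c("H1", "H2"), each = 2),
  variable = rep(c("EH", "PH"), times = 2),
  max      = c(1.5, 2.9, 1.4, 2.7)
)

# Spread the variables into columns, keeping one row per grouping level.
wide <- reshape(long, idvar = "GEN", timevar = "variable", direction = "wide")

# reshape() prefixes the new columns with "max."; drop the prefix so that
# the columns carry the plain variable names.
names(wide) <- sub("^max\\.", "", names(wide))
wide
```

Each grouping level ends up in a single row and each variable in its own column, filled with the one statistic that was selected.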

Author(s)

Tiago Olivoto tiagoolivoto@gmail.com

Examples


library(metan)
#===============================================================#
# Example 1: main statistics (coefficient of variation, maximum,#
# mean, median, minimum, sample standard deviation, standard    #
# error and confidence interval of the mean) for all numeric    #
# variables in data                                             #
#===============================================================#

desc_stat(data_ge2)

#===============================================================#
# Example 2: robust statistics using a numeric vector as input  #
# data                                                          #
#===============================================================#
vect <- data_ge2$TKW
desc_stat(vect, stats = "robust")

#===============================================================#
# Example 3: Select specific statistics. In this example, NAs   #
# are removed before analysis with a warning message            #
#===============================================================#
desc_stat(c(12, 13, 19, 21, 8, NA, 23, NA),
          stats = c('mean, se, cv, n, n.valid'),
          na.rm = TRUE)

#===============================================================#
# Example 4: Select specific variables and compute statistics by#
# levels of a factor variable (GEN)                             #
#===============================================================#
stats <-
  desc_stat(data_ge2,
            EP, EL, EH, ED, PH, CD,
            by = GEN)
stats

# To get a 'wide' format with the maximum values for all variables
desc_wider(stats, max)

#===============================================================#
# Example 5: Compute all statistics for all numeric variables   #
# by two or more factors. Note that group_by() was used to pass #
# grouped data to the function desc_stat()                      #
#===============================================================#

data_ge2 %>%
  group_by(ENV, GEN) %>%
  desc_stat()

TiagoOlivoto/WAASB documentation built on Oct. 19, 2024, 1:20 a.m.