desc_stat: Descriptive statistics

View source: R/desc_stat.R

desc_stat    R Documentation

Descriptive statistics

Description

[Stable]

  • desc_stat() computes the most commonly used measures of central tendency, position, and dispersion.

  • desc_wider() reshapes the output so that variables are in columns and grouping variables are in rows. The table is filled with a single statistic chosen with the argument which.

Usage

desc_stat(
  .data = NULL,
  ...,
  by = NULL,
  stats = "main",
  hist = FALSE,
  level = 0.95,
  digits = 4,
  na.rm = FALSE,
  verbose = TRUE,
  plot_theme = theme_metan()
)

desc_wider(.data, which)

Arguments

.data

The data to be analyzed. It can be a data frame (possibly with grouped data passed from dplyr::group_by()) or a numeric vector. For desc_wider(), .data is an object of class desc_stat.

...

A single variable name or a comma-separated list of unquoted variable names. If no variable is given, all the numeric variables in .data are used. Select helpers are allowed.

by

One variable (factor) to compute the function by. It is a shortcut to dplyr::group_by(). To compute the statistics by more than one grouping variable use that function.

stats

The descriptive statistics to show. This is used to filter the output after computation. Defaults to "main" (cv, max, mean, median, min, sd.amo, se, ci). Other allowed values are "all" to show all the statistics, "robust" to show robust statistics, "quantile" to show quantile statistics, or choose one (or more) of the following:

  • "av.dev": average deviation.

  • "ci.t": t-interval (95% confidence interval) of the mean.

  • "ci.z": z-interval (95% confidence interval) of the mean.

  • "cv": coefficient of variation.

  • "iqr": interquartile range.

  • "gmean": geometric mean.

  • "hmean": harmonic mean.

  • "kurt": kurtosis.

  • "mad": median absolute deviation.

  • "max": maximum value.

  • "mean": arithmetic mean.

  • "median": median.

  • "min": minimum value.

  • "n": the length of the data.

  • "n.valid": the number of valid (non-NA) elements.

  • "n.missing": the number of missing values.

  • "n.unique": the number of unique elements.

  • "ps": the pseudo-sigma (iqr / 1.35).

  • "q2.5", "q25", "q75", "q97.5": the 2.5th percentile, first quartile, third quartile, and 97.5th percentile.

  • "range": the range of the data.

  • "sd.amo", "sd.pop": the sample and population standard deviation.

  • "se": the standard error of the mean.

  • "skew": skewness.

  • "sum": the sum of the values.

  • "sum.dev": the sum of the absolute deviations.

  • "ave.sq.dev": the average of the squared deviations.

  • "sum.sq.dev": the sum of the squared deviations.

  • "var.amo", "var.pop": the sample and population variance.

Use these names to select the statistics, for example stats = c("median, mean, cv, n"). Statistic names are not case-sensitive, and either commas or spaces can be used as separators.
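
As a rough guide to what several of these names mean, the base R sketch below computes a few of them from their textbook definitions. It does not call metan and is not taken from the package source. The helper name sketch_stats and the vector x are hypothetical, "av.dev" is assumed here to be the mean absolute deviation from the mean, and "cv" is assumed to be expressed in percent; results from desc_stat() may differ in rounding or in how missing values are handled.

```r
# Textbook definitions of a few statistics from the list (not metan's code).
# `sketch_stats` and `x` are hypothetical names used only for illustration.
sketch_stats <- function(x) {
  x <- x[!is.na(x)]                       # keep valid (non-NA) values only
  n <- length(x)
  m <- mean(x)
  sd_amo <- sd(x)                         # sample sd, divisor n - 1
  sd_pop <- sqrt(sum((x - m)^2) / n)      # population sd, divisor n
  c(
    n.valid    = n,
    mean       = m,
    sd.amo     = sd_amo,
    sd.pop     = sd_pop,
    se         = sd_amo / sqrt(n),        # standard error of the mean
    cv         = sd_amo / m * 100,        # assumption: CV in percent
    iqr        = IQR(x),                  # interquartile range
    ps         = IQR(x) / 1.35,           # pseudo-sigma, as defined in the list
    av.dev     = mean(abs(x - m)),        # assumption: mean absolute deviation
    sum.sq.dev = sum((x - m)^2)           # sum of squared deviations
  )
}

x <- c(12, 13, 19, 21, 8, NA, 23, NA)
res <- sketch_stats(x)
round(res, 4)
```

Returning a named numeric vector keeps the sketch close to the statistic names used by the stats argument, so individual values can be picked with, for example, res[["cv"]].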

hist

Logical argument. Defaults to FALSE. If hist = TRUE, a histogram is created for each selected variable.

level

The confidence level used to compute the confidence interval of the mean. Defaults to 0.95.
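
For reference, a t-based confidence interval of the mean at a given level is conventionally built as in the base R sketch below. It shows the standard formula only and is not metan's implementation; the helper name ci_t_sketch and the data in x are hypothetical.

```r
# Standard two-sided t-interval for the mean (textbook formula; not metan's code).
# `ci_t_sketch` and `x` are hypothetical names used only for illustration.
ci_t_sketch <- function(x, level = 0.95) {
  x <- x[!is.na(x)]                       # drop missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                   # standard error of the mean
  # critical value with n - 1 degrees of freedom
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * se
  c(lower = mean(x) - half_width, upper = mean(x) + half_width)
}

x <- c(12, 13, 19, 21, 8, 23)
ci95 <- ci_t_sketch(x, level = 0.95)
ci99 <- ci_t_sketch(x, level = 0.99)
ci95
ci99
```

Raising level widens the interval, because the critical value from qt() grows as the tail probability shrinks.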

digits

The number of significant digits.

na.rm

Logical. Should missing values be removed? Defaults to FALSE.

verbose

Logical argument. If verbose = FALSE the code is run silently.

plot_theme

The graphical theme of the plot. Default is plot_theme = theme_metan(). For more details, see ggplot2::theme().

which

A statistic to fill the table.

Value

  • desc_stat() returns a tibble with the statistics in the columns and variables (with possible grouping factors) in rows.

  • desc_wider() returns a tibble with variables in columns and grouping factors in rows.
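
The "wide" layout described for desc_wider() (one row per level of the grouping factor, one column per variable, each cell holding a single statistic) can be pictured with a base R analogue. The sketch below uses aggregate() on the built-in iris data rather than metan, so it only illustrates the shape of the result, not the package's code or its tibble output.

```r
# Base R analogue of a "wide" table of one statistic (here, the maximum):
# grouping factor (Species) in rows, variables in columns.
# This illustrates the layout only; it does not call desc_wider().
wide_max <- aggregate(cbind(Sepal.Length, Petal.Length) ~ Species,
                      data = iris, FUN = max)
wide_max
```

Swapping FUN = max for another summary function, such as mean or median, fills the same layout with a different statistic.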

Author(s)

Tiago Olivoto tiagoolivoto@gmail.com

Examples


library(metan)
#===============================================================#
# Example 1: main statistics (coefficient of variation, maximum,#
# mean, median, minimum, sample standard deviation, standard    #
# error and confidence interval of the mean) for all numeric    #
# variables in data                                             #
#===============================================================#

desc_stat(data_ge2)

#===============================================================#
# Example 2: robust statistics using a numeric vector as input  #
# data                                                          #
#===============================================================#
vect <- data_ge2$TKW
desc_stat(vect, stats = "robust")

#===============================================================#
# Example 3: Select specific statistics. In this example, NAs   #
# are removed before analysis with a warning message            #
#===============================================================#
desc_stat(c(12, 13, 19, 21, 8, NA, 23, NA),
          stats = c('mean, se, cv, n, n.valid'),
          na.rm = TRUE)

#===============================================================#
# Example 4: Select specific variables and compute statistics by#
# levels of a factor variable (GEN)                             #
#===============================================================#
stats <-
  desc_stat(data_ge2,
            EP, EL, EH, ED, PH, CD,
            by = GEN)
stats

# To get a 'wide' format with the maximum values for all variables
desc_wider(stats, max)

#===============================================================#
# Example 5: Compute all statistics for all numeric variables   #
# by two or more factors. Note that group_by() was used to pass #
# grouped data to the function desc_stat()                      #
#===============================================================#

data_ge2 %>%
  group_by(ENV, GEN) %>%
  desc_stat()



TiagoOlivoto/WAASB documentation built on March 20, 2024, 4:18 p.m.