utils_stats | R Documentation |
The following functions quickly compute descriptive statistics by levels of a factor or a combination of factors.
cv_by()
For computing coefficient of variation.
max_by()
For computing maximum values.
mean_by()
For computing arithmetic means.
min_by()
For computing minimum values.
n_by()
For getting the length.
sd_by()
For computing sample standard deviation.
var_by()
For computing sample variance.
sem_by()
For computing standard error of the mean.
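The idea behind the *_by() helpers can be sketched in base R. This is only an illustration of "one statistic per level of a factor", not metan's implementation; the built-in mtcars data set and its cyl column are stand-ins chosen for the example.

```r
# Base-R illustration of computing a statistic by levels of a factor.
# mtcars ships with R; cyl plays the role of the grouping factor here.
group_means <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)
group_sds   <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = sd)

print(group_means)  # one row per level of cyl, one column per variable
print(group_sds)
```

Each result has one row per level of the grouping variable, which is the same shape the *_by() functions are described as returning.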
Useful functions for descriptive statistics. All of them work naturally with %>%, handle grouped data and multiple variables (all numeric variables from .data by default).
av_dev()
computes the average absolute deviation.
ci_mean_t()
computes the t-interval for the mean.
ci_mean_z()
computes the z-interval for the mean.
cv()
computes the coefficient of variation.
freq_table()
Computes a frequency table for either numeric or
categorical/discrete data. For numeric data, it is possible to define the
number of classes to be generated.
hmean(), gmean()
computes the harmonic and geometric means,
respectively. The harmonic mean is the reciprocal of the arithmetic mean of
the reciprocals. The geometric mean is the nth root of the product of n values.
kurt()
computes the kurtosis as used in SAS and SPSS.
range_data()
Computes the range of the values.
n_valid()
The number of valid (not NA) values in the data.
n_unique()
Number of unique values.
n_missing()
Number of missing values.
row_col_mean(), row_col_sum()
Adds a row with the mean/sum of
each variable and a column with the mean/sum for each row of the data.
sd_amo(), sd_pop()
Computes sample and population standard deviation, respectively.
sem()
computes the standard error of the mean.
skew()
computes the skewness as used in SAS and SPSS.
ave_dev()
computes the average of the absolute deviations.
sum_dev()
computes the sum of the absolute deviations.
sum_sq()
computes the sum of the squared values.
sum_sq_dev()
computes the sum of the squared deviations.
var_amo(), var_pop()
computes sample and population variance, respectively.
desc_stat()
is a wrapper function around the above ones and can be
used to quickly compute all these statistics at once.
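Several of the statistics listed above have short textbook definitions. The following base-R sketch spells them out on a small made-up vector; it illustrates the formulas only and is not metan's code. Expressing the coefficient of variation as a percentage, and showing the interval as a half-width around the mean, are assumptions made for the example.

```r
# Toy data (hypothetical values, all positive so the geometric mean is defined)
x <- c(2, 4, 4, 5, 10)
n <- length(x)

# Harmonic mean: reciprocal of the arithmetic mean of the reciprocals
h_mean <- 1 / mean(1 / x)

# Geometric mean: nth root of the product of the n values (computed on the log scale)
g_mean <- exp(mean(log(x)))

# Sample (n - 1 denominator) and population (n denominator) standard deviation
sd_sample <- sd(x)
sd_popul  <- sqrt(sum((x - mean(x))^2) / n)

# Standard error of the mean
se_mean <- sd_sample / sqrt(n)

# Coefficient of variation, shown here as a percentage (assumption)
cv_pct <- sd_sample / mean(x) * 100

# Half-widths of 95% t- and z-intervals for the mean (textbook formulas)
level  <- 0.95
t_half <- qt(1 - (1 - level) / 2, df = n - 1) * se_mean
z_half <- qnorm(1 - (1 - level) / 2) * se_mean

print(c(hmean = h_mean, gmean = g_mean, mean = mean(x)))
print(c(sd_sample = sd_sample, sd_popul = sd_popul, sem = se_mean, cv = cv_pct))
print(c(t_half = t_half, z_half = z_half))
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, and the population standard deviation is always smaller than the sample one.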
av_dev(.data, ..., na.rm = FALSE)
ci_mean_t(.data, ..., na.rm = FALSE, level = 0.95)
ci_mean_z(.data, ..., na.rm = FALSE, level = 0.95)
cv(.data, ..., na.rm = FALSE)
freq_table(.data, var, k = NULL, digits = 3)
freq_hist(
  table,
  xlab = NULL,
  ylab = NULL,
  fill = "gray",
  color = "black",
  ygrid = TRUE
)
hmean(.data, ..., na.rm = FALSE)
gmean(.data, ..., na.rm = FALSE)
kurt(.data, ..., na.rm = FALSE)
n_missing(.data, ..., na.rm = FALSE)
n_unique(.data, ..., na.rm = FALSE)
n_valid(.data, ..., na.rm = FALSE)
pseudo_sigma(.data, ..., na.rm = FALSE)
range_data(.data, ..., na.rm = FALSE)
row_col_mean(.data, na.rm = FALSE)
row_col_sum(.data, na.rm = FALSE)
sd_amo(.data, ..., na.rm = FALSE)
sd_pop(.data, ..., na.rm = FALSE)
sem(.data, ..., na.rm = FALSE)
skew(.data, ..., na.rm = FALSE)
sum_dev(.data, ..., na.rm = FALSE)
ave_dev(.data, ..., na.rm = FALSE)
sum_sq_dev(.data, ..., na.rm = FALSE)
sum_sq(.data, ..., na.rm = FALSE)
var_pop(.data, ..., na.rm = FALSE)
var_amo(.data, ..., na.rm = FALSE)
cv_by(.data, ..., .vars = NULL, na.rm = FALSE)
max_by(.data, ..., .vars = NULL, na.rm = FALSE)
min_by(.data, ..., .vars = NULL, na.rm = FALSE)
means_by(.data, ..., .vars = NULL, na.rm = FALSE)
mean_by(.data, ..., .vars = NULL, na.rm = FALSE)
n_by(.data, ..., .vars = NULL, na.rm = FALSE)
sd_by(.data, ..., .vars = NULL, na.rm = FALSE)
var_by(.data, ..., .vars = NULL, na.rm = FALSE)
sem_by(.data, ..., .vars = NULL, na.rm = FALSE)
sum_by(.data, ..., .vars = NULL, na.rm = FALSE)
.data |
A data frame or a numeric vector. |
... |
The argument depends on the function used.
|
na.rm |
If |
level |
The confidence level for the confidence interval of the mean. Defaults to 0.95. |
var |
The variable to compute the frequency table. See |
k |
The number of classes to be created. See |
digits |
The number of significant figures to show. Defaults to 3. |
table |
A frequency table computed with |
xlab, ylab |
The |
fill, color |
The color to fill the bars and color the border of the bar, respectively. |
ygrid |
Shows a grid line on the |
.vars |
Used to select variables in the |
The function freq_table()
computes a frequency table for either
numerical or categorical variables. If a variable is categorical or
discrete (integer values), the number of classes will be the number of
levels that the variable contains.
If a variable (say, data) is continuous, the number of classes (k) is given by the square root of the number of samples (n) if n <= 100, or by 5 * log10(n) if n > 100.
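The rule for the number of classes can be written as a small helper. This is a sketch of the rule as stated in the text, not the package's code; the name n_classes is hypothetical, and rounding the result to a whole number is an assumption, since the text does not say how non-integer values are handled.

```r
# Number of classes k for n observations, following the rule in the text.
# Rounding to the nearest integer is an assumption made for this sketch.
n_classes <- function(n) {
  if (n <= 100) {
    round(sqrt(n))
  } else {
    round(5 * log10(n))
  }
}

n_classes(64)    # sqrt(64) = 8
n_classes(1000)  # 5 * log10(1000) = 15
```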
The amplitude (A) of the data is used to define the size of the class (c), given by
$$c = \frac{A}{n - 1}$$
The lower limit of the first class (LL1) is given by min(data) - c / 2. The upper limit is given by LL1 + c. The limits of the other classes are given in the same way. After the creation of the classes, the absolute and relative frequencies within each class are computed.
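The construction of the class limits and frequencies can be sketched in base R as follows. The helper name class_table is hypothetical, the class width is passed in directly rather than derived, and right-closed intervals (the default of cut()) are an assumption; the actual freq_table() implementation may differ on each of these points.

```r
# Build class limits starting at min(x) - width / 2 and count the
# absolute and relative frequencies per class (illustrative sketch only).
class_table <- function(x, width) {
  ll1 <- min(x) - width / 2                      # lower limit of the first class
  n_cls <- ceiling((max(x) - ll1) / width)       # enough classes to reach max(x)
  breaks <- ll1 + width * seq(0, n_cls)          # each upper limit is the next lower limit
  abs_freq <- as.vector(table(cut(x, breaks = breaks)))  # right-closed intervals
  data.frame(
    lower = breaks[-length(breaks)],
    upper = breaks[-1],
    abs_freq = abs_freq,
    rel_freq = abs_freq / length(x)
  )
}

# Made-up data with a class width of 1
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
res <- class_table(x, width = 1)
print(res)
```

With these values the first class runs from 0.5 to 1.5, and the relative frequencies sum to 1.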
Functions *_by() return a tbl_df with the computed statistics by each level of the factor(s) declared in the ... argument.
All other functions return a named integer if the input is a data frame or a numeric value if the input is a numeric vector.
freq_table()
Returns a list with the frequency table and the breaks used
for class definition. These breaks can be used to construct a histogram of
the variable.
Tiago Olivoto tiagoolivoto@gmail.com
Ferreira, Daniel Furtado. 2009. Estatistica Basica. 2 ed. Vicosa, MG: UFLA.
library(metan)

# Means of all numeric variables by GEN and ENV
mean_by(data_ge2, GEN, ENV)

# Coefficient of variation for all numeric variables
# by GEN and ENV
cv_by(data_ge2, GEN, ENV)

# Skewness of a numeric vector
set.seed(1)
nvec <- rnorm(200, 10, 1)
skew(nvec)

# Confidence interval 0.95 for the mean
# All numeric variables
# Grouped by levels of ENV
data_ge2 %>%
  group_by(ENV) %>%
  ci_mean_t()

# Standard error of the mean
# Variables PH and EH
sem(data_ge2, PH, EH)

# Frequency table for variable NR
data_ge2 %>%
  freq_table(NR)