knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(relper)
library(dplyr)
library(ggplot2)

x <- rnorm(100)
y <- rexp(100)
The calc_* functions each compute a specific value.
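The goal of calc_acf is to compute the autocorrelation function at lag $k$, given by:
$$\frac{\sum\limits_{t = k+1}^{n}(x_t - \bar{x})(x_{t-k} - \bar{x})}{\sum\limits_{t = 1}^{n} (x_t - \bar{x})^2 },$$
where $x_t$ is the observation at time $t$, $\bar{x}$ is the sample mean, $k$ is the lag, and $n$ is the number of observations.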
calc_acf(x)
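As a rough check on the formula above, the lag-$k$ autocorrelation can be written directly in base R. This is a minimal sketch, not relper's implementation: `acf_lag` and `x_demo` are hypothetical names introduced here, and the result is compared against `stats::acf()`, which uses the same estimator.

```r
# Hypothetical helper: lag-k autocorrelation, following the formula term by term
acf_lag <- function(x, k) {
  n <- length(x)
  x_bar <- mean(x)
  num <- sum((x[(k + 1):n] - x_bar) * (x[1:(n - k)] - x_bar))
  den <- sum((x - x_bar)^2)
  num / den
}

set.seed(1)
x_demo <- rnorm(50)

acf_lag(x_demo, k = 1)

# stats::acf() stores lag 0 first, so element 2 is lag 1
stats::acf(x_demo, lag.max = 1, plot = FALSE)$acf[2]
```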
If you pass a second vector to the argument y, the cross-correlation will be computed instead:
$$\frac{n \left( \sum\limits_{t = 1}^{n}x_ty_t \right) - \left[\left(\sum\limits_{t = 1}^{n}x_t \right) \left(\sum\limits_{t = 1}^{n}y_t\right) \right]}{\sqrt{\left[n \left( \sum\limits_{t = 1}^{n}x_t^2 \right) - \left( \sum\limits_{t = 1}^{n}x_t \right)^2\right]\left[n \left( \sum\limits_{t = 1}^{n}y_t^2 \right) - \left( \sum\limits_{t = 1}^{n}y_t \right)^2\right]}},$$
where $x_t$ and $y_t$ are the paired observations at time $t$ and $n$ is the number of observations.
calc_acf(x,y)
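The expression above is algebraically the computational form of the Pearson correlation between the two series. The sketch below evaluates it term by term in base R and compares it with `cor()`. `cross_cor` is a hypothetical helper, not part of relper, and how calc_acf handles lags for two series is not shown here.

```r
# Hypothetical helper: the computational formula above, term by term
cross_cor <- function(x, y) {
  n <- length(x)
  num <- n * sum(x * y) - sum(x) * sum(y)
  den <- sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))
  num / den
}

cross_cor(mtcars$hp, mtcars$drat)

# Same value from the built-in Pearson correlation
cor(mtcars$hp, mtcars$drat)
```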
The goal of calc_association is to compute association metrics.
Contingency is a measure of the degree to which two nominal variables are associated. It has a value between 0 and 1, with 0 indicating no relationship and 1 indicating perfect association, and is calculated as follows:
$$\sqrt{\frac{X^2}{n+X^2}},$$
where $X^2$ is the chi-squared statistic and $n$ is the number of observations.
calc_association(mtcars$am,mtcars$vs,type = "contingency")
Cramér's V is a measure of the degree to which two nominal variables are associated. It has a value between 0 and 1, with 0 indicating no relationship and 1 indicating perfect association, and is calculated as follows:
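$$\sqrt{\frac{X^2}{n\min(r-1,c-1)}},$$
where $X^2$ is the chi-squared statistic, $n$ is the number of observations, and $r$ and $c$ are the numbers of rows and columns of the contingency table.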
calc_association(mtcars$am,mtcars$vs,type = "cramers-v")
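The contingency coefficient and Cramér's V are both functions of the chi-squared statistic, so they can be sketched in base R from `chisq.test()`. One assumption to flag: the statistic below is computed without Yates' continuity correction (`correct = FALSE`); whether relper applies a correction is not stated in this document, so its output may differ.

```r
tab <- table(mtcars$am, mtcars$vs)

# Assumption: X^2 is the uncorrected Pearson chi-squared statistic
x2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
n <- sum(tab)

contingency <- sqrt(x2 / (n + x2))
cramers_v <- sqrt(x2 / (n * min(nrow(tab) - 1, ncol(tab) - 1)))

c(contingency = contingency, cramers_v = cramers_v)
```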
Phi is a measure of association between two nominal dichotomous variables. It is based on the 2×2 contingency table of the variables, with its marginal totals:

|       | y = 0    | y = 1    | Total    |
|-------|----------|----------|----------|
| x = 0 | $n_{00}$ | $n_{01}$ | $n_{0.}$ |
| x = 1 | $n_{10}$ | $n_{11}$ | $n_{1.}$ |
| Total | $n_{.0}$ | $n_{.1}$ | $n$      |

Then the phi coefficient is given by:
$$\frac{n_{11}n_{00} - n_{10}n_{01} }{\sqrt{n_{1.}n_{0.}n_{.1}n_{.0}}}.$$
calc_association(mtcars$am,mtcars$vs,type = "phi")
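To make the cell notation concrete, here is a small base R sketch that builds the 2×2 table and plugs its cells into the phi formula. It is an illustration only, not relper's code. For two 0/1 variables, phi coincides with the Pearson correlation, which gives an easy cross-check.

```r
tab <- table(x = mtcars$am, y = mtcars$vs)

# Cells of the 2x2 table, indexed by the level labels "0" and "1"
n11 <- tab["1", "1"]
n00 <- tab["0", "0"]
n10 <- tab["1", "0"]
n01 <- tab["0", "1"]

# Numerator: cross-product difference; denominator: product of the four margins
phi <- (n11 * n00 - n10 * n01) / sqrt(prod(rowSums(tab)) * prod(colSums(tab)))
phi

# Cross-check: Pearson correlation of the two binary variables
cor(mtcars$am, mtcars$vs)
```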
The goal of calc_auc
is to compute the area under a curve (AUC).
x <- seq(-3, 3, length.out = 100)
y <- dnorm(x)
ggplot(tibble(x = x, y = y), aes(x,y))+ geom_point()+ plt_theme_y()
By default, the function computes the area over the full range of x.
# from min to max of x
range(x)
calc_auc(x, y)
df_auc <- tibble(x = x, y = y)

df_auc %>%
  ggplot(aes(x, y)) +
  geom_area(alpha = .7, fill = "chocolate2") +
  geom_point() +
  plt_theme_y() +
  annotate(
    "text",
    x = mean(x), y = mean(y),
    label = round(calc_auc(x, y), 3),
    fontface = "bold", size = 7
  ) +
  geom_vline(xintercept = c(-3, 3), linetype = "dashed") +
  scale_x_continuous(breaks = -5:5) +
  scale_y_continuous(expand = c(0, 0))
You can also set the argument limits to get the AUC over a specific range.
# from -2 to 2
calc_auc(x, y, limits = c(-2, 2))
df_auc %>% ggplot(aes(x,y))+ geom_area(data = df_auc %>% filter(between(x,-2,2)), alpha = .7, fill = "chocolate2")+ geom_point()+ plt_theme_y()+ annotate("text", x = mean(x),y = mean(y),label = round(calc_auc(x,y,limits = c(-2,2)),3), fontface = "bold", size = 7)+ geom_vline(xintercept = c(-2,2), linetype = "dashed")+ scale_x_continuous(breaks = -5:5)+ scale_y_continuous(expand = c(0,0))
# from -1 to 1
calc_auc(x, y, limits = c(-1, 1))
df_auc %>% ggplot(aes(x,y))+ geom_area(data = df_auc %>% filter(between(x,-1,1)), alpha = .7, fill = "chocolate2")+ geom_point()+ plt_theme_y()+ annotate("text", x = mean(x),y = mean(y),label = round(calc_auc(x,y,limits = c(-1,1)),3), fontface = "bold", size = 7)+ geom_vline(xintercept = c(-1,1), linetype = "dashed")+ scale_x_continuous(breaks = -5:5)+ scale_y_continuous(expand = c(0,0))
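For intuition about what an area-under-the-curve computation does with sampled points, the sketch below applies the trapezoidal rule in base R. This is an assumption for illustration: the integration method relper uses is not stated here, and `auc_trapezoid` and `x_grid` are hypothetical names.

```r
# Hypothetical helper: trapezoidal rule over points (x, y), with x sorted ascending,
# optionally restricted to the grid points that fall inside `limits`
auc_trapezoid <- function(x, y, limits = range(x)) {
  keep <- x >= limits[1] & x <= limits[2]
  x <- x[keep]
  y <- y[keep]
  n <- length(x)
  sum(diff(x) * (y[-n] + y[-1]) / 2)
}

x_grid <- seq(-3, 3, length.out = 100)

# Full range: close to pnorm(3) - pnorm(-3)
auc_trapezoid(x_grid, dnorm(x_grid))

# Restricted range
auc_trapezoid(x_grid, dnorm(x_grid), limits = c(-2, 2))
```

Because the restricted version only uses grid points inside the limits, it slightly underestimates the area when the limits fall between grid points.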
The goal of calc_combination is to compute the number of combinations or permutations when $r$ items are chosen from a total of $n$ observations.
When order matters and repetition is allowed:
$$n^r.$$
calc_combination(n = 10,r = 4,order_matter = TRUE,with_repetition = TRUE)
When order matters and repetition is not allowed:
$$\frac{n!}{(n-r)!}.$$
calc_combination(n = 10,r = 4,order_matter = TRUE,with_repetition = FALSE)
When order does not matter and repetition is allowed:
$$\frac{(n+r-1)!}{r!(n-1)!}.$$
calc_combination(n = 10,r = 4,order_matter = FALSE,with_repetition = TRUE)
When order does not matter and repetition is not allowed:
$$\frac{n!}{r!(n-r)!}.$$
calc_combination(n = 10,r = 4,order_matter = FALSE,with_repetition = FALSE)
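The four counting formulas above can be evaluated with base R alone, which is a handy way to sanity-check results for $n = 10$ and $r = 4$. The vector name `counts` and its element names are made up for this sketch.

```r
n <- 10
r <- 4

counts <- c(
  ordered_with_repetition      = n^r,                              # 10000
  ordered_without_repetition   = factorial(n) / factorial(n - r),  # 5040
  unordered_with_repetition    = choose(n + r - 1, r),             # 715
  unordered_without_repetition = choose(n, r)                      # 210
)

counts
```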
The goal of calc_correlation is to compute correlation metrics.
The Kendall correlation coefficient, also known as the Kendall's Tau coefficient, measures the relationship between two ranked variables.
Maurice Kendall created it, and it is especially useful for analyzing non-linear relationships or ranked data. The coefficient is calculated by counting the number of concordant pairs (ranks in the same order) and discordant pairs (ranks in opposite order) in the data.
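$$\frac{n_c-n_d}{\frac{1}{2}n(n-1)},$$
where $n_c$ is the number of concordant pairs, $n_d$ is the number of discordant pairs, and $n$ is the number of observations.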
calc_correlation(mtcars$hp,mtcars$drat,type = "kendall")
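The pair counting behind Kendall's coefficient can be sketched in a few lines of base R. The helper below (`kendall_tau_a`, a hypothetical name) implements the formula as written, which is the tau-a variant. Base R's `cor(method = "kendall")` computes tau-b, so the two agree only when there are no ties, as in this small example.

```r
# Hypothetical helper: tau-a by counting concordant and discordant pairs
kendall_tau_a <- function(x, y) {
  n <- length(x)
  idx <- combn(n, 2)  # all index pairs i < j, one pair per column
  s <- sign(x[idx[1, ]] - x[idx[2, ]]) * sign(y[idx[1, ]] - y[idx[2, ]])
  n_c <- sum(s > 0)   # concordant pairs
  n_d <- sum(s < 0)   # discordant pairs
  (n_c - n_d) / (0.5 * n * (n - 1))
}

x_demo <- c(1, 2, 3, 4, 5)
y_demo <- c(2, 1, 4, 3, 5)

kendall_tau_a(x_demo, y_demo)  # 8 concordant, 2 discordant: (8 - 2) / 10 = 0.6

cor(x_demo, y_demo, method = "kendall")
```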
The Pearson correlation coefficient quantifies the linear relationship that exists between two continuous variables. It ranges from -1 to 1, indicating the association's strength and direction.
A value of 1 indicates a perfect positive linear relationship, a value of -1 indicates a perfect negative linear relationship, and a value of 0 indicates no linear relationship.
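$$\frac{\sigma_{xy}}{\sigma_x\sigma_y},$$
where $\sigma_{xy}$ is the covariance between $x$ and $y$, and $\sigma_x$ and $\sigma_y$ are their standard deviations.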
calc_correlation(mtcars$hp,mtcars$drat,type = "pearson")
The Spearman correlation coefficient assesses the strength and direction of a monotonic relationship between two variables, regardless of whether it is linear or non-linear.
It also has a value between -1 and 1, with 1 representing a perfect monotonic relationship and -1 representing a perfect inverse monotonic relationship. A value of 0 indicates that there is no monotonic relationship.
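$$1- \frac{6\sum\limits_{i=1}^{n}d_i^2}{n(n^2-1)},$$
where $d_i$ is the difference between the ranks of $x_i$ and $y_i$, and $n$ is the number of observations.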
calc_correlation(mtcars$hp,mtcars$drat,type = "spearman")
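The rank-difference formula above is exact only when there are no ties. The sketch below works it out by hand on a small tie-free example and compares it with base R's `cor(method = "spearman")`. The data are invented for illustration.

```r
x_demo <- c(10, 20, 30, 40, 50)
y_demo <- c(1, 3, 2, 5, 4)

d <- rank(x_demo) - rank(y_demo)  # rank differences: 0, -1, 1, -1, 1
n <- length(x_demo)

rho <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rho  # 1 - 24 / 120 = 0.8

cor(x_demo, y_demo, method = "spearman")
```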
The goal of calc_cv is to compute the coefficient of variation (CV), given by:
$$\frac{s}{\bar{x}},$$
where $s$ is the standard deviation and $\bar{x}$ is the mean.
set.seed(123)
x <- rexp(n = 100)
calc_cv(x)
If you set the argument as_perc to TRUE, the CV will be multiplied by 100.
calc_cv(x,as_perc = TRUE)
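In base R the same ratio is a one-liner. The sketch assumes $s$ is the sample standard deviation as returned by `sd()` (divisor $n - 1$). That choice is an assumption, so small differences from calc_cv are possible if it uses another divisor.

```r
set.seed(123)
x_demo <- rexp(n = 100)

# Assumption: s is the sample standard deviation from sd()
cv <- sd(x_demo) / mean(x_demo)

c(cv = cv, cv_percent = 100 * cv)
```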
The goal of calc_error is to compute error metrics. In the formulas below, $X_i$ denotes the actual values and $Y_i$ the predicted values.
MAE measures the average absolute difference between the predicted and actual values:
$$\frac{\sum\limits_{i=1}^{n}|X_i-Y_i|}{n}.$$
MAPE measures the average percentage difference between the predicted and actual values relative to the actual values:
$$\frac{\sum\limits_{i=1}^{n}\left|\frac{X_i-Y_i}{X_i}\right|}{n}.$$
MSE measures the average of the squared differences between the predicted and actual values:
$$\frac{\sum\limits_{i=1}^{n}(X_i-Y_i)^2}{n}.$$
RMSE is the square root of the MSE, providing the measure of average prediction error in the same units as the target variable:
$$\sqrt{\text{MSE}}.$$
RMSPE is the square root of the average of the squared percentage differences between the predicted and actual values relative to the actual values:
$$\sqrt{\frac{\sum\limits_{i=1}^{n}\left(\frac{X_i-Y_i}{X_i}\right)^2}{n}}.$$
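The five error formulas can be checked by hand on a tiny example in base R. The vectors below are invented, and the sketch assumes $X_i$ is the actual value and $Y_i$ the prediction, which matches the "relative to the actual values" wording. It does not call calc_error, whose argument names are not shown in this document.

```r
actual    <- c(100, 200, 300, 400)  # X_i (assumed to be the actual values)
predicted <- c(110, 190, 330, 380)  # Y_i (assumed to be the predictions)

err <- actual - predicted           # -10, 10, -30, 20

mae   <- mean(abs(err))                  # 17.5
mape  <- mean(abs(err / actual))         # 0.075
mse   <- mean(err^2)                     # 375
rmse  <- sqrt(mse)
rmspe <- sqrt(mean((err / actual)^2))

c(mae = mae, mape = mape, mse = mse, rmse = rmse, rmspe = rmspe)
```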
The goal of calc_kurtosis
is to compute a kurtosis coefficient.
calc_kurtosis(x = x)
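The biased kurtosis coefficient is given by:
$$\frac{\sum\limits_{i=1}^n(x_i - \bar{x})^4}{n s_x^4},$$
where $x_i$ is the $i$-th observation, $\bar{x}$ is the mean, $s_x$ is the standard deviation, and $n$ is the number of observations.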
calc_kurtosis(x = x,type = "biased")
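The excess kurtosis coefficient is given by:
$$\frac{\sum\limits_{i=1}^n(x_i - \bar{x})^4}{n s_x^4}-3,$$
where $x_i$ is the $i$-th observation, $\bar{x}$ is the mean, $s_x$ is the standard deviation, and $n$ is the number of observations.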
calc_kurtosis(x = x,type = "excess")
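The biased and excess coefficients differ only by the constant 3, which the following base R sketch makes explicit. One assumption is needed: $s_x$ is taken to be the population standard deviation (divisor $n$), which is the usual convention for the moment-based coefficient. The document does not say which divisor relper uses, so its values may differ slightly.

```r
set.seed(123)
x_demo <- rexp(n = 100)
n <- length(x_demo)

# Assumption: s_x is the population standard deviation (divisor n)
s_pop <- sqrt(mean((x_demo - mean(x_demo))^2))

k_biased <- sum((x_demo - mean(x_demo))^4) / (n * s_pop^4)
k_excess <- k_biased - 3  # 0 would match the kurtosis of a normal distribution

c(biased = k_biased, excess = k_excess)
```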
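The percentile kurtosis coefficient is given by:
$$\frac{Q_3-Q_1}{P_{90}-P_{10}},$$
where $Q_1$ and $Q_3$ are the first and third quartiles, and $P_{10}$ and $P_{90}$ are the 10th and 90th percentiles.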
calc_kurtosis(x = x,type = "percentile")
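The unbiased kurtosis coefficient is given by:
$$\frac{(n+1)n}{(n-1)(n-2)(n-3)}\frac{\sum\limits_{i=1}^n(x_i - \bar{x})^4}{ns_x^4} - 3\frac{(n-1)^2}{(n-2)(n-3)},$$
where $x_i$ is the $i$-th observation, $\bar{x}$ is the mean, $s_x$ is the standard deviation, and $n$ is the number of observations.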
calc_kurtosis(x = x,type = "unbiased")
The goal of calc_mean
is to compute the mean.
$$\frac{1}{n}\sum\limits_{i=1}^{n}x_i,$$
where $x_i$ is the $i$-th observation and $n$ is the number of observations.
calc_mean(x = 1:10,type = "arithmetic")
$$\frac{1}{\sum\limits_{i=1}^{n}w_i}\sum\limits_{i=1}^{n}w_ix_i,$$
where $w_i$ is the weight of the $i$-th observation $x_i$.
calc_mean(x = 1:10,type = "arithmetic",weight = 1:10)
calc_mean(x = 1:10,type = "arithmetic",trim = .4)
$$\sqrt[n]{\prod\limits_{i=1}^{n}x_i} = \sqrt[n]{x_1\times x_2 \times \dots \times x_n},$$
where $x_i$ is the $i$-th observation and $n$ is the number of observations.
calc_mean(x = 1:10,type = "geometric")
$$\frac{n}{\sum\limits_{i=1}^{n}\frac{1}{x_i}},$$
where $x_i$ is the $i$-th observation and $n$ is the number of observations.
calc_mean(x = 1:10,type = "harmonic")
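For comparison, the same means can be computed with base R on the vector 1:10. This is a sketch for checking the formulas, not relper's implementation. In particular, it assumes the trim argument behaves like the `trim` argument of base `mean()`, which drops that fraction of observations from each end.

```r
x_demo <- 1:10
w <- 1:10

m_arith    <- mean(x_demo)                    # 5.5
m_weighted <- weighted.mean(x_demo, w)        # 385 / 55 = 7
m_trimmed  <- mean(x_demo, trim = 0.4)        # drops 4 values per end: mean(5, 6) = 5.5
m_geom     <- exp(mean(log(x_demo)))          # n-th root of the product, via logs
m_harm     <- length(x_demo) / sum(1 / x_demo)

c(arithmetic = m_arith, weighted = m_weighted, trimmed = m_trimmed,
  geometric = m_geom, harmonic = m_harm)
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, a quick sanity check on any implementation.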
The goal of calc_modality
is to compute the number of modes.
calc_modality(x = c("a","a","b","b"))
The goal of calc_mode
is to compute the mode.
set.seed(123)
cat_var <- sample(letters, 100, replace = TRUE)
table(cat_var)
We can see that the letter "y" appears the most, indicating that it is the variable's mode.
calc_mode(cat_var)
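The idea behind both the mode and the number of modes can be sketched with `table()` in base R. `find_modes` is a hypothetical helper written for this illustration. How relper breaks ties between equally frequent values is not stated here.

```r
# Hypothetical helper: return every value that attains the highest frequency
find_modes <- function(v) {
  freq <- table(v)
  names(freq)[freq == max(freq)]
}

find_modes(c("a", "b", "b", "c", "b", "a"))  # "b"

# Two values tie for the highest frequency, so there are two modes
length(find_modes(c("a", "a", "b", "b")))    # 2
```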
The goal of calc_peak_density is to compute the value at which the estimated density of a numeric variable peaks.
pd_plot <- ggplot2::ggplot(data = dplyr::tibble(x = x), ggplot2::aes(x)) +
  ggplot2::geom_density() +
  relper::plt_theme_y() +
  ggplot2::scale_x_continuous(breaks = 0:20, expand = c(0, 0)) +
  plt_flip_y_title +
  labs(y = "Density (x)")

pd_plot
Assume we want to know what the density's peak value is.
calc_peak_density(x)
x_peak <- relper::calc_peak_density(x)

pd_plot +
  ggplot2::geom_vline(
    xintercept = relper::calc_peak_density(x),
    col = "royalblue4",
    size = 1
  ) +
  ggplot2::scale_x_continuous(
    breaks = 0:20,
    expand = c(0, 0),
    sec.axis = sec_axis(
      ~.,
      breaks = x_peak,
      labels = format_num(x_peak, digits = 3)
    )
  )
The goal of calc_perc
is to compute the percentage.
# without main_var
calc_perc(mtcars, grp_var = c(cyl, vs))

# main_var within grp_var
calc_perc(mtcars, grp_var = c(cyl, vs), main_var = vs)

# main_var not within grp_var
calc_perc(mtcars, grp_var = c(cyl), main_var = vs)
The goal of calc_skewness
is to compute a skewness coefficient.
calc_skewness(x = x)
Different types of coefficients are available:
The Bowley skewness coefficient is given by:
$$\frac{Q_3+Q_1-2Q_2}{Q_3-Q_1},$$
where $Q_1$, $Q_2$, and $Q_3$ are the first, second (median), and third quartiles.
calc_skewness(x = x,type = "bowley")
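The quartile-based formula is easy to reproduce with `quantile()`. The sketch below uses R's default quantile definition (type 7), which is an assumption: relper may estimate quartiles differently, so values can differ in the later decimals.

```r
set.seed(123)
x_demo <- rexp(n = 100)

# Assumption: quartiles from R's default quantile algorithm (type 7)
q <- quantile(x_demo, probs = c(0.25, 0.5, 0.75), names = FALSE)

bowley <- (q[3] + q[1] - 2 * q[2]) / (q[3] - q[1])
bowley  # always between -1 and 1; positive values indicate right skew
```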
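The Fisher-Pearson skewness coefficient is given by:
$$\frac{\sum\limits_{i=1}^{n}(x_i - \bar{x})^3}{n s_x^3},$$
where $x_i$ is the $i$-th observation, $\bar{x}$ is the mean, $s_x$ is the standard deviation, and $n$ is the number of observations.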
calc_skewness(x = x,type = "fisher_pearson")
The Kelly skewness coefficient is given by:
$$\frac{P_{90}+P_{10}-2Q_2}{P_{90}-P_{10}},$$
where $P_{10}$ and $P_{90}$ are the 10th and 90th percentiles and $Q_2$ is the median.
calc_skewness(x = x,type = "kelly")
The Pearson median skewness coefficient, or second skewness coefficient, is given by:
$$\frac{3(\bar{x}- \tilde{x})}{s_x},$$
where $\bar{x}$ is the mean, $\tilde{x}$ is the median, and $s_x$ is the standard deviation.
calc_skewness(x = x,type = "pearson_median")
The Rao skewness coefficient is given by:
$$\frac{n/(n-1)}{\sqrt{(n-2)/n}},$$
where $n$ is the number of observations.
calc_skewness(x = x,type = "rao")