knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
It is very common to want to determine the effect that a given intervention or treatment has on a set of individuals, which is why hypothesis tests such as the well-known Student's t-test are used. However, on many occasions the parametric assumptions are not fulfilled, and we then turn to the nonparametric field, where we find the Wilcoxon signed-rank test; this test, however, can also present inconveniences in the presence of negative correlation, where its power is affected. This has led many researchers to seek solutions in other areas of statistics, such as Pereira et al. [-@luz_adriana], who proposed a hypothesis test for paired samples framed in Bayesian nonparametric statistics that relaxes the assumptions of traditional tests and, in addition, compares the distributions in their entirety and not only through location and scale parameters.
In this context, the BNP.test function was implemented in R so that this Bayesian nonparametric hypothesis test can be carried out in a simple way: the user only needs to supply a vector with the observations before the intervention or treatment (x), a vector with the observations after the intervention (y) and the number of simulations to run.
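As a quick orientation, a minimal call could look as follows. This is only an illustrative sketch with toy data; the argument name n.mcm for the number of simulations is the one used in the examples below.

```r
# Illustrative sketch only (toy data, not part of the package examples):
# x and y hold the paired observations, n.mcm the number of simulations.
set.seed(1)
before <- rnorm(30)                      # hypothetical pre-treatment values
after  <- before + rnorm(30, mean = 1)   # hypothetical post-treatment values
toy <- BNPPairedSamples::BNP.test(x = before, y = after, n.mcm = 1000)
toy$posterior.probability.H1             # posterior probability of H1
```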
To illustrate how the test works, an example with simulated data and one with real data are presented. The simulated data come from a normal mixture, while the real-data example uses the $airquality$ dataset from the $datasets$ package.
For the simulated example, the data are generated from a bivariate normal mixture whose components have mean vectors and covariance matrices given by:

$$ (1) \hspace{0.5cm} Y \sim N_2\left( \left(\begin{array}{c} 0 \\ -3 \end{array}\right), \left(\begin{array}{cc} 1 & 0.8 \\ 0.8 & 1 \end{array}\right) \right) $$

$$ (2) \hspace{0.5cm} Y \sim N_2\left( \left(\begin{array}{c} 0 \\ 3 \end{array}\right), \left(\begin{array}{cc} 1 & 0.8 \\ 0.8 & 1 \end{array}\right) \right) $$

A set of $n$ values is simulated from a uniform distribution on $(0,1)$; whenever a value is less than 0.5, the corresponding observation is sampled from the first distribution, and otherwise from the second.
library(BNPPairedSamples)  # Package library load
library(mvtnorm)           # Package to simulate from a bivariate normal distribution

# Simulation of the data
set.seed(123)
y <- matrix(runif(300), ncol = 300)
y <- apply(y, 2, function(i) {
  if (i < 0.5) {
    mvtnorm::rmvnorm(1, mean = c(0, -3),
                     sigma = matrix(c(1, 0.8, 0.8, 1), nrow = 2, byrow = TRUE))
  } else {
    mvtnorm::rmvnorm(1, mean = c(0, 3),
                     sigma = matrix(c(1, 0.8, 0.8, 1), nrow = 2, byrow = TRUE))
  }
})
y <- t(y)
After loading the packages and generating the data, we proceed to run the BNP.test function.
test_results <- BNPPairedSamples::BNP.test(x=y[,1], y=y[,2], n.mcm = 10000)
The BNP.test function returns a list with 3 elements, where the first element corresponds to the parameters obtained by Gibbs sampling, the second element shows the posterior probability of the alternative hypothesis and the third element corresponds to the raw data.
test_results[[1]][1:3]                  # first three elements of the Gibbs-sampling parameters
test_results$posterior.probability.H1   # posterior probability of the alternative hypothesis
head(test_results$data.init)            # raw data
From the hypothesis test and the marginal distributions it can be concluded that there are significant differences in location, scale and shape.
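One simple way to inspect those marginal distributions is to overlay kernel density estimates of the two columns; this is just a base-R sketch and not a plotting function of the package.

```r
# Kernel density estimates of the two marginals (illustrative, base R)
plot(density(y[, 1]), lwd = 2, main = "Marginal distributions", xlab = "Value")
lines(density(y[, 2]), lwd = 2, lty = 2)
legend("topright", legend = c("Time 1", "Time 2"), lwd = 2, lty = c(1, 2))
```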
The shift function developed by Doksum [-@doksum1] makes it possible to plot the difference between the quantiles of two distributions as a function of the quantiles of one of the groups, in order to quantify the differences between the two study populations.
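As a point of reference, the same idea can be sketched directly from sample quantiles: the difference between the quantiles of the two groups plotted against the quantiles of the first group. The code below is only an empirical illustration; the plotshift.function shown next works from the BNP.test results.

```r
# Empirical shift function from sample quantiles (illustrative only)
p  <- seq(0.05, 0.95, by = 0.05)      # probability grid
qx <- quantile(y[, 1], probs = p)     # quantiles at time 1
qy <- quantile(y[, 2], probs = p)     # quantiles at time 2
plot(qx, qy - qx, type = "b",
     xlab = "Quantiles of time 1",
     ylab = "Quantile difference (time 2 - time 1)")
abline(h = 0, lty = 2)                # reference line: no difference
```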
Since the BNP.test function indicated significant differences between the marginal distributions, we run the shift function, which is appropriate when the hypothesis test and the plot of the marginal distributions show significant differences, in order to identify the range in which those differences occur.
BNPPairedSamples::plotshift.function(results_BNP = test_results)
It is observed that in the left tail the differences favor the distribution at time 1, that is, its values lie further to the right, whereas in the right tail the differences favor the distribution at time 2; around 0 there are no significant differences.
The contour plot makes it possible to visualize the joint distribution $f(x,y)$ and to detect possible correlation between the distributions under study, which is reflected in the slope of the contours.
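For comparison, a rough contour plot of the joint density of the simulated data can be obtained from a bivariate kernel density estimate; this sketch uses MASS::kde2d and is not the contours.plot function of the package, which works from the BNP.test output.

```r
# Contour plot of a bivariate kernel density estimate (illustrative only)
library(MASS)
kde <- kde2d(y[, 1], y[, 2], n = 100)   # bivariate kernel density estimate
contour(kde, xlab = "Time 1", ylab = "Time 2")
```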
After running the BNP.test function to check for differences between the marginal distributions, and then the shift function to determine in which range those differences occur, we produce the contour plot for the simulated data.
BNPPairedSamples::contours.plot(results_BNP = test_results)
In this case the joint distribution is made up of two clusters, with the density in the vicinity of 0 being practically zero. Also noteworthy is the increasing slope of the contours, which expresses a positive correlation between the distributions.
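This reading can be checked empirically on the simulated pairs; since the data were generated with a within-component correlation of 0.8, the sample correlation is expected to be positive (a base-R check, not part of the package).

```r
# Sample Pearson correlation of the two time points (expected to be positive)
cor(y[, 1], y[, 2])
```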