Description Usage Arguments Details Value Note Author(s) See Also Examples
Test the independence between two continuous variables based on the maximum distance between the joint empirical cumulative distribution function and the product of the marginal empirical cumulative distribution functions.
1 2 3 4 5 6 7 8 |
x, y |
the two continuous variables. Must be of same length. |
N |
the number of Monte-Carlo replications if simu=TRUE. |
simu |
if TRUE a Monte-Carlo simulation with |
ties.break |
the method used to break ties in case there are ties in the x or y vectors. Can be |
nb_tiebreak |
the number of repetition for breaking the ties when |
For two continuous variables, robustest tests H0 X and Y are independent against H1 X and Y are not independent.
For observations (x1,y1), ..., (x_n,y_n), the bivariate e.c.d.f. (empirical cumulative distribution function) Fn is defined as:
Fn(t1,t2) = #{xi<=t1,yi<=t2}/n = sum_{i=1}^n Indicator(xi<=t1,yi<=t2)/n.
Let Fn(t1) and Fn(t2) be the marginals e.c.d.f. The test statistic is defined as:
n^(1/2) sup_{t1,t2} |Fn(t1,t2)-Fn(t1)*Fn(t2)|.
Under H0 the distribution of the test statistic is free and is equivalent to the same test statistic computed for two independent continuous uniform variables in [0,1], where the supremum is taken for t1,t2 in [0,1]. Using this result, the distribution of the test statistic is obtained using Monte-Carlo simulations. The user can either use the argument simu=TRUE to perform the Monte-Carlo simulation (with N the number of replications) or simply use the available tables by choosing simu=FALSE. In the latter case, the exact distribution is computed for n=1, ...,150. For 151<=n<=175, the distribution with n=150 is used. For 176<=n<=250, the distribution with n=200 is used. For 251<=n<=400, the distribution with n=300 is used. For 401<=n<=750, the distribution with n=500 is used. For n>=751, the distribution with n=1000 is used. Those tables were computed using 1e^5 replications.
Returns the result of the test with its corresponding p-value and the value of the test statistic.
Only a two sided alternative is possible with this test. Missing values are removed such that if a value
of x
(resp. y
) is missing then the corresponding
values of both x
and y
are removed. The test is then implemented on the remaining elements. If ties.break="none"
the ties are ignored, putting
mass (nb of ties)/n at tied observations in the computation of the empirical cumulative distribution functions.
If ties.break="random"
they are randomly broken.
This function is implemented using the Rcpp package.
See Distribution Free Tests of Independence Based on the Sample Distribution Function. J. R. Blum, J. Kiefer and M. Rosenblatt, 1961.
cortest
, vartest
, mediantest
, wilcoxtest
.
See also the hoeffd
function in the Hmisc
package for the Hoeffding test.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | #Simulated data 1
x<-c(0.2, 0.3, 0.1, 0.4)
y<-c(0.5, 0.4, 0.05, 0.2)
indeptest(x,y)
indeptest(x,y,ties.break="random")
#Simulated data 2
n<-40
x<-rnorm(n)
y<-x^2+0.3*rnorm(n)
plot(x,y)
indeptest(x,y)
#Application on the Evans dataset
#Description of this dataset is available in the lbreg package
data(Evans)
with(Evans,plot(CHL[CDH==1],DBP[CDH==1]))
with(Evans,cor.test(CHL[CDH==1],DBP[CDH==1])) #the standard Pearson test
with(Evans,cortest(CHL[CDH==1],DBP[CDH==1])) #the robust Pearson test
with(Evans,indeptest(CHL[CDH==1],DBP[CDH==1])) #the robust independence test
with(Evans,indeptest(CHL[CDH==1],DBP[CDH==1],ties.break="random")) #the robust independence test
#The robust tests give very different pvalues than the standard Pearson test!
#Breaking the ties
#The ties are broken once
with(Evans,indeptest(CHL[CDH==1],DBP[CDH==1],ties.break="random"))
#The ties are broken repetively and the average of the test statistics and p.values
#are taken
with(Evans,indeptest(CHL[CDH==1],DBP[CDH==1],ties.break="rep_random",nb_tiebreak=100))
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.