moulton.t.test: moulton.t.test
In tbgitoo/moultonTools: moultonTools


T-test with Moulton correction for clustered variables. Works like t.test, but corrects the standard deviation and, in some cases, the degrees of freedom used to evaluate the test.


moulton.t.test(x, y = NULL, cluster_x = 1:length(x), cluster_y = NULL,
               alternative = c("two.sided", "less", "greater"),
               mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95,
               ...)

`x`	a (non-empty) numeric vector of data values
`y`	an optional (non-empty) numeric vector of data values
`cluster_x`	Cluster identities for the values in x
`cluster_y`	Cluster identities for the values in y
`alternative`	a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
`mu`	a number indicating the true value of the mean (or difference in means if you are performing a two sample test).
`paired`	a logical indicating whether you want a paired t-test
`var.equal`	a logical variable indicating whether to treat the two variances as being equal. If TRUE, the pooled variance is used to estimate the variance; otherwise, the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
`conf.level`	confidence level of the interval.
`...`	If NULL is provided explicitly for both `cluster_x` and `cluster_y`, the t.test function is called and the arguments `...` are passed on to t.test. It is also possible to provide a `method` argument; it is used internally in moulton_factor and in the ICC evaluation.

The t-test covers a number of different cases. The following corrections are applied:

(1) If y=NULL, the variable x is tested against the theoretical mean mu. In that case, the Moulton correction factor is obtained from the values of x and the clustering of x given by cluster_x. The degrees of freedom are estimated as the number of clusters minus 1, i.e. length(unique(cluster_x))-1.
(2) If both x and y are provided, the approach depends on whether a paired or an unpaired test is requested through the paired argument. For a paired test (paired=TRUE), x is replaced by x-y and only the clustering described in cluster_x is used; the approach is otherwise identical to case (1) above.
(3) If both x and y are provided and an unpaired test is requested (paired=FALSE), the clustering of x and y is assumed to be described separately by cluster_x and cluster_y. In this case, cluster_x and cluster_y must not overlap, because the clustering is supposed to take place within x and within y, not in a crossed manner between them. The Moulton correction is applied to the joint variance, and the degrees of freedom are corrected according to the number of clusters rather than the number of observations. The details depend on whether var.equal is TRUE or FALSE (see t.test for details).
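
To make case (1) concrete, here is a minimal sketch of how such a correction can be computed by hand. It is not the package's implementation: the function names `icc_anova_sketch`, `moulton_factor_sketch` and `moulton_one_sample_sketch` are hypothetical, the ICC is assumed to be estimated with the one-way ANOVA estimator, negative ICC estimates are assumed to be truncated at 0, and the Moulton factor is assumed to take the textbook form sqrt(1 + (V(n)/n_bar + n_bar - 1) * ICC), where n_bar is the mean cluster size and V(n) the variance of the cluster sizes. The actual moulton_factor function may differ, in particular depending on the `method` argument.

```r
# Hypothetical helper: one-way ANOVA estimate of the intraclass correlation (ICC)
icc_anova_sketch <- function(x, cluster) {
  cluster <- factor(cluster)
  n_i <- as.numeric(table(cluster))        # cluster sizes, in factor-level order
  k <- length(n_i)                         # number of clusters
  N <- length(x)                           # number of observations
  if (N == k) return(0)                    # no replication within clusters: nothing to estimate
  means <- tapply(x, cluster, mean)        # cluster means, named by factor level
  msb <- sum(n_i * (means - mean(x))^2) / (k - 1)               # between-cluster mean square
  msw <- sum((x - means[as.character(cluster)])^2) / (N - k)    # within-cluster mean square
  n0 <- (N - sum(n_i^2) / N) / (k - 1)     # effective cluster size (equals n for balanced data)
  (msb - msw) / (msb + (n0 - 1) * msw)
}

# Hypothetical helper: textbook Moulton factor for the standard error of a mean
moulton_factor_sketch <- function(x, cluster) {
  n_i <- as.numeric(table(cluster))
  n_bar <- mean(n_i)
  v_n <- mean((n_i - n_bar)^2)             # variance of the cluster sizes (0 if balanced)
  rho <- max(0, icc_anova_sketch(x, cluster))  # assumption: negative estimates are set to 0
  sqrt(1 + (v_n / n_bar + n_bar - 1) * rho)
}

# Hypothetical one-sample test following case (1):
# inflated standard error, degrees of freedom = number of clusters - 1
moulton_one_sample_sketch <- function(x, cluster, mu = 0) {
  se <- sd(x) / sqrt(length(x)) * moulton_factor_sketch(x, cluster)
  t_stat <- (mean(x) - mu) / se
  df <- length(unique(cluster)) - 1
  list(statistic = t_stat, parameter = df, p.value = 2 * pt(-abs(t_stat), df))
}
```

With one observation per cluster (the default cluster_x=1:length(x)), the factor in this sketch is 1 and the result coincides with a plain one-sample t.test. With perfectly clustered data (no variation within clusters), the ICC is 1 and the standard error is inflated by the square root of the cluster size.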

An object of class "htest", see t.test. In addition,

Marina + Thomas Braschler

# Example 1: Strong clustering vs. no clustering


# x and y random variables

n=10


cluster_sd = 1

# One random offset per cluster; cluster_sd is the standard deviation between clusters
cluster_values = rnorm(6,sd=cluster_sd)

cluster_x=c(rep(1,n),rep(2,n),rep(3,n))
cluster_y=c(rep(4,n),rep(5,n),rep(6,n))
x=cluster_values[cluster_x]+rnorm(3*n,sd=0.15)
y=cluster_values[cluster_y]+rnorm(3*n,sd=0.15)

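# Equivalent construction of y, written out cluster by cluster (this overwrites the y defined above)
y=c(cluster_values[4]+rnorm(n,sd=0.15),cluster_values[5]+rnorm(n,sd=0.15),cluster_values[6]+rnorm(n,sd=0.15))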


# A simple t-test quite often gives a highly significant result (not always, though; re-run a few times to get an idea)

t.test(x,y)

# If we know that x and y each come from only three distinct clusters, the p-value should be closer to what we observe at the level of the clusters,
# because the sampling really took place at the level of the clusters:
t.test(aggregate(x ~ cluster_x,FUN=mean)$x,aggregate(y ~ cluster_y,FUN=mean)$y)

# The Moulton correction roughly reproduces the cluster-level result

moulton.t.test(x,y,cluster_x,cluster_y)

# If, on the other hand, the same clusters are spread across both test variables, the design is balanced and the chance of a highly significant result where there is no effect is much lower

cluster_x=c(rep(1,n),rep(2,n),rep(3,n))
cluster_y=cluster_x
x=cluster_values[cluster_x]+rnorm(3*n,sd=0.15)
y=cluster_values[cluster_y]+rnorm(3*n,sd=0.15)



t.test(x,y)

moulton.t.test(x,y,cluster_x,cluster_y)