In hudie-lab/SC19036: The R Package For StatComp

question 1

m=10000
n=100
sk <- function(x) {
#computes the sample skewness coeff. 
xbar <- mean(x)
m3 <- mean((x - xbar)^3)
m2 <- mean((x - xbar)^2)
return( m3 / m2^1.5 )
}

##1-confidence 
alpha=0.1
##beta parameters
shape=seq(1,20,1)
#critical value for the skewness test
cv <- qnorm(1-alpha, 0, sqrt(6*(n-2) / ((n+1)*(n+3))))

N=length(shape)
pwr <- numeric(N)
for (j in 1:N) { #for each alpha1 : 
alpha1 <- shape[j]
sktests <- numeric(m)
for (i in 1:m) { #for each replicate
x <- rbeta(n, shape1=alpha1,shape2=alpha1)
sktests[i] <- as.integer(sk(x) >= cv) }
pwr[j] <- mean(sktests)
}
#plot power vs shape
plot(shape, pwr, type = "b", xlab = bquote(shape), ylim = c(0,1)) 
abline(h = .1, lty = 3)
se <- sqrt(pwr * (1-pwr) / m) #add standard errors 
lines(shape, pwr+se, lty = 3)
lines(shape, pwr-se, lty = 3)

###vs t distribution 
#critical value for the skewness test
cv <- qnorm(1-alpha/2, 0, sqrt(6*(n-2) / ((n+1)*(n+3))))
dfv=seq(5,80,3)
N=length(dfv)
pwr <- numeric(N)
for (j in 1:N) { #for each alpha1 : 
v <- dfv[j]
sktests <- numeric(m)
for (i in 1:m) { #for each replicate
x1 <- rt(n, df=v)
sktests[i] <- as.integer(abs(sk(x1)) >= cv) }
pwr[j] <- mean(sktests)
}
#plot power vs shape
plot(dfv, pwr, type = "b", xlab = bquote(dfv), ylim = c(0,1)) 
abline(h = .1, lty = 3)
se <- sqrt(pwr * (1-pwr) / m) #add standard errors 
lines(dfv, pwr+se, lty = 3)
lines(dfv, pwr-se, lty = 3)

Although neither beta distribution nor t distribution is normal, they are both symmetric. So the skewness feo them equal to 0.

From the results, we can find that the power of skewness test of normality against beta distribution is so low that none of each is high above the $\alpha=0.1$. This makes sense because beta distribution is symmetric when $\alpha = \beta$.

And the skewness test of normality against $t(v)$ shows results with power higher than the thredshold. With $v \to \inf$, the power $\to \alpha=0.1$.

question 2

n=30
m <- 10000
alpha=0.05
p <- matrix(0,ncol=3,nrow=m) 
for (j in 1:m) {
#number of replicates #storage for p-values
x1 <- rchisq(n, df=1)
ttest <- t.test(x1, alternative = "greater", mu = 1) 
p[j,1] <- ttest$p.value
x2<- runif(n,min=0,max=2)
ttest <- t.test(x2, alternative = "greater", mu = 1) 
p[j,2] <- ttest$p.value
x3 <- rexp(n)
ttest <- t.test(x3, alternative = "greater", mu = 1) 
p[j,3] <- ttest$p.value
}
p1.hat <- mean(p[,1]< alpha)##chisq with df=1
se1.hat <- sqrt(p1.hat * (1 - p1.hat) / m) 
print(c(p1.hat, se1.hat))
p2.hat <- mean(p[,2]< alpha)##unifrom(0,2)
se2.hat <- sqrt(p2.hat * (1 - p2.hat) / m) 
print(c(p2.hat, se2.hat))
p3.hat <- mean(p[,3]< alpha)##exponential(1)
se3.hat <- sqrt(p3.hat * (1 - p3.hat) / m) 
print(c(p3.hat, se3.hat))

From the simulations, we can find that the results of uniform distribution are more approximate to the true values. The probable reason may lie on the symmetric property of uniform distribution so that the samples from the chi-sqare or exponantial distribution can not contribute to the test $T$ which has a $t$ distribution.

question 3

For this discussion, we can think of the test: $H_0: power_1=power_2$.

z-test

I think that z-test can be suitable for this case. For z-test requires the samples from normal distribution, with the number of samples large enough, it can be summarized by an asympotic normal distribution.

1-pnorm(abs((0.651-0.676))/(sqrt(0.651*(1-0.651)+0.676*(1-0.676))))##z-test

The results show that we can not refuse the null hypothesis at the confidence level.

two-sample t test or paired two test

The power can be seen as the probability of a bernoulli test. Then the hypothesis would be written as the comparison of the probabilities of two binomial distribution $B(n,p)$. Then we can use prop.test() for two-sample t test.

For paired two test, there is need to know the rejection or acceptance for each simulation data under this method.

prop.test(c(651,676),c(1000,1000))##prop.test

McNemar test

This test is appropriate for this case. But the known data would not be enough to test the difference between the two methods. We need the number of every test result under each mothod. Then we can get a contigency table like below:

x=sample(c(rep(0,349),rep(1,651)),1000,replace=FALSE)
y=sample(c(rep(0,324),rep(1,676)),1000,replace=FALSE)
xtabs(~x+y,data=cbind(x,y))

For the first column, the first number stands for the number of tests that fail to refuse the $H_0$ under the two methods while the second one stands for the number of tests that fail to refuse the $H_0$ under the second method but succeed under the first method. For the second column, the first number stands for the number of tests that succeed to refuse the $H_0$ under the second method but fail under the first method while the second one stands for the number of tests that succeed to refuse the $H_0$ under the two methods.

hudie-lab/SC19036 documentation built on Jan. 2, 2020, 8:45 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com