Introduction to 'rg.test'

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(rgTest)

Working with rgTest

Get example data

set.seed(100)

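d = 200                                  # number of variables
vmu = rep(1.1/sqrt(d), d)                # mean shift of sample 2 in every coordinate
vsd = c(rep(1.1, d/5), rep(1, d - d/5))  # sample 2: sd 1.1 in the first d/5 coordinates, 1 elsewhere
num1 = 100
num2 = 100
s1 = matrix(0, num1, d)                  # sample 1: i.i.d. N(0, 1) entries
s2 = matrix(0, num2, d)                  # sample 2: shifted mean, partly inflated sd

for (i in 1:num1) {
  s1[i,] = rnorm(d)
}
for (i in 1:num2) {
  s2[i,] = rnorm(d, mean = vmu, sd = vsd)
}

num1 = nrow(s1)                          # number of observations in sample 1
num2 = nrow(s2)                          # number of observations in sample 2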
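The simulation design is easier to read once its two ingredients are spelled out numerically. The following sketch recomputes only the design vectors `vmu` and `vsd` from the code above. It shows that the mean shift is small in each coordinate but has Euclidean length 1.1 overall, and that 40 of the 200 coordinates have an inflated standard deviation in sample 2.

```r
d <- 200
vmu <- rep(1.1 / sqrt(d), d)                 # per-coordinate mean shift of sample 2
vsd <- c(rep(1.1, d / 5), rep(1, d - d / 5)) # per-coordinate sd of sample 2

c(per_coord_shift = vmu[1],                  # 1.1 / sqrt(200)
  total_shift     = sqrt(sum(vmu^2)),        # Euclidean length of the shift vector
  n_inflated_sd   = sum(vsd > 1))            # coordinates with sd 1.1
```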

Get an overview of the data

Both samples have 200 variables. We look at the scatterplot matrix of the first five variables, with points colored by sample.

plot_dat = cbind(as.data.frame(rbind(s1[,1:5], s2[,1:5])),
                 label = rep(c('sample 1', 'sample 2'), times = c(num1, num2)))
my_cols = c("#00AFBB", "#E7B800")      # colors for sample 1 and sample 2
pairs(plot_dat[, 1:5], col = my_cols[as.factor(plot_dat$label)])

Even though the observations in the two samples are generated from different distributions, the difference is hard to see in the scatterplots.
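A back-of-the-envelope calculation helps explain why the marginal plots look alike. With 100 observations per sample, the standard error of the difference in sample means for a single coordinate is about 0.14 to 0.15. The mean shift in that coordinate is only about 0.078, so any one variable carries little visible signal. This sketch uses only the design constants from the example and does not simulate any data.

```r
n1 <- 100; n2 <- 100; d <- 200
shift       <- 1.1 / sqrt(d)                # mean shift per coordinate
se_inflated <- sqrt(1 / n1 + 1.1^2 / n2)    # SE of mean difference, first d/5 coordinates (sd 1.1 in sample 2)
se_plain    <- sqrt(1 / n1 + 1 / n2)        # SE of mean difference, remaining coordinates
round(c(shift = shift, se_inflated = se_inflated, se_plain = se_plain), 4)
```

The shift becomes substantial only when all 200 coordinates are considered together, which is the kind of signal a graph built on distances between whole observations can pick up.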

Graph-based two-sample test

Use data matrices

res1 = rg.test(data.X = s1, data.Y = s2, n1 = num1, n2 = num2, k = 5, weigh.fun = weiMax, perm.num = 1000, progress_bar = FALSE)
type = c('robust generalized (asymptotic)', 'robust max-type (asymptotic)', 
         'robust generalized (permutation)', 'robust max-type (permutation)')
test.statistic = c(res1$asy.gen.statistic, res1$asy.max.statistic, NA, NA)   # no statistic shown for the permutation rows
p.value = c(res1$asy.gen.pval, res1$asy.max.pval, res1$perm.gen.pval, res1$perm.max.pval)
res_tbl = data.frame(type, test.statistic, p.value)                           # keeps numeric columns numeric
knitr::kable(res_tbl, col.names = gsub("[.]", " ", names(res_tbl)))
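The same four-row summary table is built three times in this vignette. A small helper can remove that repetition. `rg_summary_table` is a hypothetical name and is not part of rgTest. It assumes only the result fields already used above (`asy.gen.statistic`, `asy.max.statistic`, `asy.gen.pval`, `asy.max.pval`, `perm.gen.pval`, `perm.max.pval`).

```r
# Hypothetical helper (not part of rgTest): summary table from an rg.test result list
rg_summary_table <- function(res) {
  data.frame(
    type = c("robust generalized (asymptotic)", "robust max-type (asymptotic)",
             "robust generalized (permutation)", "robust max-type (permutation)"),
    # statistics are shown for the asymptotic rows only, as in the tables above
    test.statistic = c(res$asy.gen.statistic, res$asy.max.statistic, NA, NA),
    p.value = c(res$asy.gen.pval, res$asy.max.pval, res$perm.gen.pval, res$perm.max.pval)
  )
}
```

With it, each table reduces to a single call such as `knitr::kable(rg_summary_table(res1), col.names = c("type", "test statistic", "p value"))`.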

Use the distance matrix

data = rbind(s1, s2)                  # pooled data: the num1 rows of sample 1 first, then sample 2
dist = dist(as.matrix(data))          # Euclidean distances between all pooled observations
res2 = rg.test(dis = dist, n1 = num1, n2 = num2, k = 5, weigh.fun = weiMax, perm.num = 1000)
type = c('robust generalized (asymptotic)', 'robust max-type (asymptotic)', 
         'robust generalized (permutation)', 'robust max-type (permutation)')
test.statistic = c(res2$asy.gen.statistic, res2$asy.max.statistic, NA, NA)   # no statistic shown for the permutation rows
p.value = c(res2$asy.gen.pval, res2$asy.max.pval, res2$perm.gen.pval, res2$perm.max.pval)
res_tbl = data.frame(type, test.statistic, p.value)                           # keeps numeric columns numeric
knitr::kable(res_tbl, col.names = gsub("[.]", " ", names(res_tbl)))
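The distance input above is produced by base R's `dist()`. It returns a compact object of class `dist` that stores only the n(n-1)/2 pairwise distances, and `as.matrix()` expands it into the full symmetric matrix. The small example below, on four made-up points, shows both forms. The variable names are illustrative, and how `rg.test` handles the object internally is not shown here.

```r
set.seed(1)
x  <- matrix(rnorm(12), nrow = 4, ncol = 3)  # 4 observations in 3 dimensions
dd <- dist(x)                                # class "dist": lower-triangle entries only
M  <- as.matrix(dd)                          # full symmetric 4 x 4 matrix, zero diagonal
length(dd)                                   # 4 * 3 / 2 pairwise distances
M[2, 4]                                      # Euclidean distance between observations 2 and 4
```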

Use the edge matrix

E = kmst(dis=dist, k=5)
res3 = rg.test(E = E, n1 = num1, n2 = num2, weigh.fun = weiMax, perm.num = 1000)
type = c('robust generalized (asymptotic)', 'robust max-type (asymptotic)', 
         'robust generalized (permutation)', 'robust max-type (permutation)')
test.statistic = c(res3$asy.gen.statistic, res3$asy.max.statistic, NA, NA)   # no statistic shown for the permutation rows
p.value = c(res3$asy.gen.pval, res3$asy.max.pval, res3$perm.gen.pval, res3$perm.max.pval)
res_tbl = data.frame(type, test.statistic, p.value)                           # keeps numeric columns numeric
knitr::kable(res_tbl, col.names = gsub("[.]", " ", names(res_tbl)))
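The edge matrix passed as `E` comes from `kmst(dis = dist, k = 5)`. To make the idea concrete, here is a minimal base-R sketch of a k-MST. It assumes the usual definition in graph-based two-sample testing: the union of k spanning trees, where each tree is a minimum spanning tree of the complete graph after the edges of the earlier trees have been removed. The functions `prim_mst` and `kmst_sketch` are hypothetical and are not the rgTest implementation. They may break ties or order edges differently. The result is a two-column matrix of row indices into the pooled data, which is assumed, not confirmed here, to match the layout of `E`.

```r
# Minimum spanning tree via Prim's algorithm on a full distance matrix.
# Entries equal to Inf are treated as unavailable edges.
prim_mst <- function(D) {
  n <- nrow(D)
  in_tree <- c(TRUE, rep(FALSE, n - 1))  # grow the tree from node 1
  best    <- D[1, ]                      # cheapest known link of each node to the tree
  parent  <- rep(1L, n)                  # tree node providing that link
  edges   <- matrix(0L, n - 1, 2)
  for (step in seq_len(n - 1)) {
    cand <- which(!in_tree)
    j <- cand[which.min(best[cand])]
    if (!is.finite(best[j])) stop("remaining graph is disconnected")
    edges[step, ] <- c(parent[j], j)
    in_tree[j] <- TRUE
    closer <- !in_tree & D[j, ] < best   # nodes now closer to the tree through j
    best[closer]   <- D[j, closer]
    parent[closer] <- j
  }
  edges
}

# Union of k successive MSTs, each avoiding the edges already used.
kmst_sketch <- function(dis, k) {
  D <- as.matrix(dis)
  diag(D) <- Inf                         # no self-loops
  E <- NULL
  for (i in seq_len(k)) {
    tree <- prim_mst(D)
    E <- rbind(E, tree)
    D[tree] <- Inf                       # remove used edges in both directions
    D[tree[, 2:1]] <- Inf
  }
  E
}
```

For example, `kmst_sketch(dist(matrix(rnorm(60), 30, 2)), k = 3)` returns 3 × 29 = 87 distinct edges on 30 points.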

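The two-sample test is done. The asymptotic results are the same whether the input is the data matrices, the distance matrix, or the edge matrix of the 5-MST. This is expected when all three inputs yield the same 5-MST on the pooled observations. The p-values from the permutation method (perm.num = 1000) are similar to the asymptotic ones, though not identical, because they depend on the random permutations drawn.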
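To illustrate how a permutation p-value arises, the sketch below uses the plainest graph-based statistic: the number of edges joining observations from different samples, in the spirit of the Friedman and Rafsky edge-count test. Few between-sample edges indicate that the samples differ. This is a simplified stand-in. `rg.test`'s robust statistics weight edges through `weigh.fun` and are not reproduced here, and the `(1 + count) / (B + 1)` convention is a common choice that may differ from rgTest's. The functions `count_between` and `perm_pval_between` are hypothetical names.

```r
# Number of edges whose two endpoints carry different sample labels
count_between <- function(E, lab) sum(lab[E[, 1]] != lab[E[, 2]])

# Permutation p-value: shuffle the pooled labels B times and compare counts
perm_pval_between <- function(E, n1, n2, B = 1000) {
  lab  <- rep(c(1L, 2L), times = c(n1, n2))   # pooled order: sample 1 first
  obs  <- count_between(E, lab)
  perm <- replicate(B, count_between(E, sample(lab)))
  (1 + sum(perm <= obs)) / (B + 1)            # small counts are evidence against H0
}

# Toy example: two well-separated 1-D samples and their 1-nearest-neighbour graph
set.seed(2)
x <- c(rnorm(20), rnorm(20, mean = 10))
D <- as.matrix(dist(x))
diag(D) <- Inf
E <- cbind(seq_along(x), apply(D, 1, which.min))
p <- perm_pval_between(E, n1 = 20, n2 = 20, B = 200)
```

Rerunning with a different seed changes `perm` and can therefore change the permutation p-value slightly. That is why the permutation and asymptotic p-values in the tables agree only approximately.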




rgTest documentation built on Aug. 14, 2023, 5:08 p.m.