# Introduction to 'rg.test In rgTest: Robust Graph-Based Two-Sample Test

```knitr::opts_chunk\$set(
collapse = TRUE,
comment = "#>"
)
```
```library(rgTest)
```

## Working with rgTest

### Get example data

```set.seed(100)

d=200
vmu = rep(1.1/sqrt(d),d)
vsd = c(rep(1.1, d/5), rep(1, d-d/5))
num1 = 100
num2 = 100
s1 = matrix(0,num1,d)               # sample 1
s2 = matrix(0,num2,d)               # sample 2

for (i in 1:num1) {
s1[i,] = rnorm(d)
}
for (i in 1:(num2)) {
s2[i,] = rnorm(d, mean = vmu, sd = vsd)
}

num1 = nrow(s1)                     # number of observations in sample 1
num2 = nrow(s2)                     # number of observations in sample 2
```

#### Get an overview of the data.

The data of both samples have 200 variables. We take a look at the matrix of scatterplots of the first five variables for the two samples.

```plot_dat = cbind(as.data.frame(rbind(s1[,1:5], s2[,1:5])), label = rep(c('sample 1', 'sample 2'), each = 100))
my_cols = c("#00AFBB", "#E7B800")
pairs(plot_dat[, 1:5], col = my_cols[as.factor(plot_dat\$label)])
```

Even though we know the observations from two samples are generated from different distribution, it is hard to tell the differnce by looking at the scatterplots.

### Graph-based two-sample test

#### Use data matrices

```res1 = rg.test(data.X = s1, data.Y = s2, n1 = num1, n2 = num2, k = 5, weigh.fun = weiMax, perm.num = 1000, progress_bar = F)
```
```type = c('robust generalized(asymptotic)', 'robust max-type(asymptotic)',
'robust generalized(permutation)', 'robust max-type(permutation)')
test.statistic = c(res1\$asy.gen.statistic, res1\$asy.max.statistic, NA, NA)
p.value = c(res1\$asy.gen.pval, res1\$asy.max.pval, res1\$perm.gen.pval, res1\$perm.max.pval)
res_tbl = as.data.frame(cbind(type, test.statistic, p.value))
knitr::kable(res_tbl, col.names = gsub("[.]", " ", names(res_tbl)))
```

#### Use the distance matrix

```data = rbind(s1, s2)
dist = dist(as.matrix(data))
res2 = rg.test(dis = dist, n1 = num1, n2 = num2, k = 5, weigh.fun = weiMax, perm.num = 1000)
```
```type = c('robust generalized(asymptotic)', 'robust max-type(asymptotic)',
'robust generalized(permutation)', 'robust max-type(permutation)')
test.statistic = c(res2\$asy.gen.statistic, res2\$asy.max.statistic, NA, NA)
p.value = c(res2\$asy.gen.pval, res2\$asy.max.pval, res2\$perm.gen.pval, res2\$perm.max.pval)
res_tbl = as.data.frame(cbind(type, test.statistic, p.value))
knitr::kable(res_tbl, col.names = gsub("[.]", " ", names(res_tbl)))
```

#### Use the edge matrix

```E = kmst(dis=dist, k=5)
res3 = rg.test(E = E, n1 = num1, n2 = num2, weigh.fun = weiMax, perm.num = 1000)
```
```type = c('robust generalized(asymptotic)', 'robust max-type(asymptotic)',
'robust generalized(permutation)', 'robust max-type(permutation)')
test.statistic = c(res3\$asy.gen.statistic, res3\$asy.max.statistic, NA, NA)
p.value = c(res3\$asy.gen.pval, res3\$asy.max.pval, res3\$perm.gen.pval, res3\$perm.max.pval)
res_tbl = as.data.frame(cbind(type, test.statistic, p.value))
knitr::kable(res_tbl, col.names = gsub("[.]", " ", names(res_tbl)))
```

The two-sample test is done. We can see the asymptotic results are the same by using the data matrices, the distance matrix or the edge matrix generated by 5-MST. The p-values based on the permutation method are similar to those based on asymptotic method.

## Try the rgTest package in your browser

Any scripts or data that you put into this service are public.

rgTest documentation built on Aug. 14, 2023, 5:08 p.m.