g.tests_discrete: Graph-based two-sample tests for discrete data


Description

This function provides four graph-based two-sample tests for discrete data.

Usage

g.tests_discrete(E, counts, test.type = "all", maxtype.kappa = 1.14, perm = 0)

Arguments

E

An edge matrix representing a similarity graph on the distinct values. It has 2 columns and one row per edge in the similarity graph. Each row records the indices of the two endpoints of an edge.
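As a small illustration of the expected shape, an edge matrix can be written by hand in base R. The path graph on K = 4 distinct values used here is a hypothetical example; in practice `getGraph` builds E from a distance matrix, as in the Examples.

```r
# Hypothetical similarity graph on K = 4 distinct values: 1 - 2 - 3 - 4.
# Each row is one edge; the two columns hold the indices of its endpoints.
E <- rbind(c(1, 2),
           c(2, 3),
           c(3, 4))

nrow(E)  # number of edges in the graph
ncol(E)  # always 2
```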

counts

A K by 2 matrix, where K is the number of distinct values. Row k gives the number of observations taking the k-th distinct value in each of the two samples, one column per sample.
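A minimal base R sketch of how such a matrix could be tabulated from raw observations. The two samples are made up, and the row order is assumed to match the indices of the distinct values used in E.

```r
# Two hypothetical samples of categorical observations.
sample1 <- c("a", "a", "b", "c", "c", "c")
sample2 <- c("a", "b", "b", "b", "d")

# Use a common set of levels so both samples share the same K rows,
# including values that occur in only one sample (count 0 in the other).
lev <- sort(unique(c(sample1, sample2)))
counts <- cbind(as.vector(table(factor(sample1, levels = lev))),
                as.vector(table(factor(sample2, levels = lev))))
rownames(counts) <- lev

dim(counts)      # K rows, 2 columns
colSums(counts)  # the two sample sizes
```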

test.type

The default value is "all", which means all four tests are performed: the original edge-count test (Chen and Zhang (2013)), the extension of the generalized edge-count test (Chen and Friedman (2016)), the extension of the weighted edge-count test (Chen, Chen and Su (2016)) and the extension of the maxtype edge-count tests (Zhang and Chen (2017)). Set this value to "original" or "o" to perform only the original edge-count test; set this value to "generalized" or "g" to perform only the extension of the generalized edge-count test; set this value to "weighted" or "w" to perform only the extension of the weighted edge-count test; and set this value to "maxtype" or "m" to perform only the extension of the maxtype edge-count tests.

maxtype.kappa

The value of the parameter kappa in the extension of the maxtype edge-count tests. The default value is 1.14.

perm

The number of permutations used to calculate the permutation p-value of the test. The default value is 0, which means no permutation is performed and only the approximate p-value based on asymptotic theory is provided. Permutation can be time-consuming, so be cautious when setting this value above 10,000.
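The general idea of a permutation p-value on count data can be sketched in base R as follows. This is an illustration of the mechanism only, not the package's implementation: the count matrix is made up, and `stat_fun` is a stand-in chi-square-type statistic (an assumption for the example) rather than one of the graph-based statistics.

```r
set.seed(1)

# Hypothetical K by 2 count matrix (K = 3 distinct values).
counts <- cbind(c(8, 3, 1), c(2, 4, 6))
K <- nrow(counts)

# Stand-in statistic (assumption, not a graph-based statistic):
# chi-square-type distance between observed and expected counts.
stat_fun <- function(m) {
  expected <- outer(rowSums(m), colSums(m)) / sum(m)
  sum((m - expected)^2 / expected)
}

# Expand the counts to one entry per observation:
# the distinct value it takes and the sample it belongs to.
value <- rep(rep(seq_len(K), times = 2), times = as.vector(counts))
label <- rep(rep(1:2, each = K), times = as.vector(counts))

perm <- 1000
observed <- stat_fun(counts)

# Shuffle the sample labels, rebuild the count matrix, recompute the statistic.
perm_stats <- replicate(perm, {
  shuffled <- sample(label)
  m <- table(factor(value, levels = seq_len(K)),
             factor(shuffled, levels = 1:2))
  stat_fun(m)
})

# Proportion of permuted statistics at least as large as the observed one.
pval_perm <- mean(perm_stats >= observed)
pval_perm
```

The cost grows linearly with the number of permutations, which is why large values of `perm` can be slow.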

Value

test.statistic_a

The test statistic when the 'average' method is used to construct the graph.

test.statistic_u

The test statistic when the 'union' method is used to construct the graph.

pval.approx_a

The approximate p-value based on asymptotic theory when the 'average' method is used to construct the graph.

pval.approx_u

The approximate p-value based on asymptotic theory when the 'union' method is used to construct the graph.

pval.perm_a

The permutation p-value when the 'average' method is used to construct the graph; returned only when the argument 'perm' is positive.

pval.perm_u

The permutation p-value when the 'union' method is used to construct the graph; returned only when the argument 'perm' is positive.
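Judging from the example output on this page, the result is a list with one element per test, each holding the components above. The sketch below mocks that structure in base R, with values copied from the first result block of the example output (two tests only), to show how individual values might be extracted.

```r
# Mock of the nested list shown in the example output
# (values copied from the first result block; only two tests included).
res <- list(
  original = list(test.statistic_a = -1.296305, pval.approx_a = 0.09743521,
                  test.statistic_u = -1.043946, pval.approx_u = 0.1482552),
  generalized = list(test.statistic_a = 5.794162, pval.approx_a = 0.05518408,
                     test.statistic_u = 17.08936, pval.approx_u = 0.0001945777)
)

# A single value is reached by test name, then component name.
res$generalized$pval.approx_u

# Collect the approximate p-values of every test into one matrix
# (rows: graph construction method, columns: test).
pvals <- sapply(res, function(x) c(average = x$pval.approx_a,
                                   union = x$pval.approx_u))
pvals
```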

References

Friedman, J. H. and Rafsky, L. C. Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. The Annals of Statistics, 7(4):697-717, 1979.

Chen, H. and Zhang, N. R. Graph-based tests for two-sample comparisons of categorical data. Statistica Sinica, 2013.

Chen, H. and Friedman, J. H. A new graph-based two-sample test for multivariate and object data. Journal of the American Statistical Association, 2016.

Chen, H., Chen, X. and Su, Y. A weighted edge-count two sample test for multivariate and object data. Journal of the American Statistical Association, 2017.

Zhang, J. and Chen, H. Graph-based two-sample tests for discrete data.

Examples

# The "example_discrete" data contains three two-sample count data sets
# represented in matrix form: counts1, counts2, counts3,
# and the corresponding distance matrices on the distinct values: ds1, ds2, ds3.
data(example_discrete) 

# counts1 is a K by 2 matrix, where K is the number of distinct values. 
# It specifies the counts in the K distinct values for the two samples. 
# ds1 is the corresponding distance matrix on the distinct values. 
# The data is generated from two samples with mean shift.
Knnl = 3
E1 = getGraph(counts1, ds1, Knnl, graph = "nnlink")
g.tests_discrete(E1, counts1)
 
# counts2 is a K by 2 matrix, where K is the number of distinct values. 
# It specifies the counts in the K distinct values for the two samples. 
# ds2 is the corresponding distance matrix on the distinct values. 
# The data is generated from two samples with spread difference.
Kmst = 6
E2 = getGraph(counts2, ds2, Kmst, graph = "mstree")
g.tests_discrete(E2, counts2)
 
# counts3 is a K by 2 matrix, where K is the number of distinct values. 
# It specifies the counts in the K distinct values for the two samples. 
# ds3 is the corresponding distance matrix on the distinct values. 
# The data is generated from two samples with mean shift and spread difference.
Knnl = 3
E3 = getGraph(counts3, ds3, Knnl, graph = "nnlink")
g.tests_discrete(E3, counts3)

## Uncomment the following lines to get permutation p-values with 300 permutations.
# Knnl = 3
# E1 = getGraph(counts1, ds1, Knnl, graph = "nnlink")
# g.tests_discrete(E1, counts1, test.type = "all", maxtype.kappa = 1.31, perm = 300)

Example output

$original
$original$test.statistic_a
[1] -1.296305

$original$pval.approx_a
[1] 0.09743521

$original$test.statistic_u
[1] -1.043946

$original$pval.approx_u
[1] 0.1482552


$generalized
$generalized$test.statistic_a
[1] 5.794162

$generalized$pval.approx_a
[1] 0.05518408

$generalized$test.statistic_u
[1] 17.08936

$generalized$pval.approx_u
[1] 0.0001945777


$weighted
$weighted$test.statistic_a
[1] 2.401853

$weighted$pval.approx_a
[1] 0.008156133

$weighted$test.statistic_u
[1] 4.130076

$weighted$pval.approx_u
[1] 1.813213e-05


$maxtype
$maxtype$test.statistic_a
[1] 2.738112

$maxtype$pval.approx_a
[1] 0.01428503

$maxtype$test.statistic_u
[1] 4.708287

$maxtype$pval.approx_u
[1] 2.063016e-05


$original
$original$test.statistic_a
[1] -1.103914

$original$pval.approx_a
[1] 0.1348153

$original$test.statistic_u
[1] 2.157404

$original$pval.approx_u
[1] 0.9845129


$generalized
$generalized$test.statistic_a
[1] 2.683046

$generalized$pval.approx_a
[1] 0.2614471

$generalized$test.statistic_u
[1] 14.12116

$generalized$pval.approx_u
[1] 0.0008582815


$weighted
$weighted$test.statistic_a
[1] 0.3813054

$weighted$pval.approx_a
[1] 0.3514883

$weighted$test.statistic_u
[1] 0.5212234

$weighted$pval.approx_u
[1] 0.3011056


$maxtype
$maxtype$test.statistic_a
[1] 1.593001

$maxtype$pval.approx_a
[1] 0.1832904

$maxtype$test.statistic_u
[1] 3.721489

$maxtype$pval.approx_u
[1] 0.000746299


$original
$original$test.statistic_a
[1] -1.318211

$original$pval.approx_a
[1] 0.09371657

$original$test.statistic_u
[1] -0.6559303

$original$pval.approx_u
[1] 0.2559345


$generalized
$generalized$test.statistic_a
[1] 6.955576

$generalized$pval.approx_a
[1] 0.03087563

$generalized$test.statistic_u
[1] 10.0928

$generalized$pval.approx_u
[1] 0.006432458


$weighted
$weighted$test.statistic_a
[1] 2.588199

$weighted$pval.approx_a
[1] 0.004823963

$weighted$test.statistic_u
[1] 3.110183

$weighted$pval.approx_u
[1] 0.0009348578


$maxtype
$maxtype$test.statistic_a
[1] 2.950547

$maxtype$pval.approx_a
[1] 0.007980781

$maxtype$test.statistic_u
[1] 3.545608

$maxtype$pval.approx_u
[1] 0.001326199

gTests documentation built on May 2, 2019, 9:15 a.m.