gltmtest: Tests the goodness-of-fit of a Gaussian latent tree model


gltmtest    R Documentation

Description

This function tests the goodness-of-fit of a given Gaussian latent tree model to observed data. It supports all Gaussian latent tree models in which the observed variables are restricted to be the leaves of the tree. Four test strategies are implemented: a likelihood ratio test and three algebraic tests. For the algebraic tests, the test statistic is the maximum of unbiased estimates of the polynomials that characterize the parameter space of the model. A Gaussian multiplier bootstrap is used to estimate the limiting distribution of the test statistic and to compute the p-value of the test.
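To make the bootstrap step concrete, here is a minimal base-R sketch of a Gaussian multiplier bootstrap for a maximum-type statistic. It is not the package's internal code. The function name multiplier_bootstrap_max, the input matrix H (one row of constraint estimates per sample or batch), the studentization, and the one-sided form are all assumptions made for illustration. Equality constraints would typically use absolute values instead.

```r
# Sketch of a Gaussian multiplier bootstrap for a max-type statistic.
# H is a hypothetical n x p matrix: row i holds one unbiased estimate of each
# of the p constraint polynomials. This is NOT the package's implementation.
multiplier_bootstrap_max <- function(H, E = 1000) {
  n <- nrow(H)
  h_bar <- colMeans(H)
  h_sd <- apply(H, 2, sd)

  # Studentized maximum over all constraints (one-sided)
  t_stat <- max(sqrt(n) * h_bar / h_sd)

  # Centre and scale the columns once; each bootstrap draw reweights the rows
  H_std <- sweep(sweep(H, 2, h_bar, "-"), 2, h_sd, "/")

  # E x n matrix of independent standard normal multipliers
  eps <- matrix(rnorm(E * n), nrow = E, ncol = n)

  # One bootstrap maximum per row of eps
  boot <- apply(eps %*% H_std / sqrt(n), 1, max)

  # p-value: share of bootstrap maxima at least as large as the statistic
  list(TSTAT = t_stat, PVAL = mean(boot >= t_stat))
}

set.seed(1)
H <- matrix(rnorm(200 * 10), nrow = 200)  # toy estimates with mean zero
res <- multiplier_bootstrap_max(H, E = 500)
```

In this sketch the argument E plays the same role as the number of bootstrap iterations described under Arguments, and the returned list mirrors the TSTAT and PVAL naming only for readability.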

Usage

gltmtest(
  X,
  tree,
  m,
  test_strategy = "grouping",
  E = 1000,
  B = 5,
  N = 5000,
  only_equalities = FALSE,
  nr_4 = NULL,
  nr_3 = NULL
)

Arguments

X

Matrix with observed data. The number of columns must equal the number of leaves of the tree (i.e. the number of observed variables). Each row corresponds to one sample.

tree

An igraph object that is a tree. It is assumed that the first m nodes correspond to observed nodes.

m

Integer, number of observed nodes.
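The node-ordering assumption can be checked before calling the function. The following base-R sketch works on a plain edge list rather than on the igraph object; the variable names are illustrative and nothing here is part of the package. Note that the edge-count check is necessary for a tree but does not verify connectedness.

```r
# Base-R sanity check on an edge list (illustrative; not part of the package)
edges <- data.frame(from = c(1, 2, 3, 4, 5, 6, 7),
                    to   = c(8, 8, 6, 6, 7, 7, 8))
m <- 5                                   # number of observed nodes

n_nodes <- max(edges$from, edges$to)
# Degree of every node: how often it appears as an endpoint of an edge
deg <- tabulate(c(edges$from, edges$to), nbins = n_nodes)

has_tree_edge_count <- nrow(edges) == n_nodes - 1  # necessary, not sufficient
observed_are_leaves <- all(deg[seq_len(m)] == 1)   # first m nodes are leaves
latent_are_internal <- all(deg[-seq_len(m)] >= 2)  # remaining nodes are inner
```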

test_strategy

String, determines the test that is applied. Must be one of c("LR", "grouping", "run-over", "U-stat"). Default is "grouping".

E

Integer, number of bootstrap iterations. This has no effect if test_strategy="LR".

B

Integer, batch size for the estimate of the covariance matrix. Only relevant for the "run-over" test.

N

Integer, computational budget parameter. Only relevant for the "U-stat" test.

only_equalities

Logical. Should the test incorporate only equality constraints? Default FALSE. This has no effect if test_strategy="LR".

nr_4

Number of subsets of size 4 to consider. Optional; useful for very high-dimensional models. Default NULL, in which case all subsets are considered. If a number is given, subsets are chosen randomly and only the corresponding constraints are considered.

nr_3

Number of subsets of size 3 to consider. Optional; useful for very high-dimensional models. Default NULL, in which case all subsets are considered. If a number is given, subsets are chosen randomly and only the corresponding constraints are considered.
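As a rough illustration of what choosing subsets randomly can look like, the base-R sketch below draws a fixed number of distinct 4-element subsets of the observed variables. How the package samples internally is not documented here, so this is only one possible approach. The values of m and nr_4 are toy inputs, and enumerating all subsets with combn is practical only for moderate m.

```r
# Illustrative only: draw nr_4 random 4-element subsets of m observed variables
set.seed(1)
m <- 8
nr_4 <- 10

all_quads <- combn(m, 4)  # 4 x choose(m, 4) matrix, one subset per column
# Sample column indices without replacement so that no subset repeats
chosen <- all_quads[, sample(ncol(all_quads), nr_4), drop = FALSE]
```

The same pattern with combn(m, 3) would cover subsets of size 3.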

Value

Named list with two entries: Test statistic (TSTAT) and p-value (PVAL).

Examples

# Create tree
vertices <- data.frame(name=seq(1,8), type=c(rep(1,5), rep(2,3))) # 1=observed, 2=latent
edges <- data.frame(from=c(1,2,3,4,5,6,7), to=c(8,8,6,6,7,7,8))
tree <- igraph::graph_from_data_frame(edges, directed=FALSE, vertices=vertices)

# Sample data from tree
# Set the node attribute var and the edge attribute corr used for sampling
igraph::V(tree)$var <- rep(1, 8)
igraph::E(tree)$corr <- rep(0.7, 7)
X <- sample_from_tree(tree, m=5, n=500)

# Goodness of fit test
gltmtest(X, tree, m=5, test_strategy="grouping")

NilsSturma/TestGGM documentation built on June 30, 2023, 3:09 p.m.