simulate_lcm_tree: Simulate data and subject-specific indicators from...
In zhenkewu/lotR: Latent class analysis with Observed Trees in R (lotR)

simulate_lcm_tree

R Documentation

Simulate data and subject-specific indicators from tree-structured latent class models

Description

Each observation belongs to a leaf of the tree. Leaves may in turn fall into a few groups, and each group has its own vector of K class probabilities. All leaves are assumed to share the same set of K class-specific response probability profiles.
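The description above can be written as a mixture likelihood. The following is a sketch under assumptions, not a formula taken from the package: it assumes binary responses (the page does not state the response type), and the symbols $Y_i$, $\pi_{vk}$, and $\theta_{kj}$ are notation introduced here, standing for one observation, the entry `pi_mat[v, k]`, and the entry `itemprob[k, j]`.

```latex
P\bigl(Y_i = y \mid \text{leaf } v\bigr)
  = \sum_{k=1}^{K} \pi_{vk}
    \prod_{j=1}^{J} \theta_{kj}^{\,y_j}\,\bigl(1-\theta_{kj}\bigr)^{1-y_j},
\qquad y \in \{0,1\}^{J}.
```

Only the class probabilities $\pi_{vk}$ depend on the leaf $v$; the profiles $\theta_{kj}$ are common to all leaves, which is the sharing assumption stated in the description.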

Usage

simulate_lcm_tree(
  n,
  itemprob,
  mytree,
  pi_mat,
  h_pau,
  balanced = TRUE,
  ratio = 4
)

Arguments

`n`	sample size
`itemprob`	K by J matrix of item response probabilities, shared across all leaf nodes
`mytree`	see `design_tree()`
`pi_mat`	pL by K matrix of class probabilities, one row for each of the pL leaf nodes
`h_pau`	a p-dim vector of positive values that indicates the branch lengths between a node `u` and its parent `pa(u)`
`balanced`	if `TRUE` (the default), observations are assigned uniformly to the leaf nodes; set to `FALSE` otherwise.
`ratio`	the sample size ratio (larger to smaller) within each pair of leaves; when the number of leaves is odd, the smaller leaf in the pair is kept.

Value

a list

Y

observations leaf by leaf

curr_leaves

leaf names, one for each row of Y

truth

a list that contains the simulation truth:

`Z`	true class indicators for all observations
`itemprob`	a K by J matrix of response probability profiles
`pi_mat`	a pL by K matrix of leaf-level class probabilities pi_v, obtained by transforming eta_v
`h_pau`	a vector of p values, each representing the branch length between a node u and its parent node pa(u)
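The arguments and return value above suggest a simple generative process: assign each observation to a leaf, draw its class from that leaf's row of `pi_mat`, then draw its responses from the matching row of `itemprob`. The base R sketch below illustrates that process under stated assumptions. It is not the lotR implementation: `simulate_sketch` is a hypothetical name, responses are assumed to be binary, the design is assumed balanced, and the tree and branch lengths (`mytree`, `h_pau`) are ignored.

```r
# Hypothetical helper, NOT the lotR implementation: draws binary responses
# from a latent class model whose class probabilities vary by leaf.
simulate_sketch <- function(n, itemprob, pi_mat, leaves) {
  K  <- nrow(itemprob)
  J  <- ncol(itemprob)
  pL <- nrow(pi_mat)
  stopifnot(ncol(pi_mat) == K, length(leaves) == pL)

  # Balanced design (assumption): spread observations evenly over the leaves.
  leaf_id <- sort(rep(seq_len(pL), length.out = n))

  # Draw each observation's class from its leaf's row of pi_mat.
  Z <- vapply(leaf_id,
              function(v) sample.int(K, 1, prob = pi_mat[v, ]),
              integer(1))

  # Draw J binary responses (assumed Bernoulli) from the class's profile.
  # itemprob[Z, ] is n by J; both it and the result are filled column-major,
  # so Y[i, j] uses probability itemprob[Z[i], j].
  Y <- matrix(rbinom(n * J, size = 1, prob = itemprob[Z, , drop = FALSE]),
              nrow = n, ncol = J)

  list(Y = Y,
       curr_leaves = leaves[leaf_id],
       truth = list(Z = Z, itemprob = itemprob, pi_mat = pi_mat))
}

set.seed(1)
itemprob <- rbind(rep(0.9, 4), rep(0.1, 4))   # K = 2 classes, J = 4 items
pi_mat   <- rbind(c(0.8, 0.2), c(0.3, 0.7))   # pL = 2 leaves
sim <- simulate_sketch(200, itemprob, pi_mat, leaves = c("v1", "v2"))

# Empirical class shares per leaf; these should be close to the rows of pi_mat.
prop.table(table(sim$curr_leaves, sim$truth$Z), margin = 1)
```

The returned list mirrors the documented components `Y`, `curr_leaves`, and `truth`, so the same cross-tabulation could be used to sanity-check output from `simulate_lcm_tree()`.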

Examples


library(igraph)
library(lotR) # provides logit(), expit(), prob2stick(), tsb() and the example data
n <- 1000
tau <- c(0.6, 0.3, 0.1)
# K = 3 response probability profiles (rows) over J = 18 items (columns)
itemprob <- rbind(rep(0.9, 18),
                  rep(0.5, 18),
                  rep(0.1, 18))

data("lotR_example_edges")
mytree <- igraph::graph_from_edgelist(lotR_example_edges, directed = TRUE)
# Plot tree

nodes  <- names(igraph::V(mytree))
leaves <- names(igraph::V(mytree)[igraph::degree(mytree, mode = "out") == 0])
pL = length(leaves)
p  = length(igraph::V(mytree))

######
K = nrow(itemprob)
# Specify alpha_u for the nodes where it is non-trivial. This quantity is also
# called xi_u: xi_u = s_u * alpha_u, and s_u = 1 for the nodes set in this simulation.
alpha_mat = rbind(logit(prob2stick(tau)[-K]),
                  c(-1,-0.5),
                  c(1,0.5),
                  matrix(0,nrow=p-3,ncol=K-1)
)

# Get the list of ancestors for each leaf:
d <- igraph::diameter(mytree,weights=NA)
# weights = NA prevents edge lengths from being used in determining the diameter.
ancestors <- igraph::ego(mytree, order = d + 1, nodes = leaves, mode = "in")
ancestors <- sapply(ancestors, names, simplify = FALSE)
ancestors <- sapply(ancestors, function(a, nodes) which(nodes %in% a),
                    nodes = nodes, simplify = FALSE)
names(ancestors) <- leaves

# Calculate the class probabilities for all leaf nodes. Each leaf node
# has a K-dim probability vector that sums to one. Some leaves may share
# the same K-dim probability vector while others differ, so there are
# one or more groups of leaf nodes with distinct K-dim probability vectors.
# Note that the branch lengths may also be used here.
pi_mat <- matrix(NA,nrow=pL,ncol=K)
for (v in seq_along(leaves)){
  pi_mat[v,-K] <- colSums(alpha_mat[ancestors[[v]],,drop=FALSE])
  pi_mat[v,] <- tsb(c(expit(pi_mat[v,-K]),1))
}

# s = c(1, 1,1,0,0, rep(0,pL)) # effective nodes
h_pau = rep(1,p)

lotR_example_data_tree <- simulate_lcm_tree(n,itemprob,mytree,pi_mat,h_pau)
#save the simulated data to the R package for illustration:
# save(lotR_example_data_tree, file = "data/lotR_example_data_tree2.rda", compress = "xz")
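The example above moves between class probabilities and unconstrained values through `prob2stick()`, `logit()`, `expit()`, and `tsb()`. The base R sketch below shows the usual truncated stick-breaking round trip that those calls appear to perform. The `*_sketch` functions are hypothetical stand-ins written for illustration; they assume the standard convention and are not the lotR source.

```r
# Hypothetical stand-ins in base R; NOT the lotR helpers themselves.
logit_sketch <- function(p) log(p) - log1p(-p)
expit_sketch <- function(x) 1 / (1 + exp(-x))

# Probabilities -> stick proportions: v_k = p_k / (1 - sum_{l < k} p_l).
prob2stick_sketch <- function(p) {
  remaining <- 1 - c(0, cumsum(p)[-length(p)])
  p / remaining
}

# Stick proportions -> probabilities: p_k = v_k * prod_{l < k} (1 - v_l).
stick2prob_sketch <- function(v) {
  v * c(1, cumprod(1 - v)[-length(v)])
}

tau <- c(0.6, 0.3, 0.1)
K   <- length(tau)

# The last stick is always 1, so only K - 1 values are free.
eta <- logit_sketch(prob2stick_sketch(tau)[-K])

# Round trip: map back to probabilities and compare with tau.
pi_root <- stick2prob_sketch(c(expit_sketch(eta), 1))
isTRUE(all.equal(pi_root, tau))

# Shifting eta (as a non-zero alpha_u row would) gives a different
# probability vector that still sums to one.
pi_shifted <- stick2prob_sketch(c(expit_sketch(eta + c(-1, -0.5)), 1))
sum(pi_shifted)
```

Working on the logit scale of the stick proportions is what lets the example add node-level increments along a leaf's ancestors and still end up with a valid K-dim probability vector for every leaf.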

zhenkewu/lotR documentation built on April 24, 2022, 2:36 a.m.