simulate_nlcm_doubletree: Simulate data from double-tree-structured nested latent class...
In zhenkewu/doubletree: Nested Latent Class Models for Domain-Adaptive and Semisupervised Learning of Cause-of-Deaths using Verbal Autopsy

simulate_nlcm_doubletree

R Documentation

Simulate data from double-tree-structured nested latent class models

Description

The observations belong to leaves in tree2 that may further form a few groups, each with its own K-class probabilities. In particular,

The K by J response profile probabilities are at the leaves of tree1 - they may form groups of distinct numbers of leaves in tree1 (e.g., cause groups with distinct response probability profiles).
For each leaf in tree1, the K-vector class probability vectors may vary across the leaves in tree2; they may further form a small number of leaf groups in tree2 with distinct probability vector values.

Usage

simulate_nlcm_doubletree(
  n,
  mytrees,
  itemprob_list,
  lambda_mat_list,
  pi_mat,
  balanced = TRUE,
  ratio = 4
)

Arguments

`n`	total sample size
`mytrees`	a list of two trees; see `design_doubletree()`
`itemprob_list`	a list of length `pL1`; each element is a `K` by `J` matrix: `K`-classes and `J` items
`lambda_mat_list`	a list of length `pL1`; each element is a `K` by `pL2` matrix; contains K-dim class probabilities for `pL2` leaf nodes; so each column sums to 1.
`pi_mat`	a `pL1` by `pL2` matrix of probabilities, with each column corresponding to a leaf node in tree2. Each column sums to one, representing the fractions of subjects assigned to each of the `pL1` leaves in tree1; the vectors of fractions do not need to be identical across leaf nodes in tree2, e.g., the cause-specific mortality fractions may vary by domains.
`balanced`	by default is `TRUE` to uniformly assign observations to the leaf nodes; otherwise set this to `FALSE`.
`ratio`	only used if `balance=FALSE`; for a pair of leaves in tree2; the sample ratios (larger vs smaller ones); in the event of an odd number of leaves, the smaller leaf in the pair is kept. This ratio is applied once to get total sample sizes in each leaf of tree2.

Value

a list. In particular, the present function does not make the true_leaf_ids missing for a subset of subjects in tree1. So to mimic a situation with missing tree1 leaf labels, additional creation of missing tree1 label indicators are needed! In addition, this function simulates fully observed responses; so to mimic situations where missing responses would happen, additional missing response indicators need to be created.

Y

observations

truth

a list that contains the simulation truth:

true_leaf_ids all the leaf memberships to both trees
Z true class indicators for all observations

N_sim_mat

pL1 by pL2 count matrix

Examples

# simulate an example data set.
rm(list=ls())

library(igraph)
library(doubletree)
library(MASS)
library(poLCA)
library(BayesLCA)

set.seed(20210926)

# second tree - over domains:
data("example_domain_edges")
domain_tree <- graph_from_edgelist(example_domain_edges, directed = TRUE)

# set the levels l*_u for nodes in the cause tree; nodes
# in the same level will share a slab variance multiplier tau_l* (times the edge length
# emanating from the parent).
igraph::V(domain_tree)$levels <- c(1,rep(2,length(igraph::V(domain_tree))-1))
igraph::E(domain_tree)$weight <- rep(1,length(E(domain_tree)))

nodes2  <- names(igraph::V(domain_tree))
leaves2 <- names(igraph::V(domain_tree)[igraph::degree(domain_tree, mode = "out") == 0])
rootnode2 <- names(igraph::V(domain_tree)[igraph::degree(domain_tree, mode = "in") == 0])
pL2 <- length(leaves2)
p2  <- length(nodes2)

# # first tree - over causes:
# data("example_cause_edges")
# cause_tree <- graph_from_edgelist(example_cause_edges, directed = TRUE)
cause_tree <- domain_tree # for simplicity, make then the same for illustration.

igraph::V(cause_tree)$levels <- c(1,rep(2,length(igraph::V(cause_tree))-1))
igraph::E(cause_tree)$weight <- rep(1,length(E(cause_tree)))

nodes1  <- names(igraph::V(cause_tree))
leaves1 <- names(igraph::V(cause_tree)[igraph::degree(cause_tree, mode = "out") == 0])
rootnode1 <- names(igraph::V(cause_tree)[igraph::degree(cause_tree, mode = "in") == 0])
pL1 <- length(leaves1)
p1  <- length(nodes1)

mytrees <- list(tree1 = cause_tree, tree2 = domain_tree)


###############################################################################
## Begin simulating data using the two trees above.
###############################################################################
n     <- 1000
K     <- 2
J     <- 20

fracs_leaves1  <- c(1,rep(1,pL1-1))
fracs_leaves1  <- fracs_leaves1/sum(fracs_leaves1)
pi_mat         <- matrix(fracs_leaves1,nrow=pL1,ncol=pL2)

# create tree1 leaf level class-specific response probabilities:
itemprob0 <- rbind(rep(rep(c(0.95, 0.95), each = 1),10),
                   rep(rep(c(0.1, 0.1), each = 1),10))
# gamma_mat_list <- list(logit(itemprob0))
#
# for (u in 1:(p1-1)){ # random increments on random columns
#   increment_mat <- matrix(rnorm(J*K,0,1),nrow=K,ncol=J)
#   ind_j         <- sample(1:J,floor(J/2))
#   increment_mat[,ind_j] <- 0
#   gamma_mat_list <- append(gamma_mat_list,list(increment_mat))
# }

#########################################################################
## diffusion along tree1 for itemprob_list
#########################################################################

# get lists of ancestors for each leaf_ids:
d1 <- igraph::diameter(mytrees[[1]],weights=NA)
# need to set weight=NA to prevent the use of edge lengths in determining the diameter.
ancestors1 <- igraph::ego(mytrees[[1]], order = d1 + 1, nodes = leaves1, mode = "in")
ancestors1 <- sapply(ancestors1, names, simplify = FALSE)
ancestors1 <- sapply(ancestors1, function(a, nodes) which(nodes %in% a),
                     nodes = nodes1, simplify = FALSE)
names(ancestors1) <- leaves1

# the list of item response probability
itemprob_list <-list()
for (v1 in 1:pL1){
  curr_mat <- itemprob0
  ind_j <- sample(1:J,floor(J/2))
  for (j in ind_j){
    curr_mat[,j] <- itemprob0[rev(1:nrow(curr_mat)),j]
  }
  # itemprob_list[[v1]] <- expit(Reduce("+",gamma_mat_list[ancestors1[[v1]]]))
  itemprob_list[[v1]] <- curr_mat
}

par(mfrow=c(ceiling(sqrt(pL1+1)),ceiling(sqrt(pL1+1))),
    mar=c(2,2,2,2))
for (v1 in 1:pL1){
  image(1:J,1:K,t(itemprob_list[[v1]]),main=paste0("leaf_tree1: ",v1))
}

############################################################################
## diffusion along tree2 for alpha_mat_list
############################################################################
# specify the nodes that have non-trivial alpha^cu_k, this was called
# xi^cu_k, because xi^cu_k = s_cu*alpha^cu_k and s_cu = 1 if we set it in the simulation.

alpha0_mat <- myrdirich(pL1,v=rep(1,K)) # root
alpha_mat_list <- vector("list",pL1)
for (l in 1:pL1){
  # alpha0 <- c(0.6,0.3,0.1) # root
  alpha0 <- alpha0_mat[l,]
  # for each leaf node in tree1:
  alpha_mat = rbind(logit(prob2stick(alpha0)[-K]),
                    0,
                    0,
                    matrix(0,nrow=p2-3,ncol=K-1)
  )
  alpha_mat_list[[l]] <- alpha_mat
}

# get lists of ancestors for each leaf_ids:
d2 <- igraph::diameter(mytrees[[2]],weights=NA)
# need to set weight=NA to prevent the use of edge lengths in determining the diameter.
ancestors2 <- igraph::ego(mytrees[[2]], order = d2 + 1, nodes = leaves2, mode = "in")
ancestors2 <- sapply(ancestors2, names, simplify = FALSE)
ancestors2 <- sapply(ancestors2, function(a, nodes) which(nodes %in% a),
                     nodes = nodes2, simplify = FALSE)
names(ancestors2) <- leaves2

# calculate the class probabilities for all leaf nodes in tree2; each leaf node
# should have a K-dim vector that sums to one; Some nodes may share
# the same set of K-dim probability vector, others may differ. There are
# one or more groups of leaf nodes with distinct K-dim probability vectors.
# Note the branch lengths may also be used here.
lambda_mat_list <- list() # will be a list of length pL1, each being K by pL2.
for (v1 in 1:pL1){
  lambda_mat_list[[v1]] <- matrix(NA,nrow=K,ncol=pL2)
  for (v2 in 1:pL2){
    tmp <- colSums(alpha_mat_list[[v1]][ancestors2[[v2]],,drop=FALSE])
    lambda_mat_list[[v1]][,v2] <- tsb(c(expit(tmp),1))
  }
}

# simulate data
example_data_doubletree <- simulate_nlcm_doubletree(n,mytrees,itemprob_list,lambda_mat_list,pi_mat)

# print("counts in simulation:")
example_data_doubletree$N_sim_mat

## save the simulated data to the R package for illustration:
# save(example_data_doubletree, file = "data/example_data_doubletree.rda", compress = "xz")
#

data("example_data_doubletree")

zhenkewu/doubletree documentation built on Oct. 21, 2023, 7:04 a.m.

zhenkewu/doubletree index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

zhenkewu/doubletree
Nested Latent Class Models for Domain-Adaptive and Semisupervised Learning of Cause-of-Deaths using Verbal Autopsy

simulate_nlcm_doubletree: Simulate data from double-tree-structured nested latent class...
In zhenkewu/doubletree: Nested Latent Class Models for Domain-Adaptive and Semisupervised Learning of Cause-of-Deaths using Verbal Autopsy

Simulate data from double-tree-structured nested latent class models

Description

Usage

Arguments

Value

See Also

Examples

Related to simulate_nlcm_doubletree in zhenkewu/doubletree...

R Package Documentation

Browse R Packages

We want your feedback!

zhenkewu/doubletree Nested Latent Class Models for Domain-Adaptive and Semisupervised Learning of Cause-of-Deaths using Verbal Autopsy

simulate_nlcm_doubletree: Simulate data from double-tree-structured nested latent class... In zhenkewu/doubletree: Nested Latent Class Models for Domain-Adaptive and Semisupervised Learning of Cause-of-Deaths using Verbal Autopsy

Simulate data from double-tree-structured nested latent class models

Description

Usage

Arguments

Value

See Also

Examples

Related to simulate_nlcm_doubletree in zhenkewu/doubletree...

R Package Documentation

Browse R Packages

We want your feedback!

zhenkewu/doubletree
Nested Latent Class Models for Domain-Adaptive and Semisupervised Learning of Cause-of-Deaths using Verbal Autopsy

simulate_nlcm_doubletree: Simulate data from double-tree-structured nested latent class...
In zhenkewu/doubletree: Nested Latent Class Models for Domain-Adaptive and Semisupervised Learning of Cause-of-Deaths using Verbal Autopsy