Synthetic data generator from real counts


This function generates synthetic count data based on the empirical cumulative distribution function (ECDF) of real count data.


synthData_from_ecdf(comm, mar = 2, Sigma, n, seed = 10010, verbose = FALSE)



comm: community; a matrix of real count data to simulate/synthesize. Samples are in rows and OTUs are in columns.


mar: MARGIN for the apply function used to calculate the zero proportion for each row (mar = 1) or each column (mar = 2). The default is 2.


Sigma: covariance structure of size p by p, where p must equal the number of OTUs in comm, that is, the number of columns of comm.


n: number of samples to generate.


seed: seed number for data generation (rmvnorm). The default is 10010.


verbose: logical value. If TRUE, the function prints which iteration is running and how long each step took. The default is FALSE.


synthData_from_ecdf returns a data matrix of size n by p.
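
To make the ECDF idea concrete, here is a minimal base-R sketch of one common way to build such a generator (a Gaussian copula): draw correlated normal samples, map them to uniforms, and push each column through the empirical quantile function of the matching real column. This is an illustration under stated assumptions, not the SPRING implementation. The function name synth_from_ecdf_sketch and the toy data are hypothetical, the sketch uses chol() and rnorm() in place of rmvnorm, and it ignores the mar and verbose arguments.

```r
# Hypothetical helper; NOT the SPRING source code.
synth_from_ecdf_sketch <- function(comm, Sigma, n, seed = 10010) {
  comm <- as.matrix(comm)
  p <- ncol(comm)
  stopifnot(nrow(Sigma) == p, ncol(Sigma) == p)
  set.seed(seed)

  # Draw n correlated standard-normal rows. cov2cor() rescales Sigma to unit
  # variances so that pnorm() below yields uniform margins.
  R <- chol(cov2cor(Sigma))            # upper triangular, t(R) %*% R = correlation
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p) %*% R

  # Map each normal margin to [0, 1] with the standard normal CDF.
  U <- pnorm(Z)

  # Apply the empirical quantile function of each real column.
  # type = 1 inverts the ECDF, so only observed count values are returned.
  X <- sapply(seq_len(p), function(j) {
    quantile(comm[, j], probs = U[, j], type = 1, names = FALSE)
  })
  matrix(X, nrow = n, ncol = p)
}

# Toy usage with made-up Poisson counts (assumption: 50 samples, 3 OTUs).
set.seed(1)
toy <- matrix(rpois(50 * 3, lambda = 2), nrow = 50, ncol = 3)
Sigma_toy <- diag(3)
Sigma_toy[1, 2] <- Sigma_toy[2, 1] <- 0.5
X_toy <- synth_from_ecdf_sketch(toy, Sigma = Sigma_toy, n = 20)
dim(X_toy)
```

In this sketch the result has n rows and p columns, and every synthetic value in a column is one of the counts observed in the corresponding real column, so the marginal distribution (including the share of zeros) is inherited from the real data while the dependence comes from Sigma.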



# goal is to generate synthetic data with a prescribed graph structure.
# load real data "QMP" in SPRING package.
library(SPRING)
data("QMP")
set.seed(12345) # set the seed number for make_graph part.
p1 = ncol(QMP) # the number of nodes.
e1 = 2*p1 # the number of edges is set as twice the number of nodes.
gtype = "cluster"
# available types in SpiecEasi: "band", "cluster", "scale_free", "erdos_renyi", "hub", "block".
graph_p1 <- SpiecEasi::make_graph(gtype, p1, e1) # adjacency matrix. 1: edge, 0: no edge.
Prec1  <- SpiecEasi::graph2prec(graph_p1) # precision matrix. inverse of covariance.
Cor1   <- cov2cor(SpiecEasi::prec2cov(Prec1)) # correlation matrix.

# generate synthetic count data of size n by p; the sample size n must be specified.
# p = ncol(Cor1) = ncol(QMP) must hold.
X1_count <- synthData_from_ecdf(QMP, Sigma = Cor1, n = 100)

