synthData_from_ecdf: Synthetic data generator from real counts

View source: R/synthData.R

synthData_from_ecdf    R Documentation

Synthetic data generator from real counts

Description

This function generates synthetic count data based on the empirical cumulative distribution function (ecdf) of real count data.
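
One common way to build a generator of this kind is a Gaussian copula: draw correlated normal vectors, map them to uniforms, then push each column through the empirical quantile function of the matching real OTU. The sketch below illustrates that idea in base R. It is an assumption about the general technique, not the SPRING source code; the helper name `ecdf_copula_sketch` and the toy data are hypothetical.

```r
# Hypothetical helper: Gaussian-copula sketch, NOT the SPRING implementation.
ecdf_copula_sketch <- function(comm, Sigma, n, seed = 10010) {
  stopifnot(ncol(comm) == ncol(Sigma), nrow(Sigma) == ncol(Sigma))
  set.seed(seed)
  p <- ncol(comm)
  # Draw n correlated normal vectors: Z %*% chol(Sigma) has covariance Sigma.
  Z <- matrix(rnorm(n * p), nrow = n, ncol = p) %*% chol(Sigma)
  # Standardize each column by its marginal sd, then map to (0, 1).
  U <- pnorm(sweep(Z, 2, sqrt(diag(Sigma)), "/"))
  # Invert the empirical CDF of each real column (type = 1 returns observed values).
  X <- sapply(seq_len(p), function(j) {
    quantile(comm[, j], probs = U[, j], type = 1, names = FALSE)
  })
  matrix(X, nrow = n, ncol = p, dimnames = list(NULL, colnames(comm)))
}

# Toy counts (assumed data): 50 samples, 3 OTUs, with many zeros.
set.seed(1)
toy_comm  <- matrix(rpois(150, lambda = 2) * rbinom(150, 1, 0.7), nrow = 50)
toy_Sigma <- matrix(c(1, 0.5, 0,
                      0.5, 1, 0,
                      0, 0, 1), nrow = 3)
toy_synth <- ecdf_copula_sketch(toy_comm, toy_Sigma, n = 20)
dim(toy_synth)
```

Because the inverse ecdf only returns values observed in the real data, each synthetic column keeps the marginal distribution of its real counterpart, zeros included, while the dependence between columns comes from `Sigma`.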

Usage

synthData_from_ecdf(comm, mar = 2, Sigma, n, seed = 10010, verbose = FALSE)

Arguments

comm

community; a matrix of real count data to be simulated/synthesized. Samples are in rows and OTUs are in columns.

mar

MARGIN passed to the apply function when calculating the zero proportion for each row (mar = 1) or each column (mar = 2).

Sigma

covariance structure of size p by p. p must equal the number of OTUs in comm, that is, the number of columns of comm.

n

number of samples

seed

seed number for data generation (rmvnorm)

verbose

logical value. If TRUE, the function prints which iteration is running and how long each step took. The default is FALSE.
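
To make the mar and Sigma arguments concrete, the sketch below computes zero proportions with apply along each margin and checks that a covariance matrix matches the number of OTUs. The toy matrix and variable names are hypothetical; this only illustrates what the argument descriptions say and is not code taken from the package.

```r
# Toy community (assumed data): 4 samples (rows) by 3 OTUs (columns).
# matrix() fills column by column, so each line below is one OTU.
toy_comm <- matrix(c(0, 5, 0, 2,
                     1, 0, 0, 0,
                     3, 4, 7, 1), nrow = 4)

# Zero proportion per OTU (MARGIN = 2) and per sample (MARGIN = 1).
zero_by_otu    <- apply(toy_comm, 2, function(x) mean(x == 0))
zero_by_sample <- apply(toy_comm, 1, function(x) mean(x == 0))

# Sigma must be p by p with p = ncol(comm); an identity matrix is used here.
toy_Sigma <- diag(ncol(toy_comm))
stopifnot(nrow(toy_Sigma) == ncol(toy_comm),
          ncol(toy_Sigma) == ncol(toy_comm))
```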

Value

synthData_from_ecdf returns a data matrix of size n by p.

Examples



library(SPRING) # provides synthData_from_ecdf and the QMP data.
require(SpiecEasi)

# goal is to generate synthetic data with a prescribed graph structure.
# load real data "QMP" in SPRING package.
data(QMP)
set.seed(12345) # set the seed number for make_graph part.
p1 = ncol(QMP) # the number of nodes.
e1 = 2*p1 # the number of edges is set as twice the number of nodes.
gtype = "cluster"
# available types in SpiecEasi: "band", "cluster", "scale_free", "erdos_renyi", "hub", "block".
graph_p1 <- SpiecEasi::make_graph(gtype, p1, e1) # adjacency matrix. 1: edge, 0: no edge.
Prec1  <- SpiecEasi::graph2prec(graph_p1) # precision matrix. inverse of covariance.
Cor1   <- cov2cor(SpiecEasi::prec2cov(Prec1)) # correlation matrix.

X1_count <- synthData_from_ecdf(QMP, Sigma = Cor1, n = 100)
# generates a count data matrix of size n by p.
# p = ncol(Cor1) = ncol(QMP) must hold.
# the sample size n must be specified.





GraceYoon/SPRING documentation built on June 29, 2022, 4:14 p.m.