simDataset.withMASC: Simulate a dataset and save the results (metadata table and...
In immunogenomics/scpost: Simulates single-cell datasets from input data

This function simulates a dataset and then performs MASC analysis on the generated data. The function saves the results as a list containing a metadata table for all simulated cells, as well as the MASC analysis results for all of the simulated cell states. The simulated PC locations for all cells can optionally be saved.

simDataset.withMASC(
  save_path,
  rep = 1,
  seed = 1,
  ncases,
  nctrls,
  nbatches,
  batchStructure = NULL,
  ncells,
  centroids,
  pc_cov_list,
  batch_vars,
  b_scale = 1,
  sample_vars,
  s_scale = 1,
  cfcov,
  cf_scale = 1,
  meanFreqs,
  clus,
  fc = 1,
  cond_induce = "cases",
  res_use = 1.2,
  mc.cores = 1,
  clusterData = TRUE,
  returnPCs = FALSE,
  null_mod = "1 + (1|batch) + (1|sample)",
  full_mod = "condition + (1|batch) + (1|sample)",
  adj_method = "bonferroni",
  verbose = TRUE
)

`save_path`	The name of the directory the results will be saved to.
`rep`	A numeric value representing the replicate number of the simulated dataset.
`seed`	A numeric value representing the seed that will be set before simulating the dataset.
`ncases`	The number of cases.
`nctrls`	The number of controls.
`nbatches`	The number of batches that samples will be distributed into.
`batchStructure`	The structure of the study design in which cases and controls are split into batches. These structures are output by the "distributeSample" functions (and can then be modified if a specific structure is desired). If this parameter is left as NULL, this function will automatically create a batchStructure with the "distributeSamples" function.
`ncells`	A vector containing the number of cells that will be simulated for each sample. The vector must be the same length as the number of total samples (ncases + nctrls).
`centroids`	The mean PC values for each cell state. These are obtained as output from the "estimatePCVar" function.
`pc_cov_list`	A list containing the residual variance-covariance matrices for each cell state. These are obtained as output from the "estimatePCVar" function.
`batch_vars`	A matrix containing the batch-associated variance for each cell state in each PC. This is obtained as output from the "estimatePCVar" function.
`b_scale`	The magnitude of batch-associated gene expression variation the simulated dataset will exhibit. Setting b_scale = 1 will result in realistic levels of batch-associated variation (as derived from parameter estimation of the input dataset). Increasing b_scale results in higher variation, while decreasing b_scale results in lower variation.
`sample_vars`	A matrix containing the sample-associated variance for each cell state in each PC. This is obtained as output from the "estimatePCVar" function.
`s_scale`	The magnitude of sample-associated gene expression variation the simulated dataset will exhibit. Setting s_scale = 1 will result in realistic levels of sample-associated variation (as derived from parameter estimation of the input dataset). Increasing s_scale results in higher variation, while decreasing s_scale results in lower variation.
`cfcov`	The variance-covariance matrix of cell state frequencies across samples. This is obtained as output from the "estimateFreqVar" function.
`cf_scale`	The magnitude of cell state frequency variation that cell states will exhibit across samples in the simulated dataset. Setting cf_scale = 1 will result in realistic levels of cell state frequency variation (as derived from parameter estimation of the input dataset). Increasing cf_scale results in higher variation, while decreasing cf_scale results in lower variation.
`meanFreqs`	A vector containing the mean frequencies of cell states (linear space) across samples from the original input dataset. This vector is obtained as output from the "estimateFreqVar" function.
`clus`	The name of the cluster in which a fold change will be induced.
`fc`	The magnitude of the fold change that will be induced in the chosen cluster. If no fold change is desired, set fc = 1.
`cond_induce`	The condition you wish to induce a fold change in. Setting cond_induce = "cases" will induce a fold change into cases, while setting cond_induce = "ctrls" will induce a fold change into controls.
`res_use`	The resolution that will be used for clustering (Louvain method) the simulated dataset.
`mc.cores`	The number of cores that will be used for simulating the dataset and clustering.
`clusterData`	Boolean determining whether the simulated dataset will be clustered.
`returnPCs`	Boolean determining whether the function will also return the simulated PC locations for each cell.
`null_mod`	The right-hand side of the formula that will be used as the null model in MASC analysis.
`full_mod`	The right-hand side of the formula that will be used as the full model in MASC analysis.
`adj_method`	The p-value correction method that will be used via the "p.adjust" function.
`verbose`	Boolean determining whether to print the time at which the dataset was simulated.
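A few of the arguments above can be illustrated with base R alone. The sketch below is not taken from the scpost source: the sample counts, the per-sample cell number, the response name "cluster", and the p-values are all made up for illustration, and how the package assembles the final model formula internally is an assumption. It shows an "ncells" vector of the required length, one way a right-hand-side string such as "null_mod" could be completed into a formula, and what "p.adjust" does with adj_method = "bonferroni".

```r
# Hypothetical study design: 3 cases and 3 controls (illustrative values only).
ncases <- 3
nctrls <- 3

# ncells needs one entry per sample, so its length must equal ncases + nctrls.
ncells <- rep(250, times = ncases + nctrls)
stopifnot(length(ncells) == ncases + nctrls)

# null_mod and full_mod hold right-hand sides only. Pasting a response in front
# gives complete formulas. The response name "cluster" is a placeholder, not
# necessarily the name the package uses.
null_mod <- "1 + (1|batch) + (1|sample)"
full_mod <- "condition + (1|batch) + (1|sample)"
null_formula <- as.formula(paste("cluster ~", null_mod))
full_formula <- as.formula(paste("cluster ~", full_mod))
print(all.vars(full_formula))

# adj_method is handed to p.adjust. Bonferroni multiplies each p-value by the
# number of tests and caps the result at 1.
pvals <- c(0.001, 0.02, 0.4)  # made-up p-values, one per tested cluster
adj_pvals <- p.adjust(pvals, method = "bonferroni")
print(adj_pvals)
```

The full model differs from the null model only by the "condition" term, so comparing the two tests whether condition explains cluster membership beyond the batch and sample random effects.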

This function returns NULL and instead saves the results to the directory designated in "save_path". If returnPCs = FALSE, the saved results will be a list containing the metadata table for the simulated cells. The metadata table is a data frame with the following columns: "cellstate", the cell state assigned during simulation; "sample", the sample the cell was assigned to; "batch", the batch the cell was assigned to; and "condition", the condition the cell was assigned to (case or control). If clusterData = TRUE, the metadata table will also contain a column "new_clus", which holds the new cluster assignments for the simulated cells. If returnPCs = TRUE, the saved results will contain the metadata table for the simulated cells and a matrix containing the simulated PC locations for each cell.
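To make the shape of the metadata table concrete, here is a minimal mock-up in base R. It does not read any file written by this function: the data frame "meta" is built by hand, and every value in it (cell state names, sample and batch labels, condition labels) is invented. Only the column names follow the description above. The sketch then tabulates per-sample cell state frequencies, a typical first look at such a table.

```r
# Mock metadata table with the documented columns; all values are invented.
meta <- data.frame(
  cellstate = c("A", "A", "B", "B", "A", "B"),
  sample    = c("s1", "s1", "s1", "s2", "s2", "s2"),
  batch     = c("b1", "b1", "b1", "b2", "b2", "b2"),
  condition = c("case", "case", "case", "control", "control", "control"),
  stringsAsFactors = FALSE
)

# Number of cells of each cell state in each sample.
counts <- table(meta$sample, meta$cellstate)

# Per-sample cell state frequencies (each row sums to 1).
freqs <- prop.table(counts, margin = 1)
print(freqs)
```

With clusterData = TRUE, the same tabulation could be run on the "new_clus" column in place of "cellstate".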

immunogenomics/scpost documentation built on July 28, 2021, 4:03 a.m.