simDataset.withMASC: Simulate a dataset and save the results (metadata table and...


View source: R/simulateDatasets.R

Description

This function simulates a dataset and then performs MASC analysis on the generated data. The function saves the results as a list containing a metadata table for all simulated cells, as well as the MASC analysis results for all of the simulated cell states. The simulated PC locations for all cells can optionally be saved.

Usage

simDataset.withMASC(
  save_path,
  rep = 1,
  seed = 1,
  ncases,
  nctrls,
  nbatches,
  batchStructure = NULL,
  ncells,
  centroids,
  pc_cov_list,
  batch_vars,
  b_scale = 1,
  sample_vars,
  s_scale = 1,
  cfcov,
  cf_scale = 1,
  meanFreqs,
  clus,
  fc = 1,
  cond_induce = "cases",
  res_use = 1.2,
  mc.cores = 1,
  clusterData = TRUE,
  returnPCs = FALSE,
  null_mod = "1 + (1|batch) + (1|sample)",
  full_mod = "condition + (1|batch) + (1|sample)",
  adj_method = "bonferroni",
  verbose = TRUE
)

Arguments

save_path

The name of the directory the results will be saved to.

rep

A numeric value representing the replicate number of the simulated dataset.

seed

A numeric value representing the seed that will be set before simulating the dataset.
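As a brief illustration of why a seed is supplied, the base R sketch below shows that resetting the same seed reproduces the same random draws. It uses only set.seed and rnorm and does not call the package itself; how the seed is combined with "rep" internally is not covered here.

```r
# Draw three random values after setting the seed
set.seed(1)
first <- rnorm(3)

# Resetting the same seed reproduces exactly the same draws
set.seed(1)
second <- rnorm(3)

identical(first, second)
```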

ncases

The number of cases.

nctrls

The number of controls.

nbatches

The number of batches that samples will be distributed into.

batchStructure

The structure of the study design in which cases and controls are split into batches. These structures are output by the "distributeSamples" function (and can then be modified if a specific structure is desired). If this parameter is kept as NULL, this function will automatically create a batchStructure with the "distributeSamples" function.

ncells

A vector containing the number of cells that will be simulated for each sample. The vector must be the same length as the number of total samples (ncases + nctrls).
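As a small sketch of the length requirement, the following base R snippet builds an ncells vector for a hypothetical design. The sample counts and the 1000 cells per sample are illustrative assumptions, not values taken from the package.

```r
# Hypothetical study design (illustrative values only)
ncases <- 5
nctrls <- 5

# One entry per sample: here every sample gets 1000 simulated cells
ncells <- rep(1000, times = ncases + nctrls)

# The vector length must match the total number of samples
stopifnot(length(ncells) == ncases + nctrls)
```

Unequal sample sizes could be expressed the same way by supplying a vector of differing counts, as long as its length stays equal to ncases + nctrls.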

centroids

The mean PC values for each cell state. These are obtained as output from the "estimatePCVar" function.

pc_cov_list

A list containing the residual variance-covariance matrices for each cell state. These are obtained as output from the "estimatePCVar" function.

batch_vars

A matrix containing the batch-associated variance for each cell state in each PC. These are obtained as output from the "estimatePCVar" function.

b_scale

The magnitude of batch-associated gene expression variation the simulated dataset will exhibit. Setting b_scale = 1 will result in realistic levels of batch-associated variation (as derived from parameter estimation of the input dataset). Increasing b_scale results in higher variation, while decreasing b_scale results in lower variation.

sample_vars

A matrix containing the sample-associated variance for each cell state in each PC. These are obtained as output from the "estimatePCVar" function.

s_scale

The magnitude of sample-associated gene expression variation the simulated dataset will exhibit. Setting s_scale = 1 will result in realistic levels of sample-associated variation (as derived from parameter estimation of the input dataset). Increasing s_scale results in higher variation, while decreasing s_scale results in lower variation.

cfcov

The cell state frequency variance-covariance across samples. This is obtained as output from the "estimateFreqVar" function.

cf_scale

The magnitude of cell state frequency variation that cell states will exhibit across samples in the simulated dataset. Setting cf_scale = 1 will result in realistic levels of cell state frequency variation (as derived from parameter estimation of the input dataset). Increasing cf_scale results in higher variation, while decreasing cf_scale results in lower variation.

meanFreqs

A vector containing the mean frequencies of cell states (linear space) across samples from the original input dataset. This vector is obtained as output from the "estimateFreqVar" function.

clus

The name of the cluster in which a fold change will be induced.

fc

The magnitude of the fold change that will be induced in the chosen cluster. If no fold change is desired, set fc = 1.

cond_induce

The condition you wish to induce a fold change in. Setting cond_induce = "cases" will induce a fold change into cases, while setting cond_induce = "ctrls" will induce a fold change into controls.

res_use

The resolution that will be used for clustering (Louvain method) the simulated dataset.

mc.cores

The number of cores that will be used for simulating the dataset and clustering.

clusterData

Boolean determining whether the simulated dataset will be clustered.

returnPCs

Boolean determining whether the function will also return the simulated PC locations for each cell.

null_mod

The right-hand side of the formula that will be used as the null model in MASC analysis.

full_mod

The right-hand side of the formula that will be used as the full model in MASC analysis.
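Because null_mod and full_mod are only right-hand sides, a response variable has to be prepended before a model can be fit. The sketch below shows one way to do that in base R using the default strings. The response name "is_cluster" is a hypothetical placeholder, and the way the package assembles its formulas internally may differ.

```r
null_mod <- "1 + (1|batch) + (1|sample)"
full_mod <- "condition + (1|batch) + (1|sample)"

# Hypothetical response name; the package's internal name may differ
response <- "is_cluster"

# Paste the response onto each right-hand side and convert to a formula
null_formula <- as.formula(paste(response, "~", null_mod))
full_formula <- as.formula(paste(response, "~", full_mod))

# The two models differ only in the term being tested
tested_term <- setdiff(all.vars(full_formula), all.vars(null_formula))
tested_term
```

With the default strings, the only variable present in the full model but not in the null model is "condition", which is the effect the comparison between the two models assesses.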

adj_method

The p-value correction method that will be used via the "p.adjust" function.
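To illustrate what the default correction does, the sketch below applies p.adjust from base R to a set of made-up p-values, one per hypothetical cell state. The cluster names and p-values are assumptions for demonstration only.

```r
# Hypothetical raw p-values, one per simulated cell state
pvals <- c(clus0 = 0.001, clus1 = 0.02, clus2 = 0.04, clus3 = 0.5)

# Bonferroni multiplies each p-value by the number of tests, capped at 1
adj_method <- "bonferroni"
p_adj <- p.adjust(pvals, method = adj_method)
p_adj
```

Any method name listed in p.adjust.methods (for example "holm" or "BH") is accepted by p.adjust.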

verbose

Boolean determining whether to print the time at which the dataset was simulated.

Value

This function returns NULL and instead saves the results to the directory designated in "save_path". If returnPCs = FALSE, the saved results will be a list containing the metadata table for the simulated cells. The metadata table is a data frame with the following columns: "cellstate", the cell state assigned during simulation; "sample", the sample the cell was assigned to; "batch", the batch the cell was assigned to; and "condition", the condition the cell was assigned to (case or control). If clusterData = TRUE, the metadata table will also contain a column "new_clus", which holds the new cluster assignments for the simulated cells. If returnPCs = TRUE, the saved results will contain the metadata table for the simulated cells and a matrix containing the simulated PC locations for each cell.
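As a rough picture of the metadata table's shape, the sketch below builds a toy data frame with the four documented columns. All values, including the label formats for samples, batches, and conditions, are made up for illustration; the actual labels written by the function may differ.

```r
# Toy metadata table with the documented columns (values are made up)
meta <- data.frame(
  cellstate = c("clus0", "clus1", "clus0", "clus2"),
  sample    = c("sample1", "sample1", "sample2", "sample2"),
  batch     = c("batch1", "batch1", "batch2", "batch2"),
  condition = c("case", "case", "control", "control"),
  stringsAsFactors = FALSE
)

# Count cells per sample and cell state as a first sanity check
counts <- table(meta$sample, meta$cellstate)
counts
```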


immunogenomics/scpost documentation built on July 28, 2021, 4:03 a.m.