simPCs: Simulate principal component locations for simulated cells

View source: R/generatePCs.R

Description

This function generates principal component (PC) locations for simulated cells. It allows control over cell state frequency variation, batch-associated gene expression variation, and sample-associated gene expression variation via cf_scale, b_scale, and s_scale, respectively.

Usage

simPCs(
  ncases,
  nctrls,
  nbatches,
  batchStructure = NULL,
  ncells,
  centroids,
  pc_cov_list,
  batch_vars,
  b_scale = 1,
  sample_vars,
  s_scale = 1,
  cfcov,
  cf_scale = 1,
  meanFreqs,
  clus,
  fc = 1,
  cond_induce = "cases",
  res_use = 1.2,
  mc.cores = 1,
  clusterData = TRUE,
  returnPCs = FALSE
)

Arguments

ncases

The number of cases.

nctrls

The number of controls.

nbatches

The number of batches that samples will be distributed into.

batchStructure

The structure of the study design in which cases and controls are split into batches. These structures are output by the "distributeSamples" function and can be modified afterwards if a specific structure is desired. If this parameter is left as NULL, this function automatically creates a batchStructure with the "distributeSamples" function.

ncells

A vector containing the number of cells that will be simulated for each sample. The vector must be the same length as the number of total samples (ncases + nctrls).
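The length requirement on ncells can be checked before calling the function. The following is a minimal sketch in base R; the sample counts and the value of 250 cells per sample are hypothetical choices for illustration, not defaults of the package.

```r
# Hypothetical study design: 17 cases and 17 controls
ncases <- 17
nctrls <- 17

# One entry per sample; here every sample gets the same (illustrative) cell count
ncells <- rep(250, ncases + nctrls)

# The vector must have exactly one entry per sample
stopifnot(length(ncells) == ncases + nctrls)
```

Per-sample counts may differ, as long as the vector has one entry for each of the ncases + nctrls samples.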

centroids

The mean PC values for each cell state. These are obtained as output from the "estimatePCVar" function.

pc_cov_list

A list containing the residual variance-covariance matrices for each cell state. These are obtained as output from the "estimatePCVar" function.

batch_vars

A matrix containing the batch-associated variance for each cell state in each PC. These are obtained as output from the "estimatePCVar" function.

b_scale

The magnitude of batch-associated gene expression variation the simulated dataset will exhibit. Setting b_scale = 1 will result in realistic levels of batch-associated variation (as derived from parameter estimation of the input dataset). Increasing b_scale results in higher variation, while decreasing b_scale results in lower variation.

sample_vars

A matrix containing the sample-associated variance for each cell state in each PC. These are obtained as output from the "estimatePCVar" function.

s_scale

The magnitude of sample-associated gene expression variation the simulated dataset will exhibit. Setting s_scale = 1 will result in realistic levels of sample-associated variation (as derived from parameter estimation of the input dataset). Increasing s_scale results in higher variation, while decreasing s_scale results in lower variation.
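To build intuition for what a scale factor such as b_scale or s_scale does, the sketch below draws per-sample shifts from a normal distribution whose variance is an estimated variance multiplied by the scale. This is an assumption made for illustration: the page does not state whether the scale is applied to the variance or to the standard deviation, and the matrix values and the helper draw_shifts are hypothetical rather than part of the package.

```r
set.seed(42)

# Hypothetical variance matrix: rows are cell states, columns are PCs
sample_vars <- matrix(
  c(0.5, 0.2, 0.1, 0.4),
  nrow = 2,
  dimnames = list(c("A", "B"), c("PC1", "PC2"))
)

# Hypothetical helper: draw n shifts for one cell state in one PC.
# ASSUMPTION: the scale multiplies the variance (not the standard deviation).
draw_shifts <- function(vars, state, pc, scale, n) {
  rnorm(n, mean = 0, sd = sqrt(vars[state, pc] * scale))
}

sd_1 <- sd(draw_shifts(sample_vars, "A", "PC1", scale = 1, n = 10000))
sd_4 <- sd(draw_shifts(sample_vars, "A", "PC1", scale = 4, n = 10000))

# Under this assumption, quadrupling the scale roughly doubles the spread
sd_4 / sd_1
```

Under the same assumption, a scale below 1 shrinks the spread, and a scale of 0 removes the corresponding source of variation entirely.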

cfcov

The cell state frequency variance-covariance across samples. This is obtained as output from the "estimateFreqVar" function.

cf_scale

The magnitude of cell state frequency variation that cell states will exhibit across samples in the simulated dataset. Setting cf_scale = 1 will result in realistic levels of cell state frequency variation (as derived from parameter estimation of the input dataset). Increasing cf_scale results in higher variation, while decreasing cf_scale results in lower variation.

meanFreqs

A vector containing the mean frequencies of cell states (linear space) across samples from the original input dataset. This vector is obtained as output from the "estimateFreqVar" function.

clus

The name of the cluster in which a fold change will be induced.

fc

The magnitude of the fold change that will be induced in the chosen cluster. If no fold change is desired, set fc = 1.

cond_induce

The condition you wish to induce a fold change in. Setting cond_induce = "cases" will induce a fold change into cases, while setting cond_induce = "ctrls" will induce a fold change into controls.
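One way to picture how clus, fc, and cond_induce interact is to multiply the mean frequency of the chosen cluster by fc in the induced condition and then renormalize so the frequencies sum to 1. The sketch below shows that idea in base R. It is an assumption about the mechanism, not the package's implementation, and the cluster names, frequencies, and the helper induce_fc are hypothetical.

```r
# Hypothetical mean cell state frequencies (linear space)
meanFreqs <- c(clus0 = 0.5, clus1 = 0.3, clus2 = 0.2)

# Hypothetical helper: scale one cluster's frequency by fc, then renormalize.
# ASSUMPTION: the fold change acts on the mean frequency of the chosen cluster.
induce_fc <- function(freqs, clus, fc) {
  freqs[clus] <- freqs[clus] * fc
  freqs / sum(freqs)
}

# With cond_induce = "cases", only cases receive the fold change
case_freqs <- induce_fc(meanFreqs, clus = "clus2", fc = 2)
ctrl_freqs <- induce_fc(meanFreqs, clus = "clus2", fc = 1)

round(case_freqs, 3)
round(ctrl_freqs, 3)
```

With fc = 1 the frequencies are unchanged, which matches the documented way to request no fold change.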

res_use

The resolution that will be used for clustering (Louvain method) the simulated dataset.

mc.cores

The number of cores that will be used for simulating the dataset and clustering.

clusterData

Boolean determining whether the simulated dataset will be clustered.

returnPCs

Boolean determining whether the function will also return the simulated PC locations for each cell.

Value

If returnPCs = FALSE, this function returns a list containing the metadata table for the simulated cells. The metadata table is a data frame with the following columns: "cellstate", the cell state assigned during simulation; "sample", the sample the cell was assigned to; and "condition", the condition the cell was assigned to (case or control). If clusterData = TRUE, the metadata table also contains a column "new_clus" with the new cluster assignments for the simulated cells. If returnPCs = TRUE, this function returns a list containing the metadata table for the simulated cells and a matrix of the simulated PC locations for each cell.
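A metadata table with these columns lends itself to simple per-sample summaries. The sketch below uses a small mock data frame, built by hand rather than produced by simPCs, to compute cell state frequencies per sample in base R. The sample names and the condition labels "case" and "ctrl" are assumptions for illustration; the actual labels in the returned table may differ.

```r
# Mock metadata table with the documented columns (values are made up)
meta <- data.frame(
  cellstate = c("A", "A", "B", "B", "A", "B"),
  sample    = c("s1", "s1", "s1", "s2", "s2", "s2"),
  condition = c("case", "case", "case", "ctrl", "ctrl", "ctrl"),
  stringsAsFactors = FALSE
)

# Cells per sample and cell state
counts <- table(meta$sample, meta$cellstate)

# Convert counts to within-sample frequencies (each row sums to 1)
freqs <- prop.table(counts, margin = 1)
freqs
```

The same pattern applies to the "new_clus" column when clusterData = TRUE.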


immunogenomics/scpost documentation built on July 28, 2021, 4:03 a.m.