View source: R/rescaleBatches.R

rescaleBatches | R Documentation |

Scale counts so that the average count within each batch is the same for each gene.

```
rescaleBatches(
...,
batch = NULL,
restrict = NULL,
log.base = 2,
pseudo.count = 1,
subset.row = NULL,
correct.all = FALSE,
assay.type = "logcounts"
)
```

`...` |
One or more log-expression matrices where genes correspond to rows and cells correspond to columns.
Alternatively, one or more SingleCellExperiment objects can be supplied containing a log-expression matrix in the `assay.type` assay.
If multiple objects are supplied, each object is assumed to contain all and only cells from a single batch.
If a single object is supplied, it is assumed to contain cells from all batches, so `batch` should also be specified.
Alternatively, one or more lists of matrices or SingleCellExperiments can be provided;
this is flattened as if the objects inside each list were passed directly to `...`. |

`batch` |
A vector or factor specifying the batch of origin for all cells when only a single object is supplied in `...`. |

`restrict` |
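A list of length equal to the number of objects in `...`. Each entry specifies the cells of the corresponding object to use when computing the correction. |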

`log.base` |
A numeric scalar specifying the base of the log-transformation. |

`pseudo.count` |
A numeric scalar specifying the pseudo-count used for the log-transformation. |

`subset.row` |
A vector specifying which features to use for correction. |

`correct.all` |
Logical scalar indicating whether corrected expression values should be computed for genes not in `subset.row`. |

`assay.type` |
A string or integer scalar specifying the assay containing the log-expression values. Only used for SingleCellExperiment inputs. |

This function assumes that the log-expression values were computed by a log-transformation of normalized count data, plus a pseudo-count. It reverses the log-transformation and scales the underlying counts in each batch so that the average (normalized) count is equal across batches. The assumption here is that each batch contains the same population composition. Thus, any scaling difference between batches is technical and must be removed.
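The steps in this paragraph can be sketched in base R. This is a minimal illustration, not the batchelor implementation: `rescale_sketch` is a hypothetical helper, and the choice of the lowest per-gene average as the common target is an assumption taken from the description elsewhere in this section.

```r
# Hypothetical helper for illustration only; not the batchelor implementation.
rescale_sketch <- function(batches, log.base = 2, pseudo.count = 1) {
    # Reverse the log-transformation to recover the normalized counts.
    counts <- lapply(batches, function(x) log.base^x - pseudo.count)

    # Average count of each gene in each batch (genes in rows, batches in columns).
    averages <- do.call(cbind, lapply(counts, rowMeans))

    # Assumed target: the lowest per-gene average across batches.
    target <- apply(averages, 1, min)

    lapply(seq_along(counts), function(i) {
        scaling <- target / averages[, i]
        scaling[!is.finite(scaling)] <- 0  # gene is all-zero in this batch
        # Scale each gene (row) and re-apply the same log-transformation.
        log(counts[[i]] * scaling + pseudo.count, base = log.base)
    })
}

# Toy data: 2 genes x 3 cells; batch 2 differs only by a per-gene scaling.
counts1 <- matrix(c(4, 0, 8, 2, 0, 6), nrow = 2)
counts2 <- counts1 * c(2, 0.5)
corrected <- rescale_sketch(list(log2(counts1 + 1), log2(counts2 + 1)))

# Average count per gene after correction, one column per batch.
sapply(corrected, function(x) rowMeans(2^x - 1))
```

Because the two toy batches differ only by a per-gene scaling, the corrected matrices coincide once the averages are equalized.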

This function is approximately equivalent to centering in log-expression space, the simplest application of linear regression methods for batch correction. However, by scaling the raw counts, it avoids loss of sparsity that would otherwise result from centering. It also mitigates issues with artificial differences in variance due to log-transformation. This is done by always downscaling to the lowest average expression for each gene such that differences in variance are dampened by the addition of the pseudo-count.
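The sparsity argument can be checked on a toy example. The sketch below uses one gene with made-up counts and compares centering in log-expression space against downscaling the counts; it is an illustration under those assumptions and does not call batchelor.

```r
# One gene measured in two batches; the zeros are the sparse entries.
countsA <- c(0, 4, 0, 8)    # batch A, average count 3
countsB <- c(0, 12, 0, 12)  # batch B, average count 6
logA <- log2(countsA + 1)
logB <- log2(countsB + 1)

# Centering in log-expression space: every zero becomes a non-zero value.
centered <- c(logA - mean(logA), logB - mean(logB))

# Scaling the counts down to the lowest average, then re-applying the log:
# zero counts stay at zero, so sparsity is preserved.
target <- min(mean(countsA), mean(countsB))
rescaled <- c(
    log2(countsA * target / mean(countsA) + 1),
    log2(countsB * target / mean(countsB) + 1)
)

sum(centered == 0)  # number of zeros left after centering
sum(rescaled == 0)  # number of zeros left after rescaling
```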

Use of `rescaleBatches` assumes that the uninteresting factors of variation (here, the batch of origin) are orthogonal to the interesting factors of variation.
For example, each batch is assumed to have the same composition of cell types.
If this is not true, the correction will not only be incomplete but may introduce spurious differences.
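A small base R sketch shows how a difference in composition can produce a spurious shift. The counts and cell type labels are invented for illustration, and the rescaling is a simplified stand-in for what the function does (downscaling to the lowest average), not the package code itself.

```r
# One gene with no technical batch effect: cell type X has a count of 10
# and cell type Y has a count of 30, regardless of batch.
batch1 <- c(X = 10, X = 10, X = 10, X = 10)  # batch 1 contains only type X
batch2 <- c(X = 10, X = 10, Y = 30, Y = 30)  # batch 2 is a mixture of X and Y

# Equalize the average count by downscaling to the lowest batch average.
target <- min(mean(batch1), mean(batch2))
corrected1 <- log2(batch1 * target / mean(batch1) + 1)
corrected2 <- log2(batch2 * target / mean(batch2) + 1)

# Type X cells were identical before correction but now differ by batch.
x_in_batch1 <- unname(corrected1[names(corrected1) == "X"])
x_in_batch2 <- unname(corrected2[names(corrected2) == "X"])
unique(x_in_batch1)
unique(x_in_batch2)
```

The higher average in batch 2 comes entirely from the type Y cells, yet the rescaling treats it as technical and shrinks the type X cells in that batch as well.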

The output values are always re-log-transformed with the same `log.base` and `pseudo.count`.
These can be used directly in place of the input values for downstream operations.

All genes are used with the default setting of `subset.row=NULL`.
Users can set `subset.row` to subset the inputs, though this is purely for convenience as each gene is processed independently of other genes.

See `?"batchelor-restrict"` for a description of the `restrict` argument.
Specifically, the function will compute the scaling differences using only the specified subset of cells, and then apply the re-scaling to all cells in each batch.
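The effect of restricting the cells can be sketched in base R for a single gene. The counts and the choice of restricted cells are made up, and the code is one reading of the description above rather than the package implementation.

```r
# One gene in two batches; the third cell in each batch is excluded
# from the calculation of the scaling factor.
counts <- list(c(2, 4, 100), c(6, 12, 50))
restrict <- list(1:2, 1:2)  # hypothetical choice of cells to use per batch

# Averages are computed from the restricted cells only.
averages <- mapply(function(x, keep) mean(x[keep]), counts, restrict)
target <- min(averages)

# The resulting scaling is then applied to all cells in each batch.
corrected <- Map(function(x, avg) log2(x * target / avg + 1), counts, averages)

# Back-transformed counts: the excluded third cell is rescaled as well.
lapply(corrected, function(x) 2^x - 1)
```

After correction, the restricted cells have the same average count in both batches, while the excluded cells are simply carried along by the same scaling factor.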

A SingleCellExperiment object containing the `corrected` assay.
This contains corrected log-expression values for each gene (row) in each cell (column) in each batch.
A `batch` field is present in the column data, specifying the batch of origin for each cell.

Cells in the output object are always ordered in the same manner as supplied in `...`.
For a single input object, cells will be reported in the same order as they are arranged in that object.
In cases with multiple input objects, the cell identities are simply concatenated from successive objects,
i.e., all cells from the first object (in their provided order), then all cells from the second object, and so on.
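The following sketch shows one way to inspect the returned object, assuming the batchelor package (and SummarizedExperiment, which it depends on) is installed. The simulated matrices and the labels in `origin` are made up for illustration.

```r
library(batchelor)

set.seed(100)
# Two hypothetical batches with the same 50 genes.
B1 <- log2(matrix(rpois(1000, lambda = 5), ncol = 20) + 1)   # 50 genes x 20 cells
B2 <- log2(matrix(rpois(1500, lambda = 10), ncol = 30) + 1)  # 50 genes x 30 cells

# Multiple inputs: cells of B1 come first in the output, followed by those of B2.
out <- rescaleBatches(B1, B2)
corrected <- SummarizedExperiment::assay(out, "corrected")
dim(corrected)
table(out$batch)

# Single input holding all cells, with the batch of origin supplied via 'batch'.
combined <- cbind(B1, B2)
origin <- rep(c("a", "b"), c(20, 30))
out2 <- rescaleBatches(combined, batch = origin)
table(out2$batch)
```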

Aaron Lun

`regressBatches`, for a residual calculation based on a fitted linear model.

`applyMultiSCE`, to apply this function over multiple `altExps`.

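```
library(batchelor)

# Simulate counts for two batches that differ by random scaling factors.
means <- 2^rgamma(1000, 2, 1)
A1 <- matrix(rpois(10000, lambda=means), ncol=50) # Batch 1
A2 <- matrix(rpois(10000, lambda=means*runif(1000, 0, 2)), ncol=50) # Batch 2

# Log-transform with the default base (2) and pseudo-count (1).
B1 <- log2(A1 + 1)
B2 <- log2(A2 + 1)
out <- rescaleBatches(B1, B2)
```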
