scMerge: Merge single-cell RNA-seq data from different batches,...


View source: R/scMerge.R

Description

Merge single-cell RNA-seq data from different batches, experiments, protocols.

Usage

scMerge(sce_combine, ctl = NULL, kmeansK = NULL, exprs = "logcounts",
  hvg_exprs = "counts", marker = NULL, marker_list = NULL,
  ruvK = 20, replicate_prop = 0.5, cell_type = NULL,
  cell_type_match = FALSE, cell_type_inc = NULL, fast_svd = FALSE,
  rsvd_prop = 0.1, dist = "cor", WV = NULL, WV_marker = NULL,
  return_all_RUV = FALSE, assay_name = NULL)

Arguments

sce_combine

A SingleCellExperiment object containing the batch-combined matrix, with batch information in colData.

ctl

A character vector of negative control genes. It must have a non-empty intersection with the row names of sce_combine.

kmeansK

A vector giving the k-means K for each batch. The length of kmeansK must equal the number of batches.

exprs

A string indicating the name of the assay in sce_combine that requires batch correction; default is logcounts.

hvg_exprs

A string indicating the assay in sce_combine to be used for identifying highly variable genes; default is counts.

marker

An optional vector of markers to be used in the calculation of mutual nearest clusters. If no markers are supplied, highly variable genes are used instead.

marker_list

An optional list of markers for each batch, to be used in the calculation of mutual nearest clusters.

ruvK

An optional integer or integer vector indicating the number of unwanted variation factors to remove; default is 20.

replicate_prop

A number between 0 and 1 indicating the proportion of cells that are included in pseudo-replicates; default is 0.5.

cell_type

An optional vector indicating the cell type of each cell in the batch-combined matrix. If it is NULL, the pseudo-replicate procedure will be run to identify cell types.

cell_type_match

An optional logical indicating whether to find mutual nearest clusters using cell type information.

cell_type_inc

An optional vector indicating the indices of the cells that will be used to supervise the pseudo-replicate procedure.

fast_svd

If TRUE, randomised singular value decomposition is used in place of the standard singular value decomposition. We recommend this option when the number of cells is large (e.g. > 1000).

rsvd_prop

If fast_svd = TRUE, then rsvd_prop is used to reduce the computational cost of the randomised singular value decomposition. We recommend setting this number to less than 0.25 to balance numerical accuracy against computational cost.

dist

The distance metric used in the calculation of mutual nearest clusters; default is Pearson correlation.

WV

An optional vector indicating a wanted variation factor other than cell type information, such as cell stages.

WV_marker

An optional vector indicating the markers of the wanted variation.

return_all_RUV

If FALSE, returns a SingleCellExperiment object containing the original data and a single normalised matrix. Otherwise, the SingleCellExperiment object contains the original data and one normalised matrix for each ruvK value. In the latter case, assay_name must have the same length as ruvK.

assay_name

The assay name(s) for the adjusted expression matrix (matrices). If return_all_RUV = TRUE, assay_name must have the same length as ruvK.
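Several of the arguments above come with length or overlap constraints (ctl against the row names, kmeansK against the number of batches, assay_name against ruvK). The sketch below shows one way to check them before calling scMerge. It is base R only, and `check_scMerge_args` is a hypothetical helper written for illustration, not a function provided by the package. The toy gene names and batch labels are made up; with real data one would presumably pass `rownames(sce_combine)` and the batch column of `colData(sce_combine)` instead.

```r
# Hypothetical helper (not part of scMerge): collects violations of the
# constraints stated in the argument descriptions above.
check_scMerge_args <- function(gene_names, batch, ctl, kmeansK = NULL,
                               ruvK = 20, return_all_RUV = FALSE,
                               assay_name = NULL) {
  problems <- character(0)

  # ctl must overlap with the row names of the expression matrix
  if (length(intersect(ctl, gene_names)) == 0) {
    problems <- c(problems, "ctl shares no genes with the rows of sce_combine")
  }

  # kmeansK needs one entry per batch
  n_batch <- length(unique(batch))
  if (!is.null(kmeansK) && length(kmeansK) != n_batch) {
    problems <- c(problems, sprintf(
      "kmeansK has length %d but there are %d batches",
      length(kmeansK), n_batch))
  }

  # with return_all_RUV = TRUE, one assay name is needed per ruvK value
  if (return_all_RUV && length(assay_name) != length(ruvK)) {
    problems <- c(problems, "assay_name and ruvK differ in length")
  }

  problems
}

# Toy inputs, made up for illustration
gene_names <- paste0("gene", 1:6)
batch <- rep(c("batch1", "batch2", "batch3"), each = 4)
ruvK <- c(10, 20)

# Consistent arguments: no problems reported
ok <- check_scMerge_args(gene_names, batch,
                         ctl = c("gene2", "gene5"),
                         kmeansK = c(3, 3, 1),
                         ruvK = ruvK, return_all_RUV = TRUE,
                         assay_name = paste0("scMerge_k", ruvK))

# Inconsistent arguments: all three constraints are violated
bad <- check_scMerge_args(gene_names, batch,
                          ctl = "geneX",
                          kmeansK = c(3, 3),
                          ruvK = ruvK, return_all_RUV = TRUE,
                          assay_name = "scMerge")
print(ok)
print(bad)
```

Building the assay names from ruvK with `paste0`, as in the first call, keeps the two vectors the same length by construction.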

Value

Returns a SingleCellExperiment object with the following:

metadata

containing the ruvK vector, ruvK_optimal based on F-score, and the replicate matrix

assays

the original matrices and also the normalised matrices

Author(s)

Yingxin Lin, Kevin Wang

Examples

suppressPackageStartupMessages({
  library(SingleCellExperiment)
  library(scater)
  library(scMerge)
  library(scMerge.data)
})

# Load the example data
data("sce_mESC", package = "scMerge.data")

# Load the previously computed stably expressed genes
data("segList_ensemblGeneID")

# Run scMerge on the example data with minimal inputs;
# kmeansK supplies one K per batch
sce_mESC <- scMerge(
  sce_combine = sce_mESC,
  ctl = segList_ensemblGeneID$mouse$mouse_scSEG,
  kmeansK = c(1, 3, 3, 1, 1),
  assay_name = "scMerge")

# PCA of the original log-counts (before batch correction)
scater::plotPCA(sce_mESC, colour_by = "cellTypes", shape = "batch",
                run_args = list(exprs_values = "logcounts"))

# PCA of the scMerge-adjusted matrix (after batch correction)
scater::plotPCA(sce_mESC, colour_by = "cellTypes", shape = "batch",
                run_args = list(exprs_values = "scMerge"))
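The marker, cell_type_match and dist arguments all refer to finding mutual nearest clusters across batches. As a rough illustration of that idea, the base-R sketch below pairs up clusters from two batches using a Pearson-correlation distance. It is not scMerge's internal code. The centroid values are made up, and taking the distance as 1 minus the correlation is an assumption about what `dist = "cor"` means.

```r
# Toy cluster centroids (genes in rows, clusters in columns); values are made up
batch1 <- cbind(A1 = c(1, 2, 3, 4, 5),
                A2 = c(5, 4, 3, 2, 1))
batch2 <- cbind(B1 = c(5.1, 3.9, 3.2, 2.0, 0.8),
                B2 = c(1.2, 1.9, 3.1, 4.2, 4.9))

# Assumed correlation distance: 1 - Pearson correlation between every
# batch1 cluster (rows of d) and every batch2 cluster (columns of d)
d <- 1 - cor(batch1, batch2)

# Nearest batch2 cluster for each batch1 cluster, and vice versa
nearest_in_batch2 <- apply(d, 1, which.min)
nearest_in_batch1 <- apply(d, 2, which.min)

# A pair is mutual when each cluster is the other's nearest neighbour
is_mutual <- nearest_in_batch1[nearest_in_batch2] == seq_len(nrow(d))
mutual_pairs <- data.frame(
  batch1 = rownames(d)[is_mutual],
  batch2 = colnames(d)[nearest_in_batch2[is_mutual]],
  stringsAsFactors = FALSE
)
print(mutual_pairs)
```

In this toy case A1 pairs with B2 and A2 pairs with B1, because those centroids rise and fall together across the five genes.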

SydneyBioX/scMerge documentation built on Oct. 9, 2018, 3:28 p.m.