Multiple-Integrating: Integrating Multiple Datasets

scMultiIntegrate        R Documentation

Integrating Multiple Datasets

Description

The "scMultiIntegrate" function integrates multiple datasets. It is based on our approach RPCI (reference principal components integration), which decomposes every target dataset using gene-eigenvectors derived from the reference dataset. The function returns a RISC object containing the integrated eigenvectors and the aligned gene expression values.
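To illustrate the general idea of decomposing a target dataset with eigenvectors learned from a reference, the base R sketch below centers each gene, takes the top k left singular vectors of the reference (genes x cells) as gene-eigenvectors, and projects both datasets onto them. This is a simplified stand-in written under our own assumptions, not the RPCI algorithm as implemented in RISC; the helper project_on_reference and its arguments are hypothetical, and plain matrices are used instead of RISC objects.

```r
# Hypothetical sketch, not RISC's RPCI implementation: project a target
# dataset onto gene-eigenvectors derived from a reference dataset.
#   ref, target : genes x cells matrices with gene names as row names
#   k           : number of eigenvectors (must not exceed min(genes, reference cells))
project_on_reference <- function(ref, target, k = 10) {
  # Restrict both datasets to the genes they share, in the same order
  genes <- intersect(rownames(ref), rownames(target))
  ref <- ref[genes, , drop = FALSE]
  target <- target[genes, , drop = FALSE]
  # Center each gene within its own dataset (an assumption of this sketch)
  ref.c <- ref - rowMeans(ref)
  tgt.c <- target - rowMeans(target)
  # Gene-eigenvectors: top-k left singular vectors of the centered reference
  gene.eigen <- svd(ref.c, nu = k, nv = 0)$u
  rownames(gene.eigen) <- genes
  list(
    eigen     = gene.eigen,                # genes x k
    ref.pc    = t(ref.c) %*% gene.eigen,   # reference cells x k
    target.pc = t(tgt.c) %*% gene.eigen    # target cells x k
  )
}

# Usage with simulated data (100 shared genes)
set.seed(123)
g <- paste0("gene", 1:100)
ref <- matrix(rnorm(100 * 60), nrow = 100, dimnames = list(g, NULL))
tgt <- matrix(rnorm(100 * 40), nrow = 100, dimnames = list(g, NULL))
res <- project_on_reference(ref, tgt, k = 10)
dim(res$target.pc)
```

Here dim(res$target.pc) is 40 by 10: each target cell is represented in the reference's 10-dimensional eigen-space, so both datasets share one coordinate system.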

Usage

scMultiIntegrate(
  objects,
  eigens = 10,
  add.Id = NULL,
  var.gene = NULL,
  align = "OLS",
  npc = 50,
  adjust = TRUE,
  ncore = 1,
  seed = 123
)

Arguments

objects

A list of RISC objects, e.g. list(object1, object2, object3, ...). The first object is used as the reference from which the gene-eigenvectors are generated.

eigens

The number of eigenvectors used for data integration.

add.Id

A character vector of IDs used to label the different datasets.

var.gene

A character vector of gene names, used to define the variable genes manually.

align

The method used to align gene expression values: "Optimal" for empirical alignment, "Predict" for alignment by RPCI prediction, and "OLS" (the default) for alignment by ordinary least-squares linear regression.

npc

The number of PCs returned by "scMultiIntegrate"; these are typically used in downstream analyses such as cell embedding and cell clustering.

adjust

Logical; whether to adjust the number of eigenvectors.

ncore

The number of cores used for data integration.

seed

The random seed, for reproducible results.
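The "OLS" option of align is described above only as alignment by ordinary linear regression. As one hypothetical reading, the sketch below fits each target cell's centered expression profile as a linear combination of reference gene-eigenvectors by ordinary least squares and returns the fitted values in gene space. The helper ols_align and its inputs are assumptions made for illustration; RISC's actual alignment procedure may differ.

```r
# Hypothetical helper, not part of RISC: ordinary least-squares fit of each
# target cell onto reference gene-eigenvectors.
#   eigen  : genes x k matrix of reference gene-eigenvectors
#   target : genes x cells expression matrix with the same gene order
ols_align <- function(eigen, target) {
  stopifnot(nrow(eigen) == nrow(target))
  # Center each gene in the target (an assumption of this sketch)
  tgt.c <- target - rowMeans(target)
  # Normal equations: coef = (E'E)^{-1} E'X, one column of coefficients per cell
  coef <- solve(crossprod(eigen), crossprod(eigen, tgt.c))
  # Fitted (aligned) values in gene space: genes x cells
  eigen %*% coef
}

# Usage with simulated data: orthonormal columns stand in for eigenvectors
set.seed(123)
E <- qr.Q(qr(matrix(rnorm(50 * 5), nrow = 50)))
X <- matrix(rnorm(50 * 20), nrow = 50)
aligned <- ols_align(E, X)
dim(aligned)
```

dim(aligned) is 50 by 20; each column is the least-squares reconstruction of one target cell within the span of the five eigenvectors, so the residuals are orthogonal to those eigenvectors.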

References

Liu et al., Nature Biotech. (2021)

Examples

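# Take two RISC objects from the example list raw.mat; obj1 (first) is the reference
obj1 = raw.mat[[3]]
obj2 = raw.mat[[4]]
obj0 = list(obj1, obj2)
# Use the variable genes shared by both datasets
var0 = intersect(obj1@vargene, obj2@vargene)
# Integrate with 8 eigenvectors and "Predict" alignment, returning 20 PCs
obj0 = scMultiIntegrate(obj0, eigens = 8, var.gene = var0, align = 'Predict', 
                        npc = 20, add.Id = c("Set1", "Set2"), ncore = 2)
# UMAP embedding from the first 8 integrated components (use = "PLS")
obj0 = scUMAP(obj0, npc = 8, use = "PLS", dist = 0.001, neighbors = 15)
# Plot cells colored by dataset label (Set) and by cell group (Group)
DimPlot(obj0, slot = "cell.umap", colFactor = "Set", size = 2)
DimPlot(obj0, slot = "cell.umap", colFactor = "Group", size = 2, label = TRUE)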

bioinfoDZ/RISC documentation built on March 30, 2024, 9:19 p.m.