PLS-Integrating: Integrating Multiple Large Datasets

scPLS R Documentation

Integrating Multiple Large Datasets

Description

The "scPLS" function integrates multiple datasets using our algorithm, reference principal components integration (RPCI). RPCI decomposes all the target datasets based on the reference. The output of this function can be used for low-dimensional visualization.
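To make the idea of decomposing target datasets "based on the reference" concrete, the following is a toy sketch in base R. It is not the RISC implementation of RPCI; the matrices, the centering step, and all variable names are assumptions chosen for illustration. It only shows the general pattern: gene eigenvectors are derived from the reference alone, and every dataset is then projected onto those same axes.

```r
# Toy illustration only: base R, NOT the RISC implementation of RPCI.
set.seed(123)
genes <- paste0("gene", 1:50)

# Hypothetical genes-by-cells matrices standing in for two datasets.
ref    <- matrix(rnorm(50 * 30), nrow = 50, dimnames = list(genes, NULL))
target <- matrix(rnorm(50 * 40), nrow = 50, dimnames = list(genes, NULL))

# Center each gene with the reference means so both sets share one origin
# (an assumption of this sketch).
gene_means      <- rowMeans(ref)
ref_centered    <- ref - gene_means
target_centered <- target - gene_means

# Gene eigenvectors come from the reference only (left singular vectors).
k <- 10
gene_eigen <- svd(ref_centered, nu = k, nv = 0)$u    # 50 genes x k

# Project every dataset onto the same reference-derived axes.
ref_scores    <- t(ref_centered) %*% gene_eigen      # 30 cells x k
target_scores <- t(target_centered) %*% gene_eigen   # 40 cells x k
```

Because both score matrices live in the same k-dimensional space, their rows can be stacked and passed to a low-dimensional embedding. Here `k` plays a role loosely analogous to the `eigens` argument, though the correspondence is only illustrative.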

Usage

scPLS(
  objects,
  eigens = 10,
  add.Id = NULL,
  var.gene = NULL,
  npc = 100,
  adjust = TRUE,
  ncore = 1,
  seed = 123
)

Arguments

objects

A list of multiple RISC objects: list(object1, object2, object3, ...). The first object in the list is the reference used to generate gene-eigenvectors.

eigens

The number of eigenvectors used for data integration.

add.Id

A character vector of IDs used to label the different datasets.

var.gene

Define the variable genes manually by supplying a vector of gene names.

npc

The number of PCs returned by the "scMultiIntegrate" function. They are typically used for subsequent analyses, such as cell embedding and cell clustering.

adjust

Whether to adjust the number of eigenvectors.

ncore

The number of cores used for data integration.

seed

The random seed, set to keep results consistent.
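For the "var.gene" argument, one way to build the gene vector when more than two datasets are involved is to keep only the genes that are variable in every dataset. The sketch below uses plain character vectors with made-up gene names as stand-ins for the per-object variable genes; it is base R only and does not call any RISC function.

```r
# Hypothetical variable-gene vectors, one per dataset (made-up gene names).
var_list <- list(
  c("GeneA", "GeneB", "GeneC", "GeneD"),
  c("GeneB", "GeneC", "GeneD", "GeneE"),
  c("GeneC", "GeneD", "GeneF")
)

# Keep only the genes flagged as variable in every dataset.
shared_var <- Reduce(intersect, var_list)
print(shared_var)
# [1] "GeneC" "GeneD"
```

A vector built this way could then be supplied as "var.gene". Intersecting across many datasets can shrink the gene set considerably, so it is worth checking its length before integration.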

References

Liu et al., Nature Biotech. (2021)

Examples

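# Load the package; "raw.mat" is assumed to be a list of RISC objects available in the session
library(RISC)
obj1 = raw.mat[[3]]
obj2 = raw.mat[[4]]
# The first object in the list serves as the reference
obj0 = list(obj1, obj2)
# Use the variable genes shared by both objects
var0 = intersect(obj1@vargene, obj2@vargene)
PLS0 = scPLS(obj0, var.gene = var0, npc = 20, add.Id = c("Set1", "Set2"), ncore = 1)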

bioinfoDZ/RISC documentation built on March 30, 2024, 9:19 p.m.