Preprocessing: Data preprocessing

View source: R/Preprocessing.R

Data preprocessing

Description

This function prepares data for the ConNetGNN function.

Usage

Preprocessing(data, parallel.cores = 1, verbose = TRUE)

Arguments

data

The input data should be a data frame or a matrix with genes as rows and cells as columns. A Seurat object is also accepted.

parallel.cores

Number of processor cores to use for parallel computation (default: 1). If parallel.cores = 0, all available cores are used; set a positive number to limit the number of cores.
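The convention that 0 means "use all cores" can be sketched with base R's parallel package. `resolve_cores` is a hypothetical helper written only for illustration; how Preprocessing resolves this value internally is not documented here.

```r
# Hypothetical helper, NOT part of scapGNN: turns a parallel.cores value
# into a concrete core count, following the documented rule that 0 means
# "use every available core".
resolve_cores <- function(parallel.cores = 1) {
  available <- parallel::detectCores()
  # detectCores() may return NA on some platforms; fall back to one core.
  if (is.na(available)) available <- 1L
  if (parallel.cores == 0) {
    return(as.integer(available))
  }
  as.integer(parallel.cores)
}

resolve_cores(1)  # returns 1
resolve_cores(0)  # number of cores reported by detectCores()
```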

verbose

Whether to print progress messages for each step. Default: TRUE.

Details

Preprocessing

The function can take input from the Seurat framework; see Examples for a typical Seurat processing workflow. The input data should contain highly variable genes and be log-transformed. Left-truncated mixed Gaussian (LTMG) modeling is used to calculate the gene regulatory signal matrix. Positive gene-gene and cell-cell correlations are used to build the initial gene correlation matrix and cell correlation matrix.
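A minimal sketch of the positive-correlation step, assuming Pearson correlation on a log-transformed genes-by-cells matrix. `positive_cor_adj` is a hypothetical helper, not the package's implementation; keeping the positive values as edge weights and zeroing the diagonal are assumptions. Base `cor()` is used to keep the sketch dependency-free; the coop package loaded in Examples provides `pcor()` as a faster Pearson alternative.

```r
# Hypothetical sketch, NOT scapGNN's code: build an initial correlation
# network that keeps only positive Pearson correlations.
positive_cor_adj <- function(mat) {
  # cor() correlates columns: pass the genes-by-cells matrix as-is for
  # cell-cell correlations, or t(mat) for gene-gene correlations.
  r <- suppressWarnings(cor(mat))
  r[is.na(r)] <- 0  # constant columns give NA correlations; treat as 0
  r[r < 0] <- 0     # assumption: negative correlations are dropped
  diag(r) <- 0      # assumption: no self-loops
  r
}

# Toy log-transformed expression matrix: 50 genes x 10 cells.
set.seed(1)
expr <- matrix(rexp(50 * 10), nrow = 50, ncol = 10,
               dimnames = list(paste0("gene", 1:50), paste0("cell", 1:10)))
expr <- log1p(expr)

cell_adj <- positive_cor_adj(expr)     # 10 x 10 cell network
gene_adj <- positive_cor_adj(t(expr))  # 50 x 50 gene network
```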

Value

A list:

orig_dara

The original data as submitted by the user, with highly variable genes as rows and cells as columns.

cell_features

Cell feature matrix.

gene_features

Gene feature matrix.

ltmg_matrix

Gene regulatory signal matrix computed by LTMG modeling.

cell_adj

The adjacency matrix of the cell correlation network.

gene_adj

The adjacency matrix of the gene correlation network.

Examples


# Load dependent packages.
# require(coop)

# Seurat data processing.
# require(Seurat)

# Load the PBMC 3k dataset (the example dataset from the Seurat tutorial).
# pbmc.data <- Read10X(data.dir = "../data/pbmc3k/filtered_gene_bc_matrices/hg19/")

# Our recommended filtering keeps only genes that are non-zero in more than
# 1% of cells and cells that are non-zero in more than 1% of genes.
# Users can additionally filter cells by mitochondrial gene content as needed.
# pbmc <- CreateSeuratObject(counts = pbmc.data, project = "case",
#                                     min.cells = 3, min.features = 200)
# pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
# pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)

# Normalizing the data.
# pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize")

# Identification of highly variable features.
# pbmc <- FindVariableFeatures(pbmc, selection.method = 'vst', nfeatures = 2000)

# Run Preprocessing.
# Prep_data <- Preprocessing(pbmc)



# Users can also directly input data
# in data frame or matrix format
# containing highly variable genes.
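data("Hv_exp")
# Keep the first 20 cells so the example runs quickly.
Hv_exp <- Hv_exp[,1:20]
# Drop genes with zero expression across the retained cells.
Hv_exp <- Hv_exp[which(rowSums(Hv_exp) > 0),]
Prep_data <- Preprocessing(Hv_exp[1:10,])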
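For users who do not work with Seurat, the filtering and normalization steps in the Seurat example can be approximated in base R before calling Preprocessing. A minimal sketch, assuming a raw count matrix with genes as rows and cells as columns; `filter_sparse`, `log_normalize` and `top_variable_genes` are hypothetical helpers. `log_normalize` uses the same formula as Seurat's LogNormalize (counts divided by each cell's total, multiplied by 10,000, then log1p), but `top_variable_genes` ranks genes by plain variance and is not equivalent to Seurat's vst method.

```r
# Hypothetical helpers, NOT part of scapGNN or Seurat.

# The 1% rule: keep genes non-zero in more than `frac` of cells and cells
# non-zero in more than `frac` of genes. Both criteria are evaluated on
# the unfiltered matrix.
filter_sparse <- function(counts, frac = 0.01) {
  nz <- counts != 0
  keep_genes <- rowMeans(nz) > frac
  keep_cells <- colMeans(nz) > frac
  counts[keep_genes, keep_cells, drop = FALSE]
}

# Same formula as Seurat's "LogNormalize":
# log1p(count / total counts of the cell * scale.factor).
log_normalize <- function(counts, scale.factor = 1e4) {
  lib_size <- colSums(counts)
  log1p(sweep(counts, 2, lib_size, "/") * scale.factor)
}

# Simple variance ranking (NOT Seurat's "vst" selection method).
top_variable_genes <- function(logexp, n = 2000) {
  v <- apply(logexp, 1, var)
  keep <- order(v, decreasing = TRUE)[seq_len(min(n, length(v)))]
  logexp[keep, , drop = FALSE]
}

# Toy count matrix: 300 genes x 40 cells.
set.seed(42)
counts <- matrix(rpois(300 * 40, lambda = 0.5), nrow = 300, ncol = 40,
                 dimnames = list(paste0("gene", 1:300), paste0("cell", 1:40)))
hv <- top_variable_genes(log_normalize(filter_sparse(counts)), n = 100)
dim(hv)
# Prep_data <- Preprocessing(hv)
```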

scapGNN documentation built on Aug. 8, 2023, 9:06 a.m.