knitr::opts_knit$set(root.dir = "d:/my_analysis/BRIC_TEST/2.Yan/")
library(IRISFGM)
load("d:/my_analysis/BRIC_TEST/2.Yan/YanObjectBRIC_qubic1.Rdata")

Introduction to IRIS-FGM

General introduction

IRIS-FGM integrates in-house and state-of-the-art computational tools and provides two analysis strategies, including bicluster-based co-expression gene analysis (Xie, et al., 2020) and LTMG (left-truncated mixture Gaussian model)-embedded scRNA-Seq analysis (Wan, et al., 2019).

Main function

IRIS-FGM is built around two major strategies: bicluster-based co-expression gene analysis, and LTMG-embedded scRNA-Seq analysis (quick mode).

Requirements

Environment

We recommend installing IRIS-FGM on a large-memory (32 GB) Linux system if you plan to run bicluster-based co-expression analysis. For quick mode, a smaller-memory (8 GB) Windows or Linux system is sufficient. IRIS-FGM does not support macOS. We will assume you have the following installed:

Pre-installed packages

# CRAN packages. Rtools is not a CRAN package: on Windows, install it
# separately with its own installer.
install.packages(c("BiocManager", "devtools", "AdaptGauss", "pheatmap", "mixtools", "MCL",
                   "anocva", "qgraph", "ggpubr", "ggraph"))
# Bioconductor packages.
BiocManager::install(c("org.Mm.eg.db", "multtest", "org.Hs.eg.db", "clusterProfiler", "DEsingle",
                       "DrImpute", "scater", "scran"))
# Seurat from GitHub.
devtools::install_github(repo = "satijalab/seurat")

Input

The input to IRIS-FGM is a single-cell RNA-seq expression matrix:

  1. Rows correspond to genes and columns correspond to cells.
  2. Expression units: the preferred expression values are RPKM/FPKM/CPM.
  3. The data file should be tab-delimited.
  4. IRIS-FGM also accepts output files from 10X CellRanger, either a folder containing three individual files or an h5 file.
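
As a minimal sketch of the expected layout, the base R snippet below writes a tiny tab-delimited matrix (genes in rows, cells in columns) to a temporary file and reads it back with read.table. The gene names, cell names, and values are made up for illustration.

```r
# Toy expression matrix: 3 genes (rows) x 4 cells (columns).
# Gene and cell names are hypothetical, for illustration only.
toy <- matrix(c(0, 12.5, 3.1, 0,
                8.2, 0, 0, 1.7,
                5.0, 5.5, 0, 9.9),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("GeneA", "GeneB", "GeneC"),
                              c("Cell1", "Cell2", "Cell3", "Cell4")))

# Write as a tab-delimited text file with gene names in the first column.
toy_file <- tempfile(fileext = ".txt")
write.table(toy, toy_file, sep = "\t", quote = FALSE, col.names = NA)

# Read it back: the first column becomes row names, the header gives cell names.
toy_in <- read.table(toy_file, header = TRUE, row.names = 1, sep = "\t")
dim(toy_in)
```

A table read this way has one row per gene and one column per cell, which is the orientation described in the list above.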

Others

Co-expression analysis writes several intermediate files, so make sure you have write permission to the folder where IRIS-FGM is located.

Installation

To install, type the following command in your R console. When R asks whether to update packages, select option 3:

devtools::install_github("BMEngineeR/IRISCEM", force = TRUE)

Example dataset

This tutorial runs on a real dataset to illustrate the results obtained at each step.

As an example, we use Yan's data, a dataset containing 90 cells and 20,214 genes from human embryos, to conduct cell type prediction.

Yan, L. et al. Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells. Nat. Struct. Mol. Biol. 20, 1131-1139 (2013)

The original expression matrix was downloaded from https://s3.amazonaws.com/scrnaseq-public-datasets/manual-data/yan/nsmb.2660-S2.csv. Expression is provided as RPKM values. For convenience, we removed the spaces in the column names and deleted the second column (Transcript_ID). The processed data is available at https://bmbl.bmi.osumc.edu/downloadFiles/Yan_expression.txt.
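
The clean-up described above can be sketched in base R as follows. This is an illustration on a made-up two-gene table, not the script used to produce Yan_expression.txt; the column names other than Transcript_ID are hypothetical.

```r
# Toy stand-in for the downloaded table; values and most names are made up.
raw <- data.frame(Gene_ID = c("GeneA", "GeneB"),
                  Transcript_ID = c("T1", "T2"),
                  `Oocyte 1` = c(1.2, 0),
                  `Zygote 1` = c(0, 3.4),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Drop the second column (Transcript_ID).
cleaned <- raw[, -2]

# Remove spaces from the column names.
colnames(cleaned) <- gsub(" ", "", colnames(cleaned), fixed = TRUE)

# Use the gene identifier as row names, leaving a genes x cells table.
rownames(cleaned) <- cleaned$Gene_ID
cleaned$Gene_ID <- NULL
colnames(cleaned)
```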

1. Input data, create IRIS-FGM object, add meta information, and preprocess.

IRIS-FGM accepts 10X Chromium input files, either a folder (containing gene names, cell names, and a sparse matrix) or an .h5 file.

Input data

  1. Set the working directory and load the library.
setwd("~/2.Yan/")
library(IRISFGM)
  2. Read from an .h5 file.
InputMatrix <- ReadFrom10X_h5("~/5k_pbmc_protein_v3_filtered_feature_bc_matrix.h5")
  3. Read from a 10X folder.
InputMatrix <- ReadFrom10X_folder("~/10X_3K/folder_10X/")
  4. Read from a .csv or .txt file. We use this data set as the example for the rest of the pipeline.

InputMatrix <- read.table("~/2.Yan/Yan_expression.txt", header = TRUE, row.names = 1)

Add meta information

  1. Create the IRIS-FGM object.
object <- CreateIRISFGMObject(InputMatrix)
  2. AddMeta: this step adds user-defined cell labels. The object passed to meta.info should be a data frame whose row names are cell IDs and whose columns are cell type annotations.
object <- AddMeta(object, meta.info = NULL)
  3. PlotMeta: plot meta information based on RNA count and feature number. Use this plot to choose thresholds for filtering out low-quality cells in the next step.
PlotMeta(object)
  4. Remove low-quality cells based on the previous plot.
object <- SubsetData(object, nFeature.upper = 15000, nFeature.lower = 8000,
                     Counts.upper = 700000, Counts.lower = 400000)
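
To make the meta.info layout and the filtering step more concrete, here is a minimal base R sketch on a toy matrix. It does not call IRIS-FGM. The column name Cluster, the cell and gene names, and the thresholds are all hypothetical, and the package may compute its QC metrics differently.

```r
# Toy count matrix: 5 genes x 4 cells (all names and values hypothetical).
counts <- matrix(c(10, 0, 3, 0,
                    0, 0, 7, 0,
                    5, 1, 0, 0,
                    2, 0, 9, 0,
                    8, 0, 4, 1),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:4)))

# meta.info layout described above: row names are cell IDs,
# the column holds a cell type annotation.
meta.info <- data.frame(Cluster = c("TypeA", "TypeA", "TypeB", "TypeB"),
                        row.names = colnames(counts))

# Per-cell QC metrics: total counts and number of detected genes.
n_counts  <- colSums(counts)
n_feature <- colSums(counts > 0)

# Keep cells inside the chosen bounds (thresholds picked for the toy data).
keep <- n_feature >= 2 & n_feature <= 5 & n_counts >= 5 & n_counts <= 30
filtered <- counts[, keep, drop = FALSE]
colnames(filtered)
```

In the tutorial itself, the equivalent bounds are the nFeature.upper, nFeature.lower, Counts.upper, and Counts.lower arguments of SubsetData, chosen by inspecting the PlotMeta output.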

Preprocessing

Users can choose normalization and imputation based on their needs. Two normalization methods are available. The simplest is library size (CPM) normalization, selected with normalization = "cpm" as in the call below. The other comes from the scran package and is selected with normalization = "scran". Imputation uses the DrImpute package and is enabled with IsImputation = TRUE (disabled by default).

object <- ProcessData(object, normalization = "cpm", IsImputation = FALSE, seed = 123)
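
For readers unfamiliar with CPM, the idea can be sketched in base R as below. This is a generic illustration on made-up counts, not the code inside ProcessData; the scaling factor the package uses is an assumption here.

```r
# Toy raw count matrix: 3 genes x 2 cells (hypothetical values).
counts <- matrix(c(10, 30,
                   40, 20,
                   50, 50),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                 c("Cell1", "Cell2")))

# Divide each column by its library size and scale to one million.
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

colSums(cpm)  # every cell now sums to 1e6
```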

2. Run LTMG

The argument Gene_use = 500 selects the top 500 highly variable genes for LTMG. For quick mode, we recommend using the top 2000 genes (here we use the top 500 to save time). For co-expression gene analysis, in contrast, we recommend using all genes by setting Gene_use = "all".

# Demo only: run the top 500 genes to save time.
object <- RunLTMG(object, Gene_use = 500, seed = 123)
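
The notion of "top N variable genes" can be illustrated with a simple variance ranking in base R. This is only a sketch under the assumption that variability is measured by plain variance across cells; the criterion RunLTMG actually applies may differ.

```r
set.seed(123)

# Toy matrix: 6 genes x 10 cells; genes 2 and 5 are made far more variable.
expr <- matrix(rnorm(60, mean = 5, sd = 0.1), nrow = 6,
               dimnames = list(paste0("Gene", 1:6), paste0("Cell", 1:10)))
expr["Gene2", ] <- expr["Gene2", ] + rep(c(0, 10), 5)
expr["Gene5", ] <- expr["Gene5", ] + rep(c(0, 20), 5)

# Rank genes by variance across cells and keep the top n_top.
n_top <- 2
gene_var <- apply(expr, 1, var)
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_top)]
top_genes
```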

3. Seurat implemented analysis.

Dimension Reduction

Users can set reduction = "umap" or reduction = "tsne" to perform dimension reduction.

# Run UMAP-based dimension reduction.
object <- RunDimensionReduction(object, reduction = "umap")

Cluster

# Cluster cells using the parameters below.
object <- RunClassification(object,  k.param = 20, resolution = 0.5, algorithm = 1)

Plot dimension reduction plot

# Plot cells in UMAP space.
PlotDimension(object, reduction = "umap")

This function prompts the user to choose the group used to label cells in the figure. Entering 4 selects the "Seurat0.5" group as the cell label.

4. Biclustering based co-expression analysis

IRIS-FGM provides a biclustering function based on our in-house algorithm, QUBIC2 (https://github.com/maqin2001/qubic2). Here we show the basic biclustering usage of IRIS-FGM using a $200\times 88$ expression matrix generated from the previous top 500 variable genes. However, we recommend using Gene_use = "all" to generate the LTMG matrix.

LTMG-discretized bicluster (recommended for small single-cell RNA-Seq data)

User can type the following command to run discretization (LTMG) + biclustering directly:

object <- RunLTMG(object, Gene_use = "all", seed = 123)
object <- CalBinaryMultiSignal(object)
object <- RunBicluster(object, DiscretizationModel = "LTMG",OpenDual = TRUE,
                          NumBlockOutput = 100, BlockOverlap = 0.7, BlockCellMin = 15)

Quantile-discretized bicluster (recommended for bulk RNA-Seq, microarray data, or large single-cell RNA-Seq data)

This will output several files; among them is one named Yan_sub.txt.chars.blocks, which contains the predicted biclusters. Alternatively, users may use the first-version discretization strategy provided by QUBIC 1.0.

object <- RunDiscretization(object)
object <- RunBicluster(object, DiscretizationModel = "Quantile",OpenDual = TRUE, Extension = 0.90,
                          NumBlockOutput = 1000, BlockOverlap = 0.7, BlockCellMin = 15)

(The default parameters in IRIS-FGM are BlockCellMin = 15, BlockOverlap = 0.7, Extension = 0.90, and NumBlockOutput = 100. You may use other values by specifying them as arguments.)
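
As a rough illustration of what quantile-based discretization means, the base R sketch below maps each gene's values to -1, 0, or 1 according to per-gene quantiles. It is not the QUBIC or RunDiscretization implementation; the function name discretize_by_quantile and the tail fraction q are hypothetical choices for this example.

```r
# Discretize one gene's expression vector into -1 / 0 / 1 by quantiles.
# q is a hypothetical tail fraction chosen for this illustration.
discretize_by_quantile <- function(x, q = 0.25) {
  lower <- quantile(x, q, names = FALSE)
  upper <- quantile(x, 1 - q, names = FALSE)
  out <- integer(length(x))      # start with all zeros
  out[x <= lower] <- -1L         # low tail
  out[x >= upper] <- 1L          # high tail
  out
}

expr <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8,
                 8, 8, 1, 1, 4, 4, 4, 4),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("GeneA", "GeneB"), paste0("Cell", 1:8)))

# Apply gene by gene (rows); transpose back to genes x cells.
disc <- t(apply(expr, 1, discretize_by_quantile))
dimnames(disc) <- dimnames(expr)
disc
```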

Cell type prediction based on Markov clustering

Cell type prediction in IRIS-FGM is based on the biclustering results. In short, it constructs a weighted graph from the biclusters and then clusters that graph. The clustering method used here is MCL (Markov clustering).
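
The idea of building a weighted graph from biclusters and clustering it with MCL can be sketched in base R as follows. This is a toy illustration, not the code behind FindClassBasedOnMC: the biclusters are made up, the edge weight (number of shared biclusters) is an assumption, and mcl_sketch is a hypothetical minimal implementation of the expansion and inflation steps.

```r
# Hypothetical biclusters: each entry lists the cells in one bicluster.
biclusters <- list(c("C1", "C2", "C3"),
                   c("C2", "C3"),
                   c("C4", "C5", "C6"),
                   c("C5", "C6"))
cells <- paste0("C", 1:6)

# Edge weight = number of biclusters in which two cells co-occur
# (the diagonal acts as self-loops).
W <- matrix(0, length(cells), length(cells), dimnames = list(cells, cells))
for (b in biclusters) W[b, b] <- W[b, b] + 1

# Minimal Markov clustering: alternate expansion (matrix squaring)
# and inflation (element-wise power followed by column normalization).
mcl_sketch <- function(W, inflation = 2, max_iter = 100, tol = 1e-8) {
  M <- sweep(W, 2, colSums(W), "/")        # column-stochastic matrix
  for (i in seq_len(max_iter)) {
    M_prev <- M
    M <- M %*% M                           # expansion
    M <- M ^ inflation                     # inflation
    M <- sweep(M, 2, colSums(M), "/")      # renormalize columns
    if (max(abs(M - M_prev)) < tol) break
  }
  # After convergence, cells in the same cluster share the same column.
  key <- apply(round(M, 6), 2, paste, collapse = " ")
  as.integer(factor(key, levels = unique(key)))
}

cluster <- setNames(mcl_sketch(W), cells)
cluster
```

On this toy input the two groups of cells never share a bicluster, so the graph has two disconnected components and each ends up as its own cluster.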

object <- FindClassBasedOnMC(object)

Visualize block and network.

PlotHeatmap(object, N.bicluster = c(1, 5), show.annotation = TRUE)
PlotModuleNetwork(object, N.bicluster = 1, Node.color = "#E8E504")

5. Biological interpretation.

Cell-type-specific marker genes

object <- FindMarkers(object)

The user needs to select the cell type category to compare; here we select 4 : Seurat0.5.

IRIS-FGM then asks the user to choose the first group as the reference; here we select the third option (3 : 2), which is cluster 2 in the UMAP.

The user then selects the second group to compare against: either a single group (2 : 1, 3 : 2) or all remaining groups (4 : rest of all).

After running FindMarkers, the marker gene table is in object@LTMG@MarkerGene when using quick mode, or in object@BiCluster@MarkerGene when using biclustering.

Cell-type-specific pathways.

The first call below runs pathway analysis in quick mode by specifying genes.source = "CTS", which stands for cell-type-specific marker genes; the second runs it on genes from a bicluster block (genes.source = "Bicluster").

object <- RunPathway(object, selected.gene.cutoff = 0.05,
                        species = "Human", database = "GO", genes.source = "CTS")
object <- RunPathway(object ,module.number = 5, selected.gene.cutoff = 0.05,
                        species = "Human", database = "GO", genes.source = "Bicluster")
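
Over-representation analysis of the kind offered by clusterProfiler (listed among the prerequisites) is commonly based on a hypergeometric test. The base R sketch below shows that test for a single pathway with made-up numbers. Whether RunPathway performs exactly this computation is an assumption, and none of the variable names correspond to package arguments.

```r
# Hypothetical numbers for one pathway:
universe_size <- 20000  # genes in the background
pathway_size  <- 200    # background genes annotated to the pathway
selected_size <- 100    # genes in the marker list or bicluster
overlap       <- 10     # selected genes that fall in the pathway

# P(X >= overlap) under the hypergeometric null of random selection.
p_value <- phyper(overlap - 1, pathway_size,
                  universe_size - pathway_size,
                  selected_size, lower.tail = FALSE)

# Compare against a significance cutoff such as 0.05.
p_value < 0.05
```

With only about one overlapping gene expected by chance, an overlap of ten is strong evidence of enrichment, so the comparison evaluates to TRUE for these numbers.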

Session info

sessionInfo()


carter-allen/IRISFGM documentation built on Dec. 31, 2020, 9:54 p.m.