This is a complete package to analyze single cell in-situ RCA data analysis.
The raw data obtained from image analysis pipeline. The generic data structure for RCA in-situ is as follows in xlsx or csv formated file
library(MolDia) ex_data <- readRCA(file = system.file("extdata", "Hypocampus_left.csv", package="MolDia"), cellid = "CellId", centX = "centroid_x", centY = "centroid_y") mm<-cbind(head(ex_data@data)[,10:15],head(ex_data@location)) CellId<- rownames(mm) rownames(mm)<- NULL dd<- cbind(CellId,mm) dd
The raw RCA data can be read by function readRCA
. To get help on the function write help(readRCA)
. The data read by function readRCA
will be in class RCA_class
. For more help on this class, write help(RCA_class)
.
library(MolDia) ex_data <- readRCA(file = system.file("extdata", "Hypocampus_left.csv", package="MolDia"), cellid = "CellId", centX = "centroid_x", centY = "centroid_y") ex_data
ex_data
RCA data can be ploted by plot
function. To plot all cell on the tissue, one has to use what = "cell"
parameter. If interactive graphics is expected then live = TRUE
. For details help help(plot.RCA_class)
RCA_map(data =ex_data, what = "cell", main = "Left Hippocampus")
ex_data_summary <- readsSummary(data = ex_data, readlimit = 10, text.size = 6, intervel.dep = c(5,20,40,60))
The venn-pie plot is a visual representation of RCA data, based on genes of interest. As example, we want to know proportional dependency of SST-interneuronal genes enriched cells. For details about the function see help(RCA_pieplot)
data(marker_gene) sst <- marker_gene[[5]]
data(marker_gene) sst <- marker_gene[[5]] sst
res <- RCA_pieplot (data = ex_data, gene = sst)
ex_data_1 <- RCA_filter(data = ex_data)
In the above output data, cell with reads 2 to 31 will only remain in the resulting data set. But sometime, 1 reads per cell is biologically important. In this case we can define our own expected reads/cell with data_mean
parameter.
ex_data_2 <- RCA_filter(data = ex_data, data_mean = 7)
In the above figure, the red curve is the actual distribution, and the black curve is the theoritical distrubution with our supplied mean. And finally we got 4775 cell with minium 1 and maximum 27 reads per cell.
Based on different criteria or condition, on can filter data with the function RCA_barplot
For detail about the function write help(RCA_barplot)
. This function will have output in class RCA_class
.
## Load data RCA_data <- ex_data
The loaded data is from mouse brain left hippocampus. Genes present in that data is
marker_gene_1 <- RCA_data@gene marker_gene_1
According to our pre knowledge, we can classify those genes in broad categories as follows
## This data set has provided with the package as sample information. data(marker_gene)
marker_gene
Now we can re-classify marker genes in broad category "General", "Neuronal" and "Non-neuronal" for our further analysis requirement
## Re-calssify per-dfine marker gene group in "General", "Neoronal" and "Non-neuronal" marker_gene <- list(general = marker_gene$genr, neuron = c(marker_gene$genr_neuro, marker_gene$genr_neuro_pyra1, marker_gene$genr_neuro_pyra2, marker_gene$genr_neuro_inter1, marker_gene$genr_neuro_inter2, marker_gene$genr_neuro_inter3, marker_gene$genr_neuro_inter4, marker_gene$genr_neuro_inter5, marker_gene$genr_neuro_inter6), nonneuron = marker_gene$genr_nonneuro) marker_gene
Now RCA_barplot
function will be applied on the data based on required selection criteria.
Without any selection criteria
## Without any selection criteria neuron_group <- RCA_barplot(data = RCA_data,gene = marker_gene)
As an example , our cell selection criteria is
## Parameter explanation # gene.target = 2 - Select 2nd list from marker_gene, which is a list of neuronal gene # target.min.count.cell = 2 - For Neuronal genes, minimum 2 reads/cell # rest.min.count.cell = 2 - For rest of the genes, minimum 2 reads/cell # at.least.gene = 1 - Cell will have at least 1 neuronal genes # gene.show = 2 - Only consider Neuronal genes to subset data neuron_group <- RCA_barplot(data = RCA_data, gene = marker_gene, gene.target = 2, target.min.count.cell = 2, rest.min.count.cell = 2, at.least.gene = 1, gene.show = 2, main = " Neuron gene group")
Before further analysis of RCA data, the data needs pre-processing like data scaling, centering , normlization.
Normalization, scaling and centering will apply on RCA data by the function RCA_preprocess
. Data normalization is nessary for data analysis, specially for compute PCA to reduce dimentionality. Scaled and centered data is necessary to vizualize data. [Need more focus]
neuron_group <- RCA_preprocess(data = neuron_group, normalization.method = "LogNormalize", do.scale = TRUE, do.center = TRUE)
Setting do.center
to TRUE will center the expression for each gene by subtracting the average expression for that gene. Setting do.scale
to TRUE will scale the expression level for each gene by dividing the centered gene expression levels by their standard deviations if do.center
is TRUE and by their root mean square otherwise. [Need more focus and need to test on the reference] [@vandenBerg2006]
In the previous section we have select cells based on different selection criteria on gene of interest. The dataset is as follows
neuron_group
Now we can cluster cells by different pipeline of interest by using the function RCA_cluster
. Available pipeline is SEURAT, BACKSPIN, MONOCLE. For details see help(RCA_cluster)
. Output result of function RCA_cluster
is in class RCA_class
neuron_group_cluster <- RCA_cluster (data = neuron_group, method = "seurat", pc = 0.9, resolution = 0.2) neuron_group_cluster
neuron_group_cluster <- RCA_cluster (data = neuron_group, method = "seurat", pc = 0.9, resolution = 0.2)
neuron_group_cluster
Now we can plot cells and cluster information on actual tissue to observe those results. To plot cells the following command can be used
RCA_map(neuron_group_cluster, what = "cell" , main = "Plot cells on tissue")
RCA_map(neuron_group_cluster, what = "cluster" , main = "Plot cell clusters on tissue")
To apply tSNE, we can use following command
neuron_group_cluster <- RCA_tsne(data = neuron_group_cluster, do.label = TRUE, pc= 0.9, perplexity= 100)
neuron_group_cluster <- RCA_tsne(data = neuron_group_cluster, do.label = TRUE, pc= 0.9, perplexity= 100)
To plot tSNE with SEURAT cluster, the following command can be used
RCA_map(neuron_group_cluster, what = "tsne" , main = "tSNE plot with cluster by SEURAT")
Now to find the genes those are marker or differentially expressed in individual cluster, can be identified. RCA_marker
function can be used for this purpose.
neuron_group_cluster_marker <- RCA_marker(data=neuron_group_cluster, topgene = 5 , test.use="bimod", only.pos = TRUE, main = "Neuronal gene group")
neuron_group_cluster_marker <- RCA_marker(data=neuron_group_cluster, topgene = 5 , test.use="bimod", only.pos = TRUE, main = "Neuronal gene group")
neuron_group_cluster_marker
neuron_group_cluster_marker@cluster.marker
Now to vizualize cluster marker
neuron_group_cluster_marker <- RCA_marker(data=neuron_group_cluster, topgene = 5 , test.use="bimod", only.pos = TRUE, main = "Neuronal gene group")
In the previous we have identified the marker for the specific cluster. Now we can label correspondong cluster with marker gene according to their significance order.
RCA_map(neuron_group_cluster_marker, what = "cluster" , main = "Plot cell clusters on tissue", label.topgene = 3)
To plot only gene "Enpp2" to see their cluster distribution baed on expression level
RCA_map(neuron_group_cluster_marker, what = "cluster" , main = "Enpp2 expressed cell only", gene = c("Enpp2"), label.topgene = 3)
RCA_map(neuron_group_cluster_marker, what = "tsne" , main = "Plot cell clusters on tSNE", label.topgene = 3)
To plot only gene "Enpp2" to see their cluster distribution baed on expression level
RCA_map(neuron_group_cluster_marker, what = "tsne" , main = "Enpp2 expressed cell only", gene = c("Enpp2"), label.topgene = 3)
With the resolution
parameter one can control the sige of the cluster. Its diserable to make bigger cluster first and then re-cluster any selected cluster. In the previous clustred data, cluster 0 has 505 cells.
neuron_group_cluster
Now we can select cluster 0 to recluster it again.
neuron_group_cluster_0 <- RCA_cluster (data = neuron_group_cluster, method = "seurat",cluster_id = 0, pc = 0.9, resolution = 0.1)
neuron_group_cluster_0 <- RCA_cluster (data = neuron_group_cluster, method = "seurat",cluster_id = 0, pc = 0.9, resolution = 0.1)
neuron_group_cluster_0
Now to plot cells
RCA_map(neuron_group_cluster_0, what = "cell" , main = "Re-cluster cluster 0 and map cells")
To plot cluster
RCA_map(neuron_group_cluster_0, what = "cluster" , main = "Re-cluster cluster 0 and map cluster")
To define cluster marker
neuron_group_cluster_0_marker <- RCA_marker(data=neuron_group_cluster_0, topgene = 5, test.use="bimod", only.pos = TRUE, main = "Neuronal gene group: Cluster 0") neuron_group_cluster_0_marker
To label cell cluster with marker gene
RCA_map(neuron_group_cluster_0_marker, what = "cluster" , main = "Plot cell clusters on tissue", label.topgene = 2)
To label tSNE cluster with marker gene
RCA_map(neuron_group_cluster_0_marker, what = "tsne" , main = "Plot cell clusters on tSNE", label.topgene = 2)
## Read data: Left and right HC hc_left <- readRCA(file = system.file("extdata", "Hypocampus_left.csv", package="MolDia"), cellid = "CellId",centX = "centroid_x", centY = "centroid_y") hc_right <- readRCA(file = system.file("extdata", "Hypocampus_right.csv", package="MolDia"), cellid = "CellId",centX = "centroid_x", centY = "centroid_y")
## Read data: Left and right HC hc_left <- readRCA(file = system.file("extdata", "Hypocampus_left.csv", package="MolDia"), cellid = "CellId",centX = "centroid_x", centY = "centroid_y") hc_right <- readRCA(file = system.file("extdata", "Hypocampus_right.csv", package="MolDia"), cellid = "CellId",centX = "centroid_x", centY = "centroid_y")
## Arrange marker gene data(marker_gene) marker_gene <- marker_gene mark_gene <- list(genr = marker_gene$genr, neuron = c(marker_gene$genr_neuro, marker_gene$genr_neuro_pyra1, marker_gene$genr_neuro_pyra2, marker_gene$genr_neuro_inter1, marker_gene$genr_neuro_inter2, marker_gene$genr_neuro_inter3, marker_gene$genr_neuro_inter4, marker_gene$genr_neuro_inter5, marker_gene$genr_neuro_inter6), nonneuron = marker_gene$genr_nonneuro)
## Arrange marker gene data(marker_gene) marker_gene <- marker_gene mark_gene <- list(genr = marker_gene$genr, neuron = c(marker_gene$genr_neuro, marker_gene$genr_neuro_pyra1, marker_gene$genr_neuro_pyra2, marker_gene$genr_neuro_inter1, marker_gene$genr_neuro_inter2, marker_gene$genr_neuro_inter3, marker_gene$genr_neuro_inter4, marker_gene$genr_neuro_inter5, marker_gene$genr_neuro_inter6), nonneuron = marker_gene$genr_nonneuro)
## Select cell : Barplot of Neuronal marker gene and extract those cells only hc_left <- RCA_barplot(data = hc_left, gene = mark_gene, gene.target = 2, at.least.gene = 2, gene.show = 2) hc_right <- RCA_barplot(data = hc_right, gene = mark_gene, gene.target = 2, at.least.gene = 2, gene.show = 2)
## Select cell : Barplot of Neuronal marker gene and extract those cells only hc_left <- RCA_barplot(data = hc_left, gene = mark_gene, gene.target = 2, at.least.gene = 2, gene.show = 2) hc_right <- RCA_barplot(data = hc_right, gene = mark_gene, gene.target = 2, at.least.gene = 2, gene.show = 2)
## Data preprocessing hc_left <- RCA_preprocess(data = hc_left, normalization.method = "LogNormalize", do.scale = TRUE, do.center = TRUE) hc_right <- RCA_preprocess(data = hc_right, normalization.method = "LogNormalize", do.scale = TRUE, do.center = TRUE)
## Data preprocessing hc_left <- RCA_preprocess(data = hc_left, normalization.method = "LogNormalize", do.scale = TRUE, do.center = TRUE) hc_right <- RCA_preprocess(data = hc_right, normalization.method = "LogNormalize", do.scale = TRUE, do.center = TRUE)
## Cluster data based on SEURAT pipeline hc_left <- RCA_cluster(data = hc_left, pc = 0.9, resolution = 0.3, method = "seurat") hc_right <- RCA_cluster(data = hc_right, pc = 0.9, resolution = 0.3, method = "seurat")
## Cluster data based on SEURAT pipeline hc_left <- RCA_cluster(data = hc_left, pc = 0.9, resolution = 0.3, method = "seurat") hc_right <- RCA_cluster(data = hc_right, pc = 0.9, resolution = 0.4, method = "seurat")
## Dimention reduction by tSNE on clustered data hc_left <- RCA_tsne(data = hc_left, do.label = TRUE, pc= 0.9, perplexity= 100) hc_right <- RCA_tsne(data = hc_right, do.label = TRUE, pc= 0.9, perplexity= 100)
## Dimention reduction by tSNE on clustered data hc_left <- RCA_tsne(data = hc_left, do.label = TRUE, pc= 0.9, perplexity= 100) hc_right <- RCA_tsne(data = hc_right, do.label = TRUE, pc= 0.9, perplexity= 100)
## Get cluster marker hc_left <- RCA_marker(data = hc_left, topgene =5, test.use="bimod", main = "hc_left") hc_right <- RCA_marker(data = hc_right, topgene =5, test.use="bimod", main = "hc_right")
RCA_map (data = hc_left, what = "tsne", label.topgene =2, main = "hc_left") RCA_map (data = hc_right, what = "tsne", label.topgene =2, main = "hc_right")
## Cluster compare my_comapre_1 <- RCA_clustcompare(preditorData = hc_left, prediction = hc_right) my_comapre_2 <- RCA_clustcompare(preditorData = hc_right, prediction = hc_left)
citation(package = "MolDia")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.