cluster_analysis: Cluster Analysis

View source: R/cluster_analysis.R

cluster_analysis    R Documentation

Cluster Analysis

Description

Analyzes the differentially expressed genes in each cluster and the cell type composition of the clusters using a marker-based approach.

Usage

cluster_analysis(
  data,
  genes,
  cluster,
  c.names = NULL,
  dif.exp = TRUE,
  s.pval = 10^-2,
  markers = NULL,
  write = TRUE,
  verbose = TRUE
)

Arguments

data

a data frame of n rows (genes) and m columns (cells) of read or UMI counts (note: rownames(data) = genes)

genes

a character vector of HUGO official gene symbols of length n

cluster

a numeric vector of length m (the cluster index of each cell)

c.names

a vector of cluster names

dif.exp

a logical (if TRUE, then computes the differential gene expression between the clusters using **edgeR**)

s.pval

a value, a fixed p-value threshold

markers

a table of cell type signature genes

write

a logical (if TRUE, then the result tables are written to text files, see Details)

verbose

a logical

Details

If 'dif.exp' is TRUE, then the function uses the **edgeR** functions **glmFit()** and **glmLRT()** to find differentially expressed genes between each cluster and all the other columns of 'data'.

If 'dif.exp' is FALSE, then the function skips the differential gene analysis.
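
The one-cluster-versus-rest comparison described above can be sketched with **edgeR** as follows. This is only an illustration on simulated counts: the normalization and dispersion steps (calcNormFactors(), estimateDisp()), the design matrix, and all variable names are assumptions, and the package's internal implementation may differ.

```r
# Runs only if edgeR is installed
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(1)
  # Simulated counts: 50 genes x 40 cells, two clusters of 20 cells
  counts <- matrix(rnbinom(50 * 40, mu = 10, size = 5), nrow = 50,
                   dimnames = list(paste0("gene", 1:50), NULL))
  cluster <- rep(1:2, each = 20)

  # Compare cluster k against all the other columns
  k <- 1
  group <- factor(ifelse(cluster == k, "in", "out"), levels = c("out", "in"))
  design <- model.matrix(~ group)

  y <- edgeR::DGEList(counts = counts, group = group)
  y <- edgeR::calcNormFactors(y)       # assumed normalization step
  y <- edgeR::estimateDisp(y, design)  # assumed dispersion estimate

  # Fit the GLM and test the "in versus out" coefficient
  fit <- edgeR::glmFit(y, design)
  lrt <- edgeR::glmLRT(fit, coef = 2)

  # Full result table with Benjamini-Hochberg adjusted p-values (column FDR)
  dge <- edgeR::topTags(lrt, n = Inf, adjust.method = "BH")$table
}
```

Repeating the block for each value of k gives one table per cluster.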

If the user does not set 'c.names', the clusters will be named from 1 to the maximum number of clusters (cluster 1, cluster 2, ...). The 'c.names' vector in the list returned by the **cell_classifier()** function can be used for this purpose. Users can also provide their own cluster names.
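
As a rough illustration of the naming options, the base R sketch below builds default-style names and a user-supplied alternative. The exact strings the package generates internally are an assumption based on the wording above, and the cell type names are hypothetical.

```r
# Hypothetical cluster assignments for six cells
cluster <- c(1, 1, 2, 3, 3, 2)

# Default-style names, numbered from 1 to the maximum cluster index
# (assumed format, following the "cluster 1, cluster 2, ..." wording)
default_names <- paste("cluster", seq_len(max(cluster)))

# User-supplied alternative: one name per cluster, in cluster order
my_names <- c("T-cells", "B-cells", "Macrophages")
stopifnot(length(my_names) == max(cluster))

# Per-cell labels obtained by indexing the names with the cluster vector
cell_labels <- my_names[cluster]
```

A vector such as my_names would be passed as 'c.names'.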

's.pval' is the threshold applied to the adjusted (Benjamini-Hochberg) p-values of the differential gene expression analysis.
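
A minimal sketch of this thresholding with base R's p.adjust(), using made-up p-values. Whether the package compares with a strict or non-strict inequality is an assumption here.

```r
# Hypothetical raw p-values for five genes
p_raw <- c(A2M = 0.0001, LRP1 = 0.002, AANAT = 0.03, MTNR1A = 0.2, ACE = 0.5)

# Benjamini-Hochberg adjustment
p_adj <- p.adjust(p_raw, method = "BH")

# Keep the genes whose adjusted p-value falls below the threshold
# (strict inequality is an assumption)
s.pval <- 10^-2
significant <- names(p_adj)[p_adj < s.pval]
```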

If 'markers' is set, it must be a table with gene signatures for one cell type in each column. The column names are the names of the cell types.
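
A table in this layout could be assembled as sketched below. The cell types and gene lists are purely illustrative, and padding shorter signatures with empty strings is an assumption rather than a documented requirement.

```r
# Hypothetical signatures: one cell type per list element
signatures <- list(
  Tcells      = c("CD3D", "CD3E", "CD2"),
  Bcells      = c("CD79A", "MS4A1"),
  Macrophages = c("CD68", "CD14", "CSF1R", "C1QA")
)

# Pad shorter signatures so that every column has the same length
# (padding with "" is an assumption)
n_max <- max(lengths(signatures))
padded <- lapply(signatures, function(g) c(g, rep("", n_max - length(g))))

# One column per cell type; the column names are the cell type names
markers <- data.frame(padded, stringsAsFactors = FALSE)
```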

If 'markers' is not provided, then the function skips the cluster cell type calling step.

If 'write' and 'dif.exp' are both TRUE, then the function writes, for each cluster, a text file named "table_dge_X.txt", where X is the cluster name, containing the list of differentially expressed genes.

If 'write' is TRUE and 'markers' is provided, then the function writes a second text file containing a table of the probabilities of assigning each cell cluster to each cell type. This cell type calling is performed as for the individual cells, but without thresholding and based on the cluster average transcriptome.
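
The cluster average transcriptome mentioned above can be pictured as the per-gene mean over the cells of each cluster. The base R sketch below computes it on random data; it shows only this averaging step, not the package's cell type calling, and the variable names are illustrative.

```r
set.seed(1)
# Random expression table: 5 genes x 200 cells, two clusters of 100 cells
data <- matrix(runif(1000, 0, 1), nrow = 5, ncol = 200)
rownames(data) <- c("A2M", "LRP1", "AANAT", "MTNR1A", "ACE")
cluster <- c(rep(1, 100), rep(2, 100))

# One column per cluster: mean expression of each gene over that cluster's cells
ids <- sort(unique(cluster))
cluster_means <- sapply(ids, function(k) {
  rowMeans(data[, cluster == k, drop = FALSE])
})
colnames(cluster_means) <- paste("cluster", ids)
```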

Remark: this function can be used with any 'data' table associated with corresponding 'genes' and 'cluster' vectors, meaning that advanced users can perform their own data normalization and cell clustering upfront.

Value

The function returns a list comprised of a table of differentially expressed genes, a table of cell types, and a table of cell cluster types.

Examples

data <- matrix(runif(1000, 0, 1), nrow = 5, ncol = 200)
rownames(data) <- c("A2M", "LRP1", "AANAT", "MTNR1A", "ACE")
cluster <- c(rep(1, 100), rep(2, 100))
cluster_analysis(data, rownames(data), cluster, dif.exp = FALSE)

SCA-IRCM/SingleCellSignalR documentation built on Dec. 11, 2022, 2:30 p.m.