Home

/

GitHub

/

BMILAB/PASCCA

/

README.md

README.md
In BMILAB/PASCCA: Cluster analysis of replicated alternative polyadenylation data using shrinkage canonical correlation analysis

PASCCA R package

Cluster analysis of replicated alternative polyadenylation data using shrinkage canonical correlation analysis

About

The package PASCCA is an easy-to-use R package for analyses of APA related gene expression, including the characterization of poly(A) sites, quantification of association between genes with/without repeated measurements, clustering of APA-related genes to infer significant APA specific gene modules, and the evaluation of clustering performance with a variety of indexes. By providing a better treatment of the noise inherent in repeated measurements and taking into account multiple layers of poly(A) site data, PASCCA could be a general tool for clustering and analyzing APA-specific gene expression data. For the convenience of users, we provide synthetic poly(A) site data sets here.

Installing PASCCA

R (>3.1). R 3.3.3 is recommended.

plyr, parallel

Install the R package using the following commands on the R console:

install.packages("devtools")
library(devtools)
install_github("BMILAB/PASCCA")
library(PASCCA)

Using PASCCA

In order to facilitate user understanding, we use the provided example dataset to illustrate the standard analysis work-flow of PASCCA. Please refer to the User Guide for full details.

Data preprocessing is an important step in the data mining process. First, we use the function "PAprocess" to do preliminary processing of APA-related gene expression data.

##Loading example data
data(polyA_example_data2)
##Data preprocessing
pre_data <- PAprocess(data2,log=TRUE)

Second, based on processed expression data, to calculate the distance between genes with multiple poly(A) sites by the function "PASCCA" with seven parameters (data, alpha, repli, tissues, tiss). data: the APA-related gene expression. alpha: the cut-off value of the significance level. repli: the numbers of replicates per biological condition such as different tissues, cell types and developmental stages. tissues: the total number of biological conditions. * tiss: the frequency of the first type of repetition.

##Getting information of the samples
sample_name <- colnames(pre_data)[3:ncol(pre_data)]
sample_name <- strsplit(sample_name,"\\d$")
sample_name <- paste("",lapply(sample_name,"[[",1),sep="");
table(sample_name)
##Getting the number of repetitions per sample
sample_replicates <- as.numeric(table(sample_name))
sample_replicates <- sample_replicates[order(sample_replicates,decreasing = TRUE)]

##Calculationg PASCCA distance matrix
gene_dist <- PASCCA(pre_data, alpha = 0.05,
                    repli=sample_replicates,
                    tissues=length(unique(sample_name)),
                    tiss=sum(sample_replicates==sample_replicates[1]))
#or
gene_dist <- PASCCA(pre_data, alpha = 0.05,repli=c(rep(3,14)), tissues=14, tiss=14)

Distances of all gene pairs obtained by the function "PASCCA", then the distance matrix is further used for clustering by the function "PASCCluster". We adopted the widely-used clustering method, hierarchical clustering, to cluster genes, which was implemented by the R function using "hclust" default parameters. PASCCluster returns a list, including an object of class hclust which describes the tree produced by the clustering process and a vector with group memberships by "cutree".

##Hierarchical clustering
gene_cluster <- PASCCluster(gene_dist,nc=5,plot = TRUE)

If you are using PASCCA, please cite: Ye W#, Long Y#, Ji G, Su Y, Ye P, Fu H and Wu X* (2019) Cluster analysis of replicated alternative polyadenylation data using canonical correlation analysis, BMC Genomics, 20, 1-15.

BMILAB/PASCCA documentation built on Nov. 20, 2020, 11:32 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

BMILAB/PASCCA
Cluster analysis of replicated alternative polyadenylation data using shrinkage canonical correlation analysis

README.md
In BMILAB/PASCCA: Cluster analysis of replicated alternative polyadenylation data using shrinkage canonical correlation analysis

PASCCA R package

About

Installing PASCCA

Mandatory

Required R Packages

Installation

Using PASCCA

Section 1 Data preprocessing

Section 2 Distance matrix computation

Section 3 Clustering analysis

Citation

R Package Documentation

Browse R Packages

We want your feedback!

BMILAB/PASCCA Cluster analysis of replicated alternative polyadenylation data using shrinkage canonical correlation analysis

README.md In BMILAB/PASCCA: Cluster analysis of replicated alternative polyadenylation data using shrinkage canonical correlation analysis

PASCCA R package

About

Installing PASCCA

Mandatory

Required R Packages

Installation

Using PASCCA

Section 1 Data preprocessing

Section 2 Distance matrix computation

Section 3 Clustering analysis

Citation

R Package Documentation

Browse R Packages

We want your feedback!

BMILAB/PASCCA
Cluster analysis of replicated alternative polyadenylation data using shrinkage canonical correlation analysis

README.md
In BMILAB/PASCCA: Cluster analysis of replicated alternative polyadenylation data using shrinkage canonical correlation analysis