AbHAC R package

AbHAC: Aberration Hub analysis of Cancer

AbHAC is an R package that implements a simple approach for analysing cancer genomics datasets in the context of protein interaction networks. Each protein in the network is treated as an individual subnetwork. For each protein, a Fisher's exact test p-value is calculated from the abundance of molecules with genomic or transcriptomic aberrations in that protein's neighborhood compared with the whole interactome. These p-values are then corrected for multiple testing by permuting the protein interaction network. Details are available in the paper/thesis manuscript.
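The per-protein test described above can be illustrated with base R alone. The following is a minimal sketch on a toy network, not the package's own code: the protein IDs, the choice to exclude the tested protein from the background, and the one-sided alternative are all assumptions made for illustration.

```r
# Toy undirected interaction network, one row per interaction (hypothetical IDs)
ppi <- data.frame(
  a = c("P1", "P1", "P1", "P1", "P2", "P3", "P5", "P6"),
  b = c("P2", "P3", "P4", "P5", "P3", "P6", "P7", "P8"),
  stringsAsFactors = FALSE
)
all.proteins <- unique(c(ppi$a, ppi$b))

# Proteins carrying an aberration (e.g. mutated or differentially expressed)
aberrant <- c("P2", "P3", "P4")

# Neighborhood of the protein being tested, and the rest of the interactome
centre <- "P1"
neighbors <- unique(c(ppi$b[ppi$a == centre], ppi$a[ppi$b == centre]))
others <- setdiff(all.proteins, c(neighbors, centre))

# 2x2 table: aberrant vs. not aberrant, inside vs. outside the neighborhood
tab <- matrix(
  c(sum(neighbors %in% aberrant), sum(!neighbors %in% aberrant),
    sum(others %in% aberrant),    sum(!others %in% aberrant)),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("neighbor", "rest"), c("aberrant", "normal"))
)

# Assumption: one-sided test for over-representation among the neighbors
p.value <- fisher.test(tab, alternative = "greater")$p.value
```

Repeating this for every protein in the network gives one p-value per protein, which is why a multiple testing correction is needed afterwards.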

Usage: Required objects

snv: a matrix/dataframe where column names are sample names and row names are gene names. The value of each cell can be either NA or a character string (e.g. "Mutated").

rna: a numeric matrix/dataframe whose first columns are the tumour samples, named as in snv and in the same order, but with T appended to each name. For example, if sample names in snv are: a | b | c ..., in rna they should be: aT | bT | cT ... . These must be followed by the nontumour samples, whose names end with N. The nontumour samples may share the tumour sample names (aN | bN ...) or have different ones (a2313N | a321bchN).
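To make the naming convention concrete, here is a small sketch that builds toy snv and rna objects and checks them against the rules above. The gene names, sample names and values are made up for illustration, and the checks are not part of the package.

```r
# Toy snv matrix: genes in rows, samples in columns; NA means no mutation
snv.toy <- matrix(
  c("Mutated", NA, NA,
    NA, "Mutated", "Mutated"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("TP53", "KRAS"), c("a", "b", "c"))
)

# Toy rna matrix: tumour columns first (suffix T, same order as snv),
# followed by nontumour columns (suffix N)
set.seed(1)
rna.toy <- matrix(
  rnorm(2 * 5, mean = 8), nrow = 2,
  dimnames = list(c("TP53", "KRAS"), c("aT", "bT", "cT", "aN", "x17N"))
)

# Check the naming convention before passing the objects on
tumour.cols <- grep("T$", colnames(rna.toy), value = TRUE)
normal.cols <- grep("N$", colnames(rna.toy), value = TRUE)
names.match <- identical(sub("T$", "", tumour.cols), colnames(snv.toy))
tumours.first <- identical(colnames(rna.toy), c(tumour.cols, normal.cols))
```

Both names.match and tumours.first should be TRUE for objects that follow the convention.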

Usage: Optional objects

clinical: A dataframe whose first column holds the same sample names as snv and whose second column provides information about the samples. This can be Metastasis/Primary, HighGrade/LowGrade or any other set of strings describing patient subtypes.
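A clinical object of this shape might look like the sketch below. The column names and subtype labels are hypothetical; only the two-column layout comes from the description above.

```r
# Sample names as they appear in the columns of snv (toy values)
snv.samples <- c("a", "b", "c")

# Two-column clinical dataframe: sample name, then subtype label
clinical.toy <- data.frame(
  sample = c("a", "b", "c"),
  subtype = c("Primary", "Metastasis", "Primary"),
  stringsAsFactors = FALSE
)

# Every sample in snv should have a clinical annotation
all.annotated <- all(snv.samples %in% clinical.toy[, 1])
subtype.counts <- table(clinical.toy[, 2])
```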

ppi.database: By default, the package uses a protein interaction network built with PSICQUIC by querying for UniProt accession IDs obtained through the UniProt.ws package. The databases used for generating this dataframe include:

DIP, InnateDB, IntAct, MatrixDB, MINT, I2D-IMEx, InnateDB-IMEx, MolCon and BindingDB

The AbHAC functions require only the first 2 columns of this dataframe. The IDs must be UniProt accessions.

id.conversion.set: A dataframe with the following column names:

| ENTREZ_GENE | UNIPROTKB | GENES | ENSEMBL         | REFSEQ_PROTEIN |
| ----------- |:---------:|:-----:|:---------------:| --------------:|
| 7533        | Q04917    | YWHAH | ENSG00000128245 | NP_003396      |
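As an illustration of how such a conversion table can be used, the sketch below maps gene symbols to UniProt accessions with base R's match. It uses only the example row shown above; the object names and the unmapped symbol are hypothetical, and this is not necessarily how the package performs the conversion internally.

```r
# Conversion table with the column names listed above (single example row)
id.toy <- data.frame(
  ENTREZ_GENE = 7533,
  UNIPROTKB = "Q04917",
  GENES = "YWHAH",
  ENSEMBL = "ENSG00000128245",
  REFSEQ_PROTEIN = "NP_003396",
  stringsAsFactors = FALSE
)

# Map gene symbols to UniProt accessions; unmapped symbols become NA
genes <- c("YWHAH", "NOT_A_GENE")
uniprot <- id.toy$UNIPROTKB[match(genes, id.toy$GENES)]
names(uniprot) <- genes
```

Symbols that map to NA have no entry in the table and cannot be placed in the interaction network, so it is worth checking how many genes of interest are lost at this step.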

Usage: Examples

Installing the package and all of its dependencies:

install.packages(c("devtools", "foreach", "doMC", "iterators", "plyr", "BiocManager"))
#edgeR is a Bioconductor package; note that its name is case-sensitive
BiocManager::install("edgeR")
library(devtools)
install_github("mehrankr/AbHAC")
library("AbHAC")

abhac.brief is intended for cases where a particular set of genes is of interest and we want to find the proteins that interact with a significant number of those genes. The genes of interest may be mutated (snv), upregulated (de.up) or downregulated (de.down).

Running abhac.brief with vector of mutated/upregulated/downregulated genes:

#Loading matrix of mutated genes and matrix of mRNA expression
data(snv)
data(rna)

#Randomly selecting 10 mutated genes, and 500 genes each from rows 1-1000 and 1001-2000 of rna
snv = sample(rownames(snv), 10)
de.up = sample(rownames(rna)[1:1000], 500)
de.down = sample(rownames(rna)[1001:2000], 500)

#Loading the default protein interaction data
data(ppi.database) 

#Loading dataframe used for converting IDs
data(id.conversion.set)

#Loading fac, a vector of all proteins in ppi.database
data(fac)

abhac.brief.result = abhac.brief(de.up,de.down,fac=fac,snv=snv,
    enrichment.categories=c("snv.de","de.up"),
    ppi.database=ppi.database[,1:2],
    id.conversion.set=id.conversion.set)

If, instead of preselected sets of differentially expressed genes, we have an RPKM matrix from RNA-seq or normalized mRNA expression values, AbHAC can identify differentially expressed genes using edgeR/limma. This is done through the set.abhac function, which accepts the snv and rna matrices as input. Another important feature of this function is that you can provide the subtype/phenodata of patients in a two-column object called clinical.

#Loading example and default objects from the package
data(snv)
data(rna)
data(ppi.database) #2column whole human protein interaction database
data(id.conversion.set)
data(fac) #vector of all proteins in ppi.database


set.abhac.result = set.abhac(snv=snv,rna=rna,fac=fac,
   expression.method="Microarray",rna.paired=FALSE,
   fdr.cutoff=0.05,correction.method="BH",enrichment.categories=c("snv.de","de.up"),
   ppi.database=ppi.database[,1:2],id.conversion.set=id.conversion.set)

Important parameters

fisher.fdr: This parameter defaults to the permutation method described in the paper, but can also be set to any method accepted by p.adjust. The permutation-based methods are Permutation.FDR and Permutation.FWER. If either of these is selected, the permutation parameters described below become important.

fisher.fdr.cutoff: Set to 0.05 by default.
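For the non-permutation options, the correction amounts to a call to p.adjust followed by a cutoff. The sketch below shows this on made-up p-values; the protein names are hypothetical, and whether the package compares with a strict or non-strict inequality is an assumption.

```r
# Made-up Fisher's exact test p-values, one per protein
p.values <- c(P1 = 0.001, P2 = 0.01, P3 = 0.04, P4 = 0.30, P5 = 0.80)

# Benjamini-Hochberg correction, one of the methods accepted by p.adjust
adjusted <- p.adjust(p.values, method = "BH")

# Assumption: proteins below the cutoff are reported as significant
cutoff <- 0.05
significant <- names(adjusted)[adjusted < cutoff]
```

Note that P3 has a raw p-value below 0.05 but no longer passes the cutoff after correction.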

num.permuted.ppi: Number of permuted protein interaction networks to generate for multiple testing correction.

method.permuted.ppi: There are three options: AsPaper, ByDegree or equal.

bins.permuted.ppi: Number of bins into which the proteins in the network are categorized; proteins are then permuted within those bins. See the documentation of the method.permuted.ppi parameter for details.
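The idea of permuting proteins within bins can be sketched as follows. This is only one plausible reading, assuming proteins are binned by their degree rank and labels are shuffled within each bin; it is not the package's AsPaper, ByDegree or equal implementation, and all names in it are hypothetical.

```r
set.seed(42)

# Toy undirected interaction network (hypothetical IDs)
ppi <- data.frame(
  a = c("P1", "P1", "P1", "P2", "P3", "P4", "P5", "P6"),
  b = c("P2", "P3", "P4", "P3", "P5", "P6", "P7", "P8"),
  stringsAsFactors = FALSE
)

# Degree (number of interactions) of each protein
degree <- table(c(ppi$a, ppi$b))
proteins <- names(degree)

# Assumption: bins are formed from the degree rank of each protein
n.bins <- 2
bin <- cut(rank(as.numeric(degree), ties.method = "first"),
           breaks = n.bins, labels = FALSE)

# Shuffle protein labels within each bin
shuffled <- proteins
for (k in unique(bin)) {
  idx <- which(bin == k)
  shuffled[idx] <- proteins[idx[sample.int(length(idx))]]
}
relabel <- setNames(shuffled, proteins)

# Relabel the edge list to obtain one permuted network
permuted.ppi <- data.frame(
  a = unname(relabel[ppi$a]),
  b = unname(relabel[ppi$b]),
  stringsAsFactors = FALSE
)
```

Because labels only move between proteins of similar degree, the permuted network keeps the same topology while breaking the link between a protein's identity and its specific neighbors. Repeating this num.permuted.ppi times would give a set of null networks against which the observed p-values can be compared.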

Nomenclature:

In Old Irish, abhac means a dwarf star.

Maintainer: mehran dot karimzadeh at uhnresearch dot ca or mehran dot karimzadehreghbati at mail dot mcgill dot ca



mehrankr/AbHAC documentation built on May 22, 2019, 6:49 p.m.