ptmJRF_permutation: Joint Random Forest for the simultaneous estimation of...

Description

Algorithm for the simultaneous estimation of multiple related networks. Some of the functions used are modified versions of functions contained in the R package randomForest (A. Liaw and M. Wiener, 2002).

Usage

ptmJRF_permutation(X, ntree=NULL, mtry=NULL, genes.name,
                 ptm.name, seed, to.store=NULL)

Arguments

X

List object containing expression data for each class, X=list(x_1,x_2, ... ), where x_1 is an (F x n_j) matrix with rows corresponding to post-translational modification sites and columns to samples, while x_j for j > 1 is a (p x n_j) matrix with rows corresponding to proteins and columns to samples. For x_2, x_3, ..., the rows must be the same and correspond to the same proteins, while the samples can vary. Missing values are not allowed. The rows of x_1 do not need to be ordered in a specific way.
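The layout described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the dimensions are toy values, and check_X is a hypothetical helper written for this sketch, not a function of the package. It checks only the constraints stated in the text (a list, no missing values, one row of x_1 per PTM site, and p rows in every x_j for j > 1).

```r
# Toy dimensions, chosen only for illustration
set.seed(1)
n1 <- 20; n2 <- 15                          # samples may differ across x_2, x_3, ...
genes.name <- paste0("G", 1:5)              # p = 5 proteins/genes
ptm.name <- c("G1", "G2", "G3", "G3", "G4", "G5", "G1")  # F = 7 PTM sites
p <- length(genes.name)
n.ptm <- length(ptm.name)

x_1 <- matrix(rnorm(n.ptm * n1), n.ptm, n1) # PTM sites x samples
x_2 <- matrix(rnorm(p * n1), p, n1)         # proteins x samples
x_3 <- matrix(rnorm(p * n2), p, n2)         # same proteins, different samples
X <- list(x_1, x_2, x_3)

# check_X is a hypothetical helper, not part of the package
check_X <- function(X, genes.name, ptm.name) {
  stopifnot(is.list(X), length(X) >= 2)
  stopifnot(!anyNA(unlist(X)))                  # missing values are not allowed
  stopifnot(nrow(X[[1]]) == length(ptm.name))   # one row of x_1 per PTM site
  for (j in seq_along(X)[-1]) {
    stopifnot(nrow(X[[j]]) == length(genes.name))  # same proteins in every x_j, j > 1
  }
  invisible(TRUE)
}

check_X(X, genes.name, ptm.name)
```

Running such a check before calling the function is a design choice of this sketch; it fails early on a malformed input list.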

ntree

Numeric value: number of trees.

mtry

Numeric value: number of predictors to be sampled at each node.

genes.name

Vector containing gene names. The order needs to match the rows of x_j for j > 1.

ptm.name

List of post-translational modification variables in the protein domain. This list must follow the same order as the rows of X[[1]].

seed

Integer. Permutation seed.

to.store

Optional integer. Total number of importance scores to be stored. When omitted, all importance scores are stored. Note that computing the FDR does not require all (p-1) x p / 2 importance scores, where p is the total number of proteins/genes; a sufficiently large number suffices. This number is usually chosen based on the number of nodes. The suggested value is p x 20.
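As a rough illustration of the saving, the short calculation below compares the number of unordered gene-gene pairs, p x (p-1) / 2, with the suggested value p x 20. The value p = 1000 is an assumption chosen only for this example.

```r
p <- 1000                       # assumed number of proteins/genes (illustrative)
n.pairs <- p * (p - 1) / 2      # all unordered gene-gene pairs
to.store <- p * 20              # suggested number of scores to store

n.pairs                         # 499500
to.store                        # 20000
to.store / n.pairs              # fraction of scores kept, about 0.04
```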

Value

A matrix with I rows and C + 2 columns where I is the total number of gene-gene interactions and C is the number of classes. The first two columns contain gene names for each interaction while the remaining columns contain importance scores for different classes.

References

Petralia, F., Song, W.M., Tu, Z. and Wang, P. (2016). New method for joint network analysis reveals common and different coexpression patterns among genes and proteins in breast cancer. Journal of Proteome Research, 15(3), pp. 743-754.

A. Liaw and M. Wiener (2002). Classification and Regression by randomForest. R News 2, 18–22.

Examples

 # --- Generate data sets
 nclasses<-2              # number of data sets / classes
 n1<-n2<-20               # sample size for each data set
 p<-5                     # number of variables (genes)
 genes.name<-paste("G",seq(1,p),sep="")   # gene names
 ptm.name<-c("G1","G2","G3","G3","G4","G5","G1")   # PTM names, one per row of data1
 p.ptm<-length(ptm.name)  # number of PTM sites
 
 data1<-matrix(rnorm(p.ptm*n1),p.ptm,n1)   # generate PTM data
 data2<-matrix(rnorm(p*n1),p,n1)           # generate global proteomics data
 
 # --- Run ptmJRF_permutation and obtain importance scores of interactions
 # seed has no default in Usage; the value 1 is arbitrary
 out<-ptmJRF_permutation(X=list(data1,data2),genes.name=genes.name,
                  ptm.name=ptm.name,seed=1)

petraf01/iJRF documentation built on Dec. 22, 2021, 7:46 a.m.