gmrpp: Generalized multi-response permutation procedure (GMRPP) for...
In nsahr/gMRPP: Generalized Multi-Response Permutation Procedure



This function implements the generalized multi-response permutation procedure (GMRPP) for categorical, quantitative, and censored event-time variables.

gmrpp(data.mtx, grp.data, vset.data, nperms = c(10, 1000), rand.seed = NULL)

`data.mtx`	The genomic data matrix. Each row is a feature. Each column is a subject.
`grp.data`	A data frame with the column "ID" (matching the columns of data.mtx) and one of the following: "grp" (for group comparisons), "nvar" (to associate with a numeric variable), or "stime" and "evnt" (to associate with survival data).
`vset.data`	Variable set data. Each row assigns a variable (vID) to a variable set (vset).
`nperms`	The minimum and maximum number of permutations for adaptive permutation. Default is c(10,1000).
`rand.seed`	Random seed used for reproducible computing. Default is NULL (no seed set).
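As a rough illustration of the argument layout described above, the following sketch builds toy inputs in base R. The gene and subject names, the group labels, and the assumption that `vID` values match the row names of `data.mtx` are all hypothetical. The call to `gmrpp` is left commented out because it requires the package to be installed.

```r
set.seed(42)

# Toy genomic matrix: 4 features (rows) by 6 subjects (columns).
# Row and column names are hypothetical.
data.mtx <- matrix(rnorm(4 * 6), nrow = 4,
                   dimnames = list(paste0("gene", 1:4), paste0("subj", 1:6)))

# Subject-level data for a group comparison: "ID" matches the columns of data.mtx.
grp.data <- data.frame(ID = colnames(data.mtx),
                       grp = rep(c("case", "control"), each = 3))

# Variable-set assignments: one row per (vID, vset) pair.
# Assumption: vID values correspond to the row names of data.mtx.
vset.data <- data.frame(vID = rownames(data.mtx),
                        vset = rep(c("setA", "setB"), each = 2))

# With the package loaded, the call would follow the usage line:
# res <- gmrpp(data.mtx, grp.data, vset.data, nperms = c(10, 1000), rand.seed = 1)
```

For a numeric or survival association, `grp.data` would instead carry an "nvar" column or the "stime" and "evnt" columns.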

This function performs the GMRPP method described by Cao, Sahr, and Pounds (2019), which is a generalization of the MRPP method described by Nettleton, Recknor, and Reecy (2008). It characterizes the genomic data of a gene set by the distances between each pair of subjects, computed on the data for the genes in the gene set.

For a categorical variable, this distance matrix is used to measure the association of the gene-set genomic data with the variable by computing the sum of distances for subjects belonging to different groups minus the sum of distances for subjects belonging to the same group. A greater value of this statistic indicates a stronger association. For numeric variables, the average of this statistic is computed over all sets of groupings defined by all possible dichotomizations. For censored survival time variables, this statistic is computed over all dichotomizations defined by risk sets (unique uncensored event times), with some adjustments for censoring.

The method detects certain complex forms of gene-set associations that are not easily identified with other methods. Statistical significance is determined by an adaptive permutation procedure (Pounds et al 2011) that stops evaluating permuted data sets once it becomes clear that the result is not statistically significant or the maximum number of permutations has been performed.
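The statistic and permutation scheme described above can be sketched in base R. This is a minimal illustration, not the package's implementation. The helper names (`dist_stat`, `dist_stat_numeric`, `adaptive_perm`), the use of Euclidean distance, the choice of one split per unique value of the numeric variable, and the early-stopping rule (`n_exceed`) are all assumptions made for the example. The survival case, with its censoring adjustments, is not covered.

```r
# Between-group minus within-group sum of pairwise distances.
# D is a symmetric subject-by-subject distance matrix; grp is a vector of labels.
dist_stat <- function(D, grp) {
  same  <- outer(grp, grp, "==")   # TRUE where a pair of subjects shares a group
  upper <- upper.tri(D)            # count each pair of subjects once
  sum(D[upper & !same]) - sum(D[upper & same])
}

# Average of the statistic over dichotomizations of a numeric variable.
# Assumption: one split at each unique value except the largest.
dist_stat_numeric <- function(D, x) {
  cuts <- head(sort(unique(x)), -1)
  mean(vapply(cuts, function(ct) dist_stat(D, x <= ct), numeric(1)))
}

# Simplified adaptive permutation test.
# Assumption: stop once at least nperms[1] permutations have run and
# n_exceed permuted statistics have reached the observed value.
adaptive_perm <- function(D, grp, nperms = c(10, 1000), n_exceed = 10) {
  obs <- dist_stat(D, grp)
  exceed <- 0
  b <- 0
  while (b < nperms[2]) {
    b <- b + 1
    if (dist_stat(D, sample(grp)) >= obs) exceed <- exceed + 1
    if (b >= nperms[1] && exceed >= n_exceed) break
  }
  list(dist.stat = obs, p = (exceed + 1) / (b + 1), nperms = b)
}

# Toy data: 5 features by 12 subjects, with the second group shifted upward.
set.seed(1)
X <- cbind(matrix(rnorm(5 * 6), nrow = 5),
           matrix(rnorm(5 * 6, mean = 3), nrow = 5))
grp <- rep(c("A", "B"), each = 6)
D <- as.matrix(dist(t(X)))   # subjects are columns, so transpose before dist()
res <- adaptive_perm(D, grp)
```

Permuting the group labels while holding the distance matrix fixed avoids recomputing distances for each permutation, and the early exit saves computation for variable sets that are clearly not significant.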

A data.frame with the following columns:

`vset`	The name of the variable set (gene-set).
`vIDs`	The list of variables in the variable set (gene-set).
`dist.stat`	The distance statistic for the variable set.
`p.vset`	The p-value.
`nperms`	The number of permutations performed.

Cao X, Sahr N, Pounds S (2019) Robust Detection of Complex Gene-Set Associations. Manuscript.

Nettleton D, Recknor J, Reecy JM (2008) Identification of differentially expressed gene categories in microarray studies using nonparametric multivariate analysis. Bioinformatics 24: 192-201. PMID 18054553.

Pounds S, Cao X, Cheng C, et al (2011) Integrated analysis of pharmacologic, clinical and SNP microarray data using projection onto the most interesting statistical evidence with adaptive permutation testing. International Journal of Data Mining and Bioinformatics 5: 143-57. PMID 21516175.

library(gMRPP)                       # load the package that provides gmrpp() and the example data
data(AMLex)                          # AML TARGET project expression data for selected genes (website)
data(AMLclin)                        # AML TARGET project overall survival data
data(vsets)                          # KEGG gene sets for AML and CML (website)
res <- gmrpp(AMLex, AMLclin, vsets)  # evaluate association of gene sets with overall survival
res[, -2]                            # results excluding variable IDs for brevity