getPermStat: Get p-value or z-score based on permutation results

View source: R/permutation.R

Description

This function takes two inputs: the real COCOA score for each region set, and a null distribution for each region set obtained by running COCOA on permuted data. It then uses each null distribution to compute an empirical p-value or z-score for the corresponding real score. See the vignettes for the workflow that leads to this function. The real region set score is not included in the null distribution when the p-value or z-score is calculated.
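A minimal sketch of the empirical p-value idea for one region set and one target variable. It assumes a one-sided "greater" test in which null scores tied with the real score count as at least as extreme. The helper name empiricalPGreater is hypothetical and this is not COCOA's implementation. It only shows how leaving the real score out of the null distribution allows a p-value of exactly 0.

```r
# Hypothetical helper (not part of COCOA): one-sided ("greater") empirical
# p-value for a single region set and a single target variable.
# Assumption: ties (null score == real score) count as "at least as extreme".
empiricalPGreater <- function(realScore, nullScores) {
  # The real score is NOT appended to nullScores, so the result can be 0.
  mean(nullScores >= realScore)
}

nullScores <- c(0.2, 0.5, 0.9, 1.3)  # e.g. scores from 4 permutations
empiricalPGreater(1.0, nullScores)   # 1 of 4 null scores >= 1.0, so 0.25
empiricalPGreater(2.0, nullScores)   # no null score >= 2.0, so 0
```

A p-value of 0 here only means no permutation produced a score as large as the real one; more permutations would be needed to resolve smaller p-values.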

Usage

getPermStat(
  rsScores,
  nullDistList,
  signalCol,
  testType = "greater",
  whichMetric = "pval"
)

Arguments

rsScores

data.frame. Region set scores, as output by the 'aggregateSignalGRList' function. Each row is a region set, with one column for each sample variable of interest (e.g. PC or sample phenotype). It can also have columns with information on the overlap between the region set and the epigenetic data. Rows should be in the same order as the region sets in GRList (the list of region sets used to create rsScores).

nullDistList

List with one item per region set, in the same order as the rows of rsScores. Each item is a data.frame holding the null distribution(s) for a single region set: each row is one permutation, and each column is one target variable (e.g. PC or phenotype) named in 'signalCol'. Each target variable has its own null distribution for a given region set.
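A sketch of the nullDistList layout, built by hand in base R from hypothetical per-permutation score tables (one row per region set, as in rsScores). The Examples section does this reshaping with convertToFromNullDist; the lapply/rbind code here only illustrates the target structure and is not that function's implementation.

```r
# Hypothetical per-permutation scores: 3 permutations, each a data.frame
# with one row per region set (2 here) and one column per target variable.
permRSScores <- list(
  data.frame(PC1 = c(0.1, 0.4), PC2 = c(0.3, 0.2)),
  data.frame(PC1 = c(0.5, 0.7), PC2 = c(0.6, 0.1)),
  data.frame(PC1 = c(0.2, 0.9), PC2 = c(0.8, 0.4))
)

# Reshape to the nullDistList layout: one data.frame per region set,
# one row per permutation, one column per target variable.
# (Illustration only; not the implementation of convertToFromNullDist.)
nRegionSets <- nrow(permRSScores[[1]])
nullDistList <- lapply(seq_len(nRegionSets), function(i) {
  rows <- lapply(permRSScores, function(perm) perm[i, , drop = FALSE])
  nullDist <- do.call(rbind, rows)
  rownames(nullDist) <- NULL
  nullDist
})

length(nullDistList)  # 2 region sets
nullDistList[[2]]     # null distribution for region set 2: 3 rows x 2 columns
```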

signalCol

Character vector. The names of the sample variables of interest (target variables, e.g. PCs or sample phenotypes). Must be column names of rsScores and of the data.frames in nullDistList.

testType

Character. One of "greater", "lesser", or "two-sided". Whether the p-value is based on a one-sided test (in the given direction) or a two-sided test. Only applies when whichMetric="pval".
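A sketch of how the three testType options could map to empirical p-values for one region set and one target variable. The helper empiricalP is hypothetical. The tie handling and the two-sided rule (twice the smaller one-sided p-value, capped at 1) are assumptions, not necessarily what getPermStat does.

```r
# Hypothetical helper (not COCOA's code). Assumptions: ties count as
# "at least as extreme", and two-sided = 2 * smaller one-sided p, capped at 1.
empiricalP <- function(realScore, nullScores,
                       testType = c("greater", "lesser", "two-sided")) {
  testType <- match.arg(testType)
  # The real score is not added to the null distribution.
  pGreater <- mean(nullScores >= realScore)
  pLesser  <- mean(nullScores <= realScore)
  switch(testType,
    "greater"   = pGreater,
    "lesser"    = pLesser,
    "two-sided" = min(1, 2 * min(pGreater, pLesser))
  )
}

nullScores <- c(0.2, 0.5, 0.9, 1.3)
empiricalP(1.0, nullScores, "greater")    # 0.25
empiricalP(1.0, nullScores, "lesser")     # 0.75
empiricalP(1.0, nullScores, "two-sided")  # 0.5
```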

whichMetric

Character. Either "pval" (empirical p-value) or "zscore". Selects which statistic is returned.

Value

A data.table/data.frame. If whichMetric="pval", it contains the empirical p-value for each region set in 'rsScores'. If a region set score is more extreme than every score in its null distribution, a p-value of 0 is returned. This does not mean the true p-value is 0, only that it is below the smallest p-value detectable with the number of permutations used to make the null distributions. If whichMetric="zscore", it contains a z-score for each region set score: ((region set score) - mean(null distribution)) / sd(null distribution).
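The z-score formula above, written as a hypothetical helper (permZScore, not part of COCOA) for one region set and one target variable. It assumes sd() means R's sample standard deviation (n - 1 denominator).

```r
# Hypothetical helper (not COCOA's code): z-score of one real region set
# score relative to its null distribution, following the formula above.
# Assumption: sd() is R's sample standard deviation (n - 1 denominator).
permZScore <- function(realScore, nullScores) {
  (realScore - mean(nullScores)) / sd(nullScores)
}

nullScores <- c(1, 2, 3, 4, 5)  # mean 3, sd sqrt(2.5)
permZScore(6, nullScores)       # (6 - 3) / sqrt(2.5), about 1.897
```

Unlike the empirical p-value, the z-score keeps distinguishing region sets whose scores all lie beyond the whole null distribution, although it implicitly treats the null distribution as roughly normal.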

Examples

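library(COCOA)
set.seed(1)  # make the fake scores reproducible
# Fake "real" scores: 6 region sets (rows) x 2 target variables (columns)
fakeOriginalScores <- data.frame(PC1=abs(rnorm(6)), PC2=abs(rnorm(6)))
# Fake scores from 3 permutations, same layout as fakeOriginalScores
fakePermScores <- data.frame(PC1=abs(rnorm(6)), PC2=abs(rnorm(6)))
fakePermScores2 <- data.frame(PC1=abs(rnorm(6)), PC2=abs(rnorm(6)))
fakePermScores3 <- data.frame(PC1=abs(rnorm(6)), PC2=abs(rnorm(6)))
permRSScores <- list(fakePermScores, fakePermScores2, fakePermScores3)
# Convert to one null distribution data.frame per region set
nullDistList <- convertToFromNullDist(permRSScores)
# Empirical p-values (default testType = "greater")
getPermStat(rsScores=fakeOriginalScores, nullDistList=nullDistList,
            signalCol=c("PC1", "PC2"), whichMetric="pval")
# z-scores
getPermStat(rsScores=fakeOriginalScores, nullDistList=nullDistList,
            signalCol=c("PC1", "PC2"), whichMetric="zscore")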


databio/COCOA documentation built on Sept. 1, 2023, 5:50 p.m.