calculatePvalues: Calculate p-values and identify regions
In lcolladotor/derfinder: Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach

calculatePvalues

R Documentation

Calculate p-values and identify regions

Description

First, this function finds the regions of interest according to specified cutoffs. Then it permutes the samples and re-calculates the F-statistics. The area of the statistics from these segments are then used to calculate p-values for the original regions.

Usage

calculatePvalues(
  coveragePrep,
  models,
  fstats,
  nPermute = 1L,
  seeds = as.integer(gsub("-", "", Sys.Date())) + seq_len(nPermute),
  chr,
  cutoff = quantile(fstats, 0.99, na.rm = TRUE),
  significantCut = c(0.05, 0.1),
  lowMemDir = NULL,
  smooth = FALSE,
  weights = NULL,
  smoothFunction = bumphunter::locfitByCluster,
  ...
)

Arguments

`coveragePrep`	A list with `⁠$coverageProcessed⁠`, `⁠$mclapplyIndex⁠`, and `⁠$position⁠` normally generated using preprocessCoverage.
`models`	A list with `⁠$mod⁠` and `⁠$mod0⁠` normally generated using makeModels.
`fstats`	A numerical Rle with the F-statistics normally generated using calculateStats.
`nPermute`	The number of permutations. Note that for a full chromosome, a small amount (10) of permutations is sufficient. If set to 0, no permutations are performed and thus no null regions are used, however, the `⁠$regions⁠` component is created.
`seeds`	An integer vector of length `nPermute` specifying the seeds to be used for each permutation. If `NULL` no seeds are used.
`chr`	A single element character vector specifying the chromosome name. This argument is passed to findRegions.
`cutoff`	F-statistic cutoff to use to determine segments.
`significantCut`	A vector of length two specifiying the cutoffs used to determine significance. The first element is used to determine significance for the P-values, while the second element is used for the Q-values (FDR adjusted P-values).
`lowMemDir`	The directory where the processed chunks are saved when using preprocessCoverage with a specified `lowMemDir`.
`smooth`	Whether to smooth the F-statistics (`fstats`) or not. This is by default `FALSE`. For RNA-seq data we recommend using `FALSE`.
`weights`	Weights used by the smoother as described in smoother.
`smoothFunction`	A function to be used for smoothing the F-statistics. Two functions are provided by the `bumphunter` package: loessByCluster and runmedByCluster. If you are using your own custom function, it has to return a named list with an element called `⁠$fitted⁠` that contains the smoothed F-statistics and an element claled `⁠$smoothed⁠` that is a logical vector indicating whether the F-statistics were smoothed or not. If they are not smoothed, the original values will be used.
`...`	Arguments passed to other methods and/or advanced arguments. Advanced arguments: verbose If `TRUE` basic status updates will be printed along the way. scalefac This argument is passed to fstats.apply and should be the same as the one used in preprocessCoverage. Default: 32. method Has to be either 'Matrix' (default), 'Rle' or 'regular'. See details in fstats.apply. adjustF A single value to adjust that is added in the denominator of the F-stat calculation. Useful when the Residual Sum of Squares of the alternative model is very small. Default: 0. writeOutput If `TRUE` then the regions are saved before calculating q-values, and then overwritten once the q-values are written. This argument was introduced to save the results from the permutations (can take some time) to investigate the problem described at https://support.bioconductor.org/p/62026/ maxRegionGap Passed to internal functions of findRegions. Default: 0. Passed to findRegions, `smoothFunction` and define_cluster.

Value

A list with four components:

regions: is a GRanges with metadata columns given by findRegions with the additional metadata column pvalues: p-value of the region calculated via permutations of the samples; qvalues: the qvalues calculated using qvalue; significant: whether the p-value is less than 0.05 (by default); significantQval: whether the q-value is less than 0.10 (by default). It also includes the mean coverage of the region (mean from the mean coverage at each base calculated in preprocessCoverage). Furthermore, if groupInfo was not NULL in preprocessCoverage, then the group mean coverage is calculated as well as the log 2 fold change (using group 1 as the reference).
nullStats: is a numeric Rle with the mean of the null statistics by segment.
nullWidths: is a numeric Rle with the length of each of the segments in the null distribution. The area can be obtained by multiplying the absolute nullstats by the corresponding lengths.
nullPermutation: is a Rle with the permutation number from which the null region originated from.

Author(s)

Leonardo Collado-Torres

Examples

## Collapse the coverage information
collapsedFull <- collapseFullCoverage(list(genomeData$coverage),
    verbose = TRUE
)

## Calculate library size adjustments
sampleDepths <- sampleDepth(collapsedFull, probs = c(0.5), verbose = TRUE)

## Build the models
group <- genomeInfo$pop
adjustvars <- data.frame(genomeInfo$gender)
models <- makeModels(sampleDepths, testvars = group, adjustvars = adjustvars)

## Preprocess the data
## Automatic chunksize used to then compare 1 vs 4 cores in the 'do not run'
## section
prep <- preprocessCoverage(genomeData,
    groupInfo = group, cutoff = 0,
    scalefac = 32, chunksize = NULL, colsubset = NULL, mc.cores = 4
)

## Get the F statistics
fstats <- genomeFstats

## We recommend determining the cutoff to use based on the F-distribution
## although you could also based it on the observed F-statistics.

## In this example we use a low cutoff used for illustrative purposes
cutoff <- 1

## Calculate the p-values and define the regions of interest.
regsWithP <- calculatePvalues(prep, models, fstats,
    nPermute = 1, seeds = 1,
    chr = "chr21", cutoff = cutoff, mc.cores = 1, method = "regular"
)
regsWithP
## Not run: 
## Calculate again, but with 10 permutations instead of just 1
regsWithP <- calculatePvalues(prep, models, fstats,
    nPermute = 10, seeds = 1:10,
    chr = "chr21", cutoff = cutoff, mc.cores = 2, method = "regular"
)

## Check that they are the same as the previously calculated regions
library(testthat)
expect_that(regsWithP, equals(genomeRegions))

## Histogram of the theoretical p-values by region
hist(pf(regsWithP$regions$value, df1 - df0, n - df1), main = "Distribution
    original p-values by region", freq = FALSE)

## Histogram of the permutted p-values by region
hist(regsWithP$regions$pvalues, main = "Distribution permutted p-values by
    region", freq = FALSE)

## MA style plot
library("ggplot2")
ma <- data.frame(
    mean = regsWithP$regions$meanCoverage,
    log2FoldChange = regsWithP$regions$log2FoldChangeYRIvsCEU
)
ggplot(ma, aes(x = log2(mean), y = log2FoldChange)) +
    geom_point() +
    ylab("Fold Change (log2)") +
    xlab("Mean coverage (log2)") +
    labs(title = "MA style plot")

## Annotate the results
library("bumphunter")
genes <- annotateTranscripts(TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene)
annotation <- matchGenes(regsWithP$regions, genes)
head(annotation)

## End(Not run)

lcolladotor/derfinder documentation built on Dec. 17, 2024, 4:53 p.m.

lcolladotor/derfinder index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

lcolladotor/derfinder
Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach

calculatePvalues: Calculate p-values and identify regions
In lcolladotor/derfinder: Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach

Calculate p-values and identify regions

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Related to calculatePvalues in lcolladotor/derfinder...

R Package Documentation

Browse R Packages

We want your feedback!

lcolladotor/derfinder Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach

calculatePvalues: Calculate p-values and identify regions In lcolladotor/derfinder: Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach

Calculate p-values and identify regions

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Related to calculatePvalues in lcolladotor/derfinder...

R Package Documentation

Browse R Packages

We want your feedback!

lcolladotor/derfinder
Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach

calculatePvalues: Calculate p-values and identify regions
In lcolladotor/derfinder: Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach