View source: R/calculatePvalues.R
calculatePvalues  R Documentation 
First, this function finds the regions of interest according to specified cutoffs. Then it permutes the samples and recalculates the Fstatistics. The area of the statistics from these segments are then used to calculate pvalues for the original regions.
calculatePvalues(
coveragePrep,
models,
fstats,
nPermute = 1L,
seeds = as.integer(gsub("", "", Sys.Date())) + seq_len(nPermute),
chr,
cutoff = quantile(fstats, 0.99, na.rm = TRUE),
significantCut = c(0.05, 0.1),
lowMemDir = NULL,
smooth = FALSE,
weights = NULL,
smoothFunction = bumphunter::locfitByCluster,
...
)
coveragePrep 
A list with 
models 
A list with 
fstats 
A numerical Rle with the Fstatistics normally generated using calculateStats. 
nPermute 
The number of permutations. Note that for a full chromosome,
a small amount (10) of permutations is sufficient. If set to 0, no
permutations are performed and thus no null regions are used, however, the

seeds 
An integer vector of length 
chr 
A single element character vector specifying the chromosome name. This argument is passed to findRegions. 
cutoff 
Fstatistic cutoff to use to determine segments. 
significantCut 
A vector of length two specifiying the cutoffs used to determine significance. The first element is used to determine significance for the Pvalues, while the second element is used for the Qvalues (FDR adjusted Pvalues). 
lowMemDir 
The directory where the processed chunks are saved when
using preprocessCoverage with a specified 
smooth 
Whether to smooth the Fstatistics ( 
weights 
Weights used by the smoother as described in smoother. 
smoothFunction 
A function to be used for smoothing the Fstatistics.
Two functions are provided by the 
... 
Arguments passed to other methods and/or advanced arguments. Advanced arguments:
Passed to findRegions, 
A list with four components:
is a GRanges with metadata columns given by
findRegions with the additional metadata column pvalues
:
pvalue of the region calculated via permutations of the samples;
qvalues
: the qvalues calculated using qvalue;
significant
: whether the pvalue is less than 0.05 (by default);
significantQval
: whether the qvalue is less than 0.10 (by default).
It also includes the mean coverage of the region (mean from the mean
coverage at each base calculated in preprocessCoverage). Furthermore,
if groupInfo
was not NULL
in preprocessCoverage, then
the group mean coverage is calculated as well as the log 2 fold change
(using group 1 as the reference).
is a numeric Rle with the mean of the null statistics by segment.
is a numeric Rle with the length of each of the segments
in the null distribution. The area can be obtained by multiplying the
absolute nullstats
by the corresponding lengths.
is a Rle with the permutation number from which the null region originated from.
Leonardo ColladoTorres
findRegions, fstats.apply, qvalue
## Collapse the coverage information
collapsedFull < collapseFullCoverage(list(genomeData$coverage),
verbose = TRUE
)
## Calculate library size adjustments
sampleDepths < sampleDepth(collapsedFull, probs = c(0.5), verbose = TRUE)
## Build the models
group < genomeInfo$pop
adjustvars < data.frame(genomeInfo$gender)
models < makeModels(sampleDepths, testvars = group, adjustvars = adjustvars)
## Preprocess the data
## Automatic chunksize used to then compare 1 vs 4 cores in the 'do not run'
## section
prep < preprocessCoverage(genomeData,
groupInfo = group, cutoff = 0,
scalefac = 32, chunksize = NULL, colsubset = NULL, mc.cores = 4
)
## Get the F statistics
fstats < genomeFstats
## We recommend determining the cutoff to use based on the Fdistribution
## although you could also based it on the observed Fstatistics.
## In this example we use a low cutoff used for illustrative purposes
cutoff < 1
## Calculate the pvalues and define the regions of interest.
regsWithP < calculatePvalues(prep, models, fstats,
nPermute = 1, seeds = 1,
chr = "chr21", cutoff = cutoff, mc.cores = 1, method = "regular"
)
regsWithP
## Not run:
## Calculate again, but with 10 permutations instead of just 1
regsWithP < calculatePvalues(prep, models, fstats,
nPermute = 10, seeds = 1:10,
chr = "chr21", cutoff = cutoff, mc.cores = 2, method = "regular"
)
## Check that they are the same as the previously calculated regions
library(testthat)
expect_that(regsWithP, equals(genomeRegions))
## Histogram of the theoretical pvalues by region
hist(pf(regsWithP$regions$value, df1  df0, n  df1), main = "Distribution
original pvalues by region", freq = FALSE)
## Histogram of the permutted pvalues by region
hist(regsWithP$regions$pvalues, main = "Distribution permutted pvalues by
region", freq = FALSE)
## MA style plot
library("ggplot2")
ma < data.frame(
mean = regsWithP$regions$meanCoverage,
log2FoldChange = regsWithP$regions$log2FoldChangeYRIvsCEU
)
ggplot(ma, aes(x = log2(mean), y = log2FoldChange)) +
geom_point() +
ylab("Fold Change (log2)") +
xlab("Mean coverage (log2)") +
labs(title = "MA style plot")
## Annotate the results
library("bumphunter")
genes < annotateTranscripts(TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene)
annotation < matchGenes(regsWithP$regions, genes)
head(annotation)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.