Description Usage Arguments Value Author(s) See Also Examples
View source: R/calculatePvalues.R
First, this function finds the regions of interest according to specified cutoffs. Then it permutes the samples and re-calculates the F-statistics. The area of the statistics from these segments are then used to calculate p-values for the original regions.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | calculatePvalues(
coveragePrep,
models,
fstats,
nPermute = 1L,
seeds = as.integer(gsub("-", "", Sys.Date())) + seq_len(nPermute),
chr,
cutoff = quantile(fstats, 0.99, na.rm = TRUE),
significantCut = c(0.05, 0.1),
lowMemDir = NULL,
smooth = FALSE,
weights = NULL,
smoothFunction = bumphunter::locfitByCluster,
...
)
|
coveragePrep |
A list with |
models |
A list with |
fstats |
A numerical Rle with the F-statistics normally generated using calculateStats. |
nPermute |
The number of permutations. Note that for a full chromosome,
a small amount (10) of permutations is sufficient. If set to 0, no
permutations are performed and thus no null regions are used, however, the
|
seeds |
An integer vector of length |
chr |
A single element character vector specifying the chromosome name. This argument is passed to findRegions. |
cutoff |
F-statistic cutoff to use to determine segments. |
significantCut |
A vector of length two specifiying the cutoffs used to determine significance. The first element is used to determine significance for the P-values, while the second element is used for the Q-values (FDR adjusted P-values). |
lowMemDir |
The directory where the processed chunks are saved when
using preprocessCoverage with a specified |
smooth |
Whether to smooth the F-statistics ( |
weights |
Weights used by the smoother as described in smoother. |
smoothFunction |
A function to be used for smoothing the F-statistics.
Two functions are provided by the |
... |
Arguments passed to other methods and/or advanced arguments. Advanced arguments:
Passed to findRegions, |
A list with four components:
is a GRanges with metadata columns given by
findRegions with the additional metadata column pvalues
:
p-value of the region calculated via permutations of the samples;
qvalues
: the qvalues calculated using qvalue;
significant
: whether the p-value is less than 0.05 (by default);
significantQval
: whether the q-value is less than 0.10 (by default).
It also includes the mean coverage of the region (mean from the mean
coverage at each base calculated in preprocessCoverage). Furthermore,
if groupInfo
was not NULL
in preprocessCoverage, then
the group mean coverage is calculated as well as the log 2 fold change
(using group 1 as the reference).
is a numeric Rle with the mean of the null statistics by segment.
is a numeric Rle with the length of each of the segments
in the null distribution. The area can be obtained by multiplying the
absolute nullstats
by the corresponding lengths.
is a Rle with the permutation number from which the null region originated from.
Leonardo Collado-Torres
findRegions, fstats.apply, qvalue
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 | ## Collapse the coverage information
collapsedFull <- collapseFullCoverage(list(genomeData$coverage),
verbose = TRUE
)
## Calculate library size adjustments
sampleDepths <- sampleDepth(collapsedFull, probs = c(0.5), verbose = TRUE)
## Build the models
group <- genomeInfo$pop
adjustvars <- data.frame(genomeInfo$gender)
models <- makeModels(sampleDepths, testvars = group, adjustvars = adjustvars)
## Preprocess the data
## Automatic chunksize used to then compare 1 vs 4 cores in the 'do not run'
## section
prep <- preprocessCoverage(genomeData,
groupInfo = group, cutoff = 0,
scalefac = 32, chunksize = NULL, colsubset = NULL, mc.cores = 4
)
## Get the F statistics
fstats <- genomeFstats
## We recommend determining the cutoff to use based on the F-distribution
## although you could also based it on the observed F-statistics.
## In this example we use a low cutoff used for illustrative purposes
cutoff <- 1
## Calculate the p-values and define the regions of interest.
regsWithP <- calculatePvalues(prep, models, fstats,
nPermute = 1, seeds = 1,
chr = "chr21", cutoff = cutoff, mc.cores = 1, method = "regular"
)
regsWithP
## Not run:
## Calculate again, but with 10 permutations instead of just 1
regsWithP <- calculatePvalues(prep, models, fstats,
nPermute = 10, seeds = 1:10,
chr = "chr21", cutoff = cutoff, mc.cores = 2, method = "regular"
)
## Check that they are the same as the previously calculated regions
library(testthat)
expect_that(regsWithP, equals(genomeRegions))
## Histogram of the theoretical p-values by region
hist(pf(regsWithP$regions$value, df1 - df0, n - df1), main = "Distribution
original p-values by region", freq = FALSE)
## Histogram of the permutted p-values by region
hist(regsWithP$regions$pvalues, main = "Distribution permutted p-values by
region", freq = FALSE)
## MA style plot
library("ggplot2")
ma <- data.frame(
mean = regsWithP$regions$meanCoverage,
log2FoldChange = regsWithP$regions$log2FoldChangeYRIvsCEU
)
ggplot(ma, aes(x = log2(mean), y = log2FoldChange)) +
geom_point() +
ylab("Fold Change (log2)") +
xlab("Mean coverage (log2)") +
labs(title = "MA style plot")
## Annotate the results
library("bumphunter")
genes <- annotateTranscripts(TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene)
annotation <- matchGenes(regsWithP$regions, genes)
head(annotation)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.