BBUM_DEcorr | R Documentation |
Process DE results data tables. Designed for DESeq2 output but any similar
tables with all appropriate columns are applicable. Out of two subsets of
data rows, one set ("signal/sample set") with primary effects and one
("background set") without, BBUM_DEcorr
models the p values to
distinguish secondary effects signal from primary effects signal using the
BBUM model. It then calls significant genes within the signal set after
correcting for FDR of both null and secondary effects in multiple
testing.
BBUM_DEcorr(
df.deseq,
classCol,
geneName = NULL,
pBBUM.alpha = 0.05,
excluded = c(),
outliers = c(),
add_starts = list(),
only_start = FALSE,
limits = list(),
auto_outliers = TRUE,
rthres = 1,
rtrimmax = 0.05,
atrimmax = 10,
two_tailed = FALSE,
quiet = FALSE
)
df.deseq |
A DE results |
classCol |
A |
geneName |
A |
pBBUM.alpha |
Cutoff level of |
excluded |
A |
outliers |
A |
add_starts |
List of named vectors for additional starts of fitting algorithm beyond the default set. |
only_start |
Whether the algorithm should only use the given starts
( |
limits |
Named list of custom limits for specific paramters. Parameters not mentioned would be default values. |
auto_outliers |
Toggle automatic outlier trimming. |
rthres |
Threshold value of |
rtrimmax |
Maximum fraction of data points allowed to be outliers in the background set of data (to be trimmed). |
atrimmax |
Maximum absolute number of data points allowed to be outliers in the background set of data (to be trimmed). |
two_tailed |
Toggle the "two-tailed" case of BBUM correction, if the background assumption is weak and bona fide hits in the background class are relevant. See Details. Default behavior is off. |
quiet |
Suppress printed messages and warnings. |
BBUM_DEcorr
can also be used on arbitrary data frames other
those generated by DESeq2
(e.g. by EdgeR
), as long as three
necessary columns are present: data points names (geneName
), p
values (pvalue
), and data set identifier (classCol
).
The output data.frame
preserves all original rows in order,
all original columns in order, and row names if present.
classCol
: For example, in the case where positive log fold
changes represent the signal class, and negative log fold changes represent
the background class, all rows of positive fold change points have
classCol == T
, and all rows of negative fold change points have
classCol == F
.
pBBUM
represents the expected overall FDR level if the cutoff
were set at that particular p value. This is similar to the interpretation
of p values corrected through the typical p.adjust(method = "fdr")
.
pBBUM
values are designed for the signal set p values only,
Values for the background set are given but not valid as significance
testing adjustment, and so should not be used to call any hits. They
are provided primarily to compare the equivalent transformation against the
signal set to assess the adjustment strategy. The background set should
not be considered for hits.
BBUM_DEcorr
functions properly only with p values filtered
for poor quality data points, e.g. low read counts. Filtering has either
been done manually in prior, through DESeq2's independentFiltering
in results()
, or through the excluded
argument. Excluding a
point via excluded
is equivalent to filtering it out of the
data.frame to begin with. Such points tend to have high p values and
may disrupt the uniform null distribution.
Outliers (outliers
) are included in the graphs from
BBUM_plot()
and have p values associated. Excluded points
(excluded
) are removed from analysis altogether.
Default starts for BBUM fitting are implemented. If additional
starts should included, or only custom starts should be considered, make
use of add_starts
and/or only_start
arguments.
If more than one start achieved the identical likelihood value, A random start is chosen among them.
Automatic outlier detection relies on the model fitting a value of
r > 1
. Such a result suggests that a stronger signal (presumably
outliers) exists in the background set than in the signal set, which
violates the assumptions of the model. This is a conservative strategy.
The ideal way to deal with outliers is to identify and handle them before
any statistical analyses. For benchmarking of the trimming strategy, see
Wang & Bartel, 2022.
Adding too many starts or allowing too much outlier trimming can increase computation time.
If the background assumption is weak, such that a small number
of bona fide hits are anticipated and relevant to the hypothesis at
hand among the data points designated "background class", the FDR could be
made to include the background class. This is akin to a two-tailed test
(despite a one-tailed assumption to begin with). This would allow the
generation of genuine FDR-corrected p values for the background class
points as well. Toggle this using the two_tailed
value.
Due to the asymptotic behavior of the function when any
p values = 0, any p values < .Machine$double.xmin*10
would be
constrained to .Machine$double.xmin*10
.
A tibble
based on the input results table, with added
columns from BBUM correction and significance calling:
geneName
: Copy of extracted gene names (character
)
excluded
: Whether the point is considered failing the cutoff and
excluded in excluded
(logical
)
outlier
: Whether the point is considered an outlier (logical
)
BBUM.class
: Whether the data point is considered in the signal set
instead of the background set (logical
)
BBUM.l
, BBUM.a
, BBUM.th
, BBUM.r
: Coefficients
from BBUM model fit (numeric
)
pBBUM
: BBUM-transformed, FDR-adusted p value (numeric
)
BBUM.hit
: Whether a gene is a significant hit (logical
)
BBUM.fct
: Factor summarizing status as hits, non-hits, or outliers
(factor
)
## Not run:
# Typical default run
# DESeq2
dds = DESeqDataSetFromMatrix(countData = cts, ...) # DESeq2
dds = DESeq(dds)
res = results(dds) %>%
as.data.frame() %>%
mutate(FCup = log2FoldChange > 0)
# Call bbum function
res.BBUMcorr = BBUM_DEcorr(
df.deseq = res,
classCol = "FCup"
)
# Or with some customized behavior
res.BBUMcorr = BBUM_DEcorr(
df.deseq = res,
classCol = "FCup",
geneName = "miRNAs",
pBBUM.alpha = 0.01,
excluded = c("hsa-miR-124-5p", "hsa-miR-1-3p"),
outliers = c("hsa-miR-21-5p"),
add_starts = list(c(lambda = 0.9, a = 0.6, theta = 0.1, r = 0.1))
)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.