mark_stat_hits | R Documentation |
Mark statistical hits by threshold cutoffs
mark_stat_hits(
x,
adjp_cutoff = NULL,
p_cutoff = NULL,
fold_cutoff = NULL,
mgm_cutoff = NULL,
ave_cutoff = NULL,
adjp_colname = "adj.P.Val",
p_colname = "P.Value",
logfc_colname = "logFC",
mgm_colname = "mgm",
ave_colname = "AveExpr",
assign_value = c("sign", "fold", "logfc"),
verbose = FALSE,
...
)
x |
|
adjp_cutoff , p_cutoff , fold_cutoff , mgm_cutoff , ave_cutoff |
|
adjp_colname , p_colname , logfc_colname , mgm_colname , ave_colname |
|
assign_value |
|
verbose |
|
... |
additional arguments are ignored. |
This function is a lightweight method of applying one or more statistical thresholds to define "statistical hits". The thresholds are based upon three questions:
Is it "detected"? (above minimum signal)
Is it "changing"? (above minimum fold change)
Is it "significant"? (below defined P-value threshold)
The reasoning is roughly as follows:
If the signal for a measurement is not above a noise threshold, or above a defined level of signal required for adequate confirmation experiments, the other statistical measurements are not relevant.
If the change between two experimental groups is not sufficient for follow-up experiments, or is below a biologically meaningful level of change, the other statistical measures are not relevant.
If the signal is detected, and the change is potentially sufficient for follow-up confirmation experiments, and/or to induce biologically meaningful effects, it must also be statistically robust as defined by the relevant adjusted P-value.
These thresholds are dependent upon the experiment itself, and each threshold, if used, must be well-defined and defensible.
That is, in order to define a signal threshold, one should evaluate the level of noise below which a measured value is no longer sufficient for follow-up experiments, or no longer reliable based upon the technology being used.
In order to impose a minimum fold change threshold, one should have some clear indication of any limitations in follow-up assay techniques, and some indication of the magnitude of change expected for a biologically meaningful response. In some cases, a biologically meaningful change may be defined in other experiments, ideally showing both small changes not associated with biologically meaningful effects and changes which are associated with biologically meaningful effects.
A useful technique to review statistical thresholds is a volcano plot, which depicts the relationship of log fold change versus adjusted P-value. The plot can indicate the range of fold changes for which the statistical model found significance. Some technologies or protocols naturally compress the effective fold change, yielding a very narrow volcano plot, while others with high variability may result in a relatively short and wide volcano plot. The range of fold changes with no significant P-value may indicate a reasonable expectation for inherent variability, thus a fold change threshold may be defined above that observed for the majority of non-significant entries.
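One way to turn the volcano plot observation into a number is sketched below in base R. It is only an illustration under stated assumptions: the data are made up, the 0.05 adjusted P-value cutoff and the 95% quantile are arbitrary choices, and none of this is part of mark_stat_hits() itself.

```r
# Hypothetical results table; column names follow the defaults shown in Usage.
x <- data.frame(
  logFC = c(0.1, -0.2, 0.3, -0.4, 0.5, 1.8, -2.1, 2.5),
  adj.P.Val = c(0.90, 0.75, 0.60, 0.40, 0.20, 0.01, 0.003, 0.0005))

# Entries the statistical model did not call significant
# (0.05 is an assumed cutoff for this illustration).
is_ns <- x$adj.P.Val > 0.05

# Magnitude of log2 fold change that covers 95% of non-significant entries
# (the 95% level is an arbitrary choice standing in for "the majority").
ns_log2fold <- unname(quantile(abs(x$logFC[is_ns]), probs = 0.95))

# Convert to normal space, the scale used by fold_cutoff.
candidate_fold_cutoff <- 2^ns_log2fold
```

The resulting value is a starting point for discussion, and should still be checked against what is biologically meaningful and what follow-up assays can confirm.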
Entries which meet the statistical criteria are marked:
-1
for entries that meet all criteria, with negative fold change
0
for entries that do not meet all thresholds
1
for entries that meet all criteria, with positive fold change
"Detected" is defined either by the "max group mean", representing the highest group mean signal intensity, or by "average signal", representing the average group mean signal intensity across all groups. These columns should already be present in the input data x.
"Changing" is defined by the log2 fold change, which must meet the criteria defined by fold_cutoff, which is in normal space. For example, fold_cutoff=1.5 represents 1.5-fold change, and would be applied as abs(log2fc) >= log2(fold_cutoff). Most statistical results are reported using log2 fold change, but scientists usually define fold changes in normal space.
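The conversion between normal-space and log2 fold change can be checked with a few lines of base R. The values in log2fc are made up for illustration.

```r
# Fold change threshold in normal space, as supplied to fold_cutoff.
fold_cutoff <- 1.5

# Hypothetical log2 fold changes, as typically reported in a "logFC" column.
log2fc <- c(-1.2, -0.5, 0.3, 0.6, 2)

# The threshold is converted to log2 space before comparison.
log2_cutoff <- log2(fold_cutoff)

# The comparison is symmetric: up and down changes are treated alike.
is_changing <- abs(log2fc) >= log2_cutoff
```

Here log2(1.5) is roughly 0.585, so the entries at -1.2, 0.6, and 2 pass while -0.5 and 0.3 do not.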
"Significant" is defined using the P-value, and/or adjusted P-value, and requires entries to be at or below the threshold.
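The three criteria can be combined as in the following base R sketch. This is not the jamses implementation: sketch_mark_hits() is a hypothetical helper written for illustration, it covers only the "sign" style of output, it assumes that a NULL cutoff means the filter is skipped, and it assumes "detected" means at or above mgm_cutoff.

```r
# Illustrative sketch only; sketch_mark_hits() is a hypothetical helper,
# not a function provided by jamses.
sketch_mark_hits <- function(x,
   adjp_cutoff = NULL,
   fold_cutoff = NULL,
   mgm_cutoff = NULL) {
   keep <- rep(TRUE, nrow(x))
   # "Detected": max group mean at or above the signal threshold (assumed >=).
   if (!is.null(mgm_cutoff)) {
      keep <- keep & x$mgm >= mgm_cutoff
   }
   # "Changing": normal-space fold cutoff applied in log2 space.
   if (!is.null(fold_cutoff)) {
      keep <- keep & abs(x$logFC) >= log2(fold_cutoff)
   }
   # "Significant": adjusted P-value at or below the threshold.
   if (!is.null(adjp_cutoff)) {
      keep <- keep & x$adj.P.Val <= adjp_cutoff
   }
   # -1, 0, or 1: direction of change for hits, 0 for everything else.
   hits <- sign(x$logFC) * keep
   names(hits) <- rownames(x)
   hits
}

# Made-up input with the default column names.
x <- data.frame(
   row.names = c("geneA", "geneB", "geneC", "geneD"),
   logFC = c(1.5, -2.0, 0.2, 3.0),
   adj.P.Val = c(0.01, 0.001, 0.001, 0.20),
   mgm = c(8, 9, 10, 2))

hits <- sketch_mark_hits(x,
   adjp_cutoff = 0.05,
   fold_cutoff = 1.5,
   mgm_cutoff = 5)
```

In this example geneA is marked 1 and geneB is marked -1. geneC is marked 0 because its fold change is too small, and geneD is marked 0 because it fails both the signal and adjusted P-value thresholds despite its large fold change.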
numeric vector with length nrow(x) and values defined by argument assign_value. The order is identical to the order of rows in the input x. The output vector will be named by rownames(x) if rownames exist.
Other jamses utilities:
choose_annotation_colnames(), contrast2comp_dev(), fold_to_log2fold(), intercalate(), list2im_opt(), log2fold_to_fold(), make_block_arrow_polygon(), matrix_normalize(), point_handedness(), point_slope_intercept(), shortest_unique_abbreviation(), shrinkDataFrame(), shrink_df(), shrink_matrix(), sort_samples(), strsplitOrdered(), sub_split_vector(), update_function_params(), update_list_elements()