identify_outliers | R Documentation |
This function runs the data modeling and statistical test for the hypothesis that a transcript includes outlier biological replicate.
\lifecyclematuring
identify_outliers(
.data,
formula = ~1,
.sample,
.transcript,
.abundance,
.significance,
.do_check,
.scaling_factor = NULL,
percent_false_positive_genes = 1,
how_many_negative_controls = 500,
approximate_posterior_inference = TRUE,
approximate_posterior_analysis = TRUE,
draws_after_tail = 10,
save_generated_quantities = FALSE,
additional_parameters_to_save = c(),
cores = detect_cores(),
pass_fit = FALSE,
do_check_only_on_detrimental = length(parse_formula(formula)) > 0,
tol_rel_obj = 0.01,
just_discovery = FALSE,
seed = sample(seq_len(length.out = 999999), size = 1),
adj_prob_theshold_2 = NULL
)
.data |
A tibble including a transcript name column | sample name column | read counts column | covariate columns | Pvalue column | a significance column |
formula |
A formula. The sample formula used to perform the differential transcript abundance analysis |
.sample |
A column name as symbol. The sample identifier |
.transcript |
A column name as symbol. The transcript identifier |
.abundance |
A column name as symbol. The transcript abundance (read count) |
.significance |
A column name as symbol. A column with the Pvalue, or other significance measure (preferred Pvalue over false discovery rate) |
.do_check |
A column name as symbol. A column with a boolean indicating whether a transcript was identified as differentially abundant |
.scaling_factor |
In case the scaling factor must not be calculated (TMM method) using the input data but provided. It is useful, for example, for pseudobulk single-cell where the scaling might depend on sample sequencing depth for all cells rather than a particular cell type. |
percent_false_positive_genes |
A real between 0 and 100. It is the aimed percent of transcript being a false positive. For example, percent_false_positive_genes = 1 provide 1 percent of the calls for outlier containing transcripts that has actually not outliers. |
how_many_negative_controls |
An integer. How many transcript from the bottom non-significant should be taken for inferring the mean-overdispersion trend. |
approximate_posterior_inference |
A boolean. Whether the inference of the joint posterior distribution should be approximated with variational Bayes It confers execution time advantage. |
approximate_posterior_analysis |
A boolean. Whether the calculation of the credible intervals should be done semi-analytically, rather than with pure sampling from the posterior. It confers execution time and memory advantage. |
draws_after_tail |
An integer. How many draws should on average be after the tail, in a way to inform CI. |
save_generated_quantities |
A boolean. Used for development and testing purposes |
additional_parameters_to_save |
A character vector. Used for development and testing purposes |
cores |
An integer. How many cored to be used with parallel calculations. |
pass_fit |
A boolean. Used for development and testing purposes |
do_check_only_on_detrimental |
A boolean. Whether to test only for detrimental outliers (same direction as the fold change). It allows to test for less transcript/sample pairs and therefore higher the probability threshold. |
tol_rel_obj |
A real. Used for development and testing purposes |
just_discovery |
A boolean. Used for development and testing purposes |
seed |
An integer. Used for development and testing purposes |
adj_prob_theshold_2 |
A boolean. Used for development and testing purposes |
A nested tibble tbl
with transcript-wise information: sample_wise_data
| plot | ppc samples failed
| tot deleterious_outliers
library(dplyr)
data("counts")
if(Sys.info()[['sysname']] == "Linux")
result =
counts %>%
dplyr::mutate( is_significant = ifelse(symbol %in% c("SLC16A12", "CYP1A1", "ART3"), TRUE, FALSE) ) %>%
ppcseq::identify_outliers(
formula = ~ Label,
sample, symbol, value,
.significance = PValue,
.do_check = is_significant,
percent_false_positive_genes = 1,
tol_rel_obj = 0.01,
approximate_posterior_inference =TRUE,
approximate_posterior_analysis =TRUE,
how_many_negative_controls = 50,
cores=1
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.