identify_outliers: identify_outliers main

In stemangiola/ppcseq: Probabilistic Outlier Identification for RNA Sequencing Generalized Linear Models

Source: R/methods.R

Description

This function runs the data modeling and the statistical test for the hypothesis that a transcript includes an outlier biological replicate.
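For intuition, the "ppc" in the output column names refers to posterior predictive checks: a replicate is suspicious when its observed count falls outside the range the fitted model predicts for it. The sketch below is a simplified plug-in version with hand-picked negative binomial parameters. The helper flag_outliers_nb() and its arguments are hypothetical. It is not ppcseq's implementation, which infers the mean and overdispersion from the data in a Bayesian model.

```r
# Simplified, plug-in illustration of a predictive check (hypothetical helper).
# NOT ppcseq's implementation: here mean (mu) and overdispersion (size)
# are fixed by hand instead of being inferred from the data.
flag_outliers_nb <- function(counts, mu, size, prob = 0.99) {
  tail <- (1 - prob) / 2
  # Central interval of the negative binomial predictive distribution
  lower <- stats::qnbinom(tail, size = size, mu = mu)
  upper <- stats::qnbinom(1 - tail, size = size, mu = mu)
  # Counts outside the interval are flagged as candidate outliers
  counts < lower | counts > upper
}

flag_outliers_nb(c(90, 110, 1000), mu = 100, size = 10)
```

Only the count of 1000 falls outside the central 99% interval here. The real test works on posterior draws rather than fixed parameters.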

Lifecycle: maturing

Usage

identify_outliers(
  .data,
  formula = ~1,
  .sample,
  .transcript,
  .abundance,
  .significance,
  .do_check,
  .scaling_factor = NULL,
  percent_false_positive_genes = 1,
  how_many_negative_controls = 500,
  approximate_posterior_inference = TRUE,
  approximate_posterior_analysis = TRUE,
  draws_after_tail = 10,
  save_generated_quantities = FALSE,
  additional_parameters_to_save = c(),
  cores = detect_cores(),
  pass_fit = FALSE,
  do_check_only_on_detrimental = length(parse_formula(formula)) > 0,
  tol_rel_obj = 0.01,
  just_discovery = FALSE,
  seed = sample(seq_len(length.out = 999999), size = 1),
  adj_prob_theshold_2 = NULL
)
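The default of do_check_only_on_detrimental depends on whether formula contains covariates. The helper covariates_of() below is a hypothetical stand-in for the package's internal parse_formula(), whose exact code is assumed here. It reproduces the check with base R's terms():

```r
# Hypothetical stand-in for ppcseq's internal parse_formula():
# returns the variable names of a one-sided formula such as ~ Label.
covariates_of <- function(fm) {
  if (attr(stats::terms(fm), "response") == 1) {
    stop("The formula must be of the kind \"~ covariates\"")
  }
  # The "variables" attribute is a call like list(Label); drop the "list" head
  as.character(attr(stats::terms(fm), "variables"))[-1]
}

# Mirrors the default: do_check_only_on_detrimental = length(...) > 0
length(covariates_of(~ 1)) > 0      # FALSE: intercept-only model
length(covariates_of(~ Label)) > 0  # TRUE: one covariate
```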

Arguments

`.data`	A tibble including a transcript name column, a sample name column, a read count column, covariate columns, a p-value column and a significance column
`formula`	A formula. The same formula used to perform the differential transcript abundance analysis
`.sample`	A column name as symbol. The sample identifier
`.transcript`	A column name as symbol. The transcript identifier
`.abundance`	A column name as symbol. The transcript abundance (read count)
`.significance`	A column name as symbol. A column with the p-value or another significance measure (the p-value is preferred over the false discovery rate)
`.do_check`	A column name as symbol. A column with a boolean indicating whether a transcript was identified as differentially abundant
`.scaling_factor`	Scaling factors to use instead of calculating them from the input data (TMM method). This is useful, for example, for pseudobulk single-cell data, where the scaling might depend on the sequencing depth of all cells in a sample rather than of a particular cell type.
`percent_false_positive_genes`	A real between 0 and 100. The target percentage of false positives among the transcripts called as containing outliers. For example, percent_false_positive_genes = 1 aims for about 1 percent of the outlier-containing calls to be transcripts that actually have no outliers.
`how_many_negative_controls`	An integer. How many of the least significant (non-significant) transcripts to use for inferring the mean-overdispersion trend.
`approximate_posterior_inference`	A boolean. Whether the joint posterior distribution should be approximated with variational Bayes. This reduces execution time.
`approximate_posterior_analysis`	A boolean. Whether the credible intervals should be calculated semi-analytically rather than by pure sampling from the posterior. This reduces execution time and memory use.
`draws_after_tail`	An integer. How many posterior draws should, on average, fall beyond the tail, so that there is enough information to estimate the credible interval (CI).
`save_generated_quantities`	A boolean. Used for development and testing purposes
`additional_parameters_to_save`	A character vector. Used for development and testing purposes
`cores`	An integer. How many cores to use for parallel calculations.
`pass_fit`	A boolean. Used for development and testing purposes
`do_check_only_on_detrimental`	A boolean. Whether to test only for detrimental outliers (those in the same direction as the fold change). Testing fewer transcript/sample pairs allows a higher probability threshold. By default, it is TRUE when formula includes covariates and FALSE for an intercept-only formula (~1).
`tol_rel_obj`	A real. Used for development and testing purposes
`just_discovery`	A boolean. Used for development and testing purposes
`seed`	An integer. Used for development and testing purposes
`adj_prob_theshold_2`	A boolean. Used for development and testing purposes
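In the example, the .do_check column is set by hand for three transcripts. A more typical workflow derives it from the differential abundance results. The hypothetical helper flag_for_check() below is a minimal sketch. It assumes long-format input (one row per transcript/sample pair, with each transcript's p-value repeated on its rows) and Benjamini-Hochberg adjustment at a 5% FDR. Column names default to those of the example data.

```r
# Hypothetical helper: build a logical column usable as .do_check.
# Assumes one row per transcript/sample pair, with the transcript's
# p-value repeated on each of its rows.
flag_for_check <- function(df, transcript_col = "symbol", p_col = "PValue",
                           fdr = 0.05) {
  # Keep one p-value per transcript
  per_transcript <- df[!duplicated(df[[transcript_col]]), c(transcript_col, p_col)]
  # Benjamini-Hochberg adjustment across transcripts
  adj <- stats::p.adjust(per_transcript[[p_col]], method = "BH")
  significant <- per_transcript[[transcript_col]][adj < fdr]
  # Every row of a significant transcript is flagged TRUE
  df$is_significant <- df[[transcript_col]] %in% significant
  df
}
```

The flagged data can then be passed to identify_outliers() with .do_check = is_significant, keeping .significance = PValue as in the example.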

Value

A nested tibble with transcript-wise information, with columns: sample_wise_data | plot | `ppc samples failed` | `tot deleterious_outliers`
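To keep only the transcripts flagged as containing deleterious outliers, the result can be filtered on the `tot deleterious_outliers` column. The following is a minimal base R sketch. The helper transcripts_with_outliers() is hypothetical, and it assumes the column holds a numeric count per transcript (its type is not stated above). Because the name contains a space, the column is accessed with [[ ]]:

```r
# Hypothetical helper: keep transcripts with at least one deleterious outlier.
# Assumes `tot deleterious_outliers` is a numeric count per transcript.
transcripts_with_outliers <- function(result) {
  n_out <- result[["tot deleterious_outliers"]]
  # Missing counts are treated as "no outliers"
  result[!is.na(n_out) & n_out > 0, , drop = FALSE]
}
```

With dplyr, the equivalent is dplyr::filter(result, `tot deleterious_outliers` > 0), which also drops rows where the condition is NA.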

Examples


library(dplyr)
library(ppcseq)

data("counts")

# The example is run on Linux only
if (Sys.info()[["sysname"]] == "Linux") {
  result <-
    counts %>%
    # Flag the transcripts to be checked for outliers
    dplyr::mutate(is_significant = symbol %in% c("SLC16A12", "CYP1A1", "ART3")) %>%
    ppcseq::identify_outliers(
      formula = ~ Label,
      .sample = sample,
      .transcript = symbol,
      .abundance = value,
      .significance = PValue,
      .do_check = is_significant,
      percent_false_positive_genes = 1,
      tol_rel_obj = 0.01,
      approximate_posterior_inference = TRUE,
      approximate_posterior_analysis = TRUE,
      how_many_negative_controls = 50,
      cores = 1
    )
}

stemangiola/ppcseq documentation built on Sept. 21, 2023, 7:19 a.m.
