outliers_by_pool_fragments: Identify and flag outliers based on pool fragments.
In calabrialab/ISAnalytics: Analyze gene therapy vector insertion sites data identified from genomics next generation sequencing reads for clonal tracking studies

outliers_by_pool_fragments

R Documentation

Identify and flag outliers based on pool fragments.

Description

Identify and flag outliers based on expected number of raw reads per pool.

Usage

outliers_by_pool_fragments(
  metadata,
  key = "BARCODE_MUX",
  outlier_p_value_threshold = 0.01,
  normality_test = FALSE,
  normality_p_value_threshold = 0.05,
  transform_log2 = TRUE,
  per_pool_test = TRUE,
  pool_col = "PoolID",
  min_samples_per_pool = 5,
  flag_logic = "AND",
  keep_calc_cols = TRUE,
  report_path = default_report_path()
)

Arguments

`metadata`	The metadata data frame
`key`	A character vector with the names of the numeric columns to test
`outlier_p_value_threshold`	The p value threshold for a read to be considered an outlier
`normality_test`	Perform normality test? Normality is assessed for each column in the key using Shapiro-Wilk test and if the values do not follow a normal distribution, other calculations are skipped
`normality_p_value_threshold`	Normality threshold
`transform_log2`	Perform a log2 transformation on values prior to the actual calculations?
`per_pool_test`	Perform the test for each pool?
`pool_col`	A character vector of the names of the columns that uniquely identify a pool
`min_samples_per_pool`	The minimum number of samples that a pool needs to contain in order to be processed - relevant only if `per_pool_test = TRUE`
`flag_logic`	A character vector of logic operators to obtain a global flag formula - only relevant if the key is longer than one. All operators must be chosen between: AND, OR, XOR, NAND, NOR, XNOR
`keep_calc_cols`	Keep the calculation columns in the output data frame?
`report_path`	The path where the report file should be saved. Can be a folder, a file or NULL if no report should be produced. Defaults to `{user_home}/ISAnalytics_reports`.
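
As a rough illustration of how `per_pool_test`, `min_samples_per_pool`, `normality_test` and `transform_log2` interact, the following base R sketch counts samples per pool, keeps only the pools that are large enough, and runs a Shapiro-Wilk test on the log2 values of each. The data frame is made up, and treating a p-value at or above `normality_p_value_threshold` as "normal" is an assumption; this is not the package's internal code.

```r
# Hypothetical metadata: two pools, one of which is too small to be tested.
metadata <- data.frame(
  PoolID = c(rep("POOL01", 6), rep("POOL02", 3)),
  BARCODE_MUX = c(1200, 1500, 1350, 1420, 1280, 90, 800, 950, 870)
)
min_samples_per_pool <- 5
normality_p_value_threshold <- 0.05

# Count samples per pool and keep only pools with enough samples.
pool_sizes <- table(metadata$PoolID)
testable_pools <- names(pool_sizes)[pool_sizes >= min_samples_per_pool]

# Assumption: a pool counts as "normal" when the Shapiro-Wilk p-value
# is at or above the threshold. Values are log2-transformed first.
is_normal <- vapply(testable_pools, function(pool) {
  values <- log2(metadata$BARCODE_MUX[metadata$PoolID == pool])
  shapiro.test(values)$p.value >= normality_p_value_threshold
}, logical(1))

testable_pools
is_normal
```

Here only `POOL01` reaches the minimum of 5 samples, so `POOL02` is left out of the calculations.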

Details

Modular structure

The outlier filtering functions are structured in a modular fashion. There are two kinds of functions:

Outlier tests - Functions that perform some kind of calculation based on inputs and flag metadata
Outlier filter - A function that takes one or more outlier tests, combines all the flags with a given logic and filters out rows that are flagged as outliers

This function is an outlier test, and calculates for each column in the key:

The zscore of the values
The tstudent of the values
The associated p-value (tdist)

Optionally, the test can be performed for each pool and a normality test can be run prior to the actual calculations. Samples are flagged if this condition holds:

tdist < outlier_p_value_threshold & zscore < 0
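
The condition above can be sketched in base R for a single key column and a single pool. The formulas below are assumptions made for illustration: a standard z-score, a Grubbs-style conversion of the z-score to a t statistic with n - 2 degrees of freedom, and a two-sided p-value. The package may compute these quantities differently, and the read counts are invented.

```r
# Hypothetical raw read counts for one pool; the last sample is very low.
reads <- c(1200, 1500, 1350, 1420, 1280, 90)
outlier_p_value_threshold <- 0.01

values <- log2(reads)                     # corresponds to transform_log2 = TRUE
n <- length(values)
zscore <- (values - mean(values)) / sd(values)

# Assumption: Grubbs-style t statistic derived from the z-score.
tstudent <- sqrt((n * (n - 2) * zscore^2) / ((n - 1)^2 - n * zscore^2))

# Two-sided p-value from the t distribution with n - 2 degrees of freedom.
tdist <- 2 * pt(tstudent, df = n - 2, lower.tail = FALSE)

# Flag only samples that are significantly BELOW the pool mean.
to_remove <- tdist < outlier_p_value_threshold & zscore < 0
to_remove
```

Because of the `zscore < 0` term, only samples with unexpectedly low values are flagged; samples far above the mean are kept.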

If the key contains more than one column, an additional flag logic can be specified for combining the results. Example: suppose the key contains the names of two columns, X and Y:

key = c("X", "Y")

If we specify the argument flag_logic = "AND", then the reads will be flagged based on this global condition:

(tdist_X < outlier_p_value_threshold & zscore_X < 0) AND (tdist_Y < outlier_p_value_threshold & zscore_Y < 0)

The user can specify one or more logical operators that will be applied in sequence.
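
One way to picture operators "applied in sequence" is a left-to-right fold over the per-column flags. The sketch below is illustrative only: `logic_ops` and `combine_flags` are hypothetical names, not part of ISAnalytics, and it assumes exactly one operator between each pair of consecutive key columns.

```r
# Map each supported operator name to a vectorised logical function.
logic_ops <- list(
  AND  = function(a, b) a & b,
  OR   = function(a, b) a | b,
  XOR  = function(a, b) xor(a, b),
  NAND = function(a, b) !(a & b),
  NOR  = function(a, b) !(a | b),
  XNOR = function(a, b) !xor(a, b)
)

# Hypothetical helper: fold the flags left to right, applying
# operators[i] between the running result and flags[[i + 1]].
combine_flags <- function(flags, operators) {
  stopifnot(
    length(operators) == length(flags) - 1,
    all(operators %in% names(logic_ops))
  )
  result <- flags[[1]]
  for (i in seq_along(operators)) {
    result <- logic_ops[[operators[i]]](result, flags[[i + 1]])
  }
  result
}

# Per-column flags for four samples and three hypothetical key columns.
flag_X <- c(TRUE, TRUE, FALSE, FALSE)
flag_Y <- c(TRUE, FALSE, TRUE, FALSE)
flag_Z <- c(FALSE, FALSE, TRUE, TRUE)

combine_flags(list(flag_X, flag_Y), "AND")
combine_flags(list(flag_X, flag_Y, flag_Z), c("AND", "OR"))
```

With three columns and `c("AND", "OR")`, the sketch evaluates `(flag_X AND flag_Y) OR flag_Z`.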

Value

A data frame of metadata with the column to_remove

Examples

data("association_file", package = "ISAnalytics")
flagged <- outliers_by_pool_fragments(association_file,
    report_path = NULL
)
head(flagged)

calabrialab/ISAnalytics documentation built on Dec. 10, 2024, 10:50 p.m.
