outlier_filter: Filter out outliers in metadata, identified by the chosen...

View source: R/outlier-filtering.R

outlier_filter    R Documentation

Filter out outliers in metadata, identified by the chosen outlier test.

Description

[Experimental] Filter out outliers in metadata by using appropriate outlier tests.

Usage

outlier_filter(
  metadata,
  pcr_id_col = pcr_id_column(),
  outlier_test = c(outliers_by_pool_fragments),
  outlier_test_outputs = NULL,
  combination_logic = c("AND"),
  negate = FALSE,
  report_path = default_report_path(),
  ...
)

Arguments

metadata

The metadata data frame

pcr_id_col

The name of the PCR identifier column

outlier_test

One or more outlier tests. Must be functions, either from available_outlier_tests() or custom functions that produce an appropriate output format (see details).

outlier_test_outputs

NULL, a data frame or a list of data frames. See details.

combination_logic

One or more logical operators ("AND", "OR", "XOR", "NAND", "NOR", "XNOR"). See details.

negate

If TRUE, returns only the metadata rows that were flagged for removal. If FALSE, returns only the metadata rows that were not flagged for removal.

report_path

The path where the report file should be saved. Can be a folder or NULL if no report should be produced. Defaults to {user_home}/ISAnalytics_reports.

...

Additional named arguments passed to the function(s) given in outlier_test

Details

Modular structure

The outlier filtering functions are structured in a modular fashion. There are two kinds of functions:

  • Outlier tests - Functions that perform some kind of calculation based on inputs and flag metadata

  • Outlier filter - A function that takes one or more outlier tests, combines all the flags with a given logic and filters out rows that are flagged as outliers

This function acts as the filter. It can either call one or more outlier tests passed as functions through the argument outlier_test, or directly take the outputs produced by individual tests through the argument outlier_test_outputs. If both are provided, outlier_test_outputs takes priority. The second method offers more freedom, since single tests can be run independently and their intermediate results saved and examined in more detail. If more than one test is performed, the argument combination_logic tells the function how to combine the flags: you can specify one logical operator or several, provided their number is compatible with the number of tests.
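To make the role of combination_logic more concrete, here is a minimal base R sketch of how flags from several tests could be combined. It is not the package's implementation: the helper combine_flags() is hypothetical, and it assumes that operators are applied left to right and that "compatible" means either one operator (reused for every pair) or one operator fewer than the number of tests.

```r
# Hypothetical helper, NOT part of ISAnalytics: folds logical flag vectors
# left to right with the requested operators.
combine_flags <- function(flag_list, logic = "AND") {
    ops <- list(
        AND = function(a, b) a & b,
        OR = function(a, b) a | b,
        XOR = function(a, b) xor(a, b),
        NAND = function(a, b) !(a & b),
        NOR = function(a, b) !(a | b),
        XNOR = function(a, b) !xor(a, b)
    )
    n_ops <- length(flag_list) - 1
    # A single test needs no combination
    if (n_ops == 0) {
        return(flag_list[[1]])
    }
    stopifnot(all(logic %in% names(ops)))
    # Assumption: one operator is recycled, otherwise n_tests - 1 are needed
    stopifnot(length(logic) == 1 || length(logic) == n_ops)
    logic <- rep_len(logic, n_ops)
    combined <- flag_list[[1]]
    for (i in seq_len(n_ops)) {
        combined <- ops[[logic[i]]](combined, flag_list[[i + 1]])
    }
    combined
}

# Toy flags, one vector per test, one element per metadata row
test_a <- c(TRUE, TRUE, FALSE, FALSE)
test_b <- c(TRUE, FALSE, TRUE, FALSE)
test_c <- c(FALSE, FALSE, TRUE, TRUE)

two_tests <- combine_flags(list(test_a, test_b), "AND")
three_tests <- combine_flags(list(test_a, test_b, test_c), c("OR", "XOR"))
print(two_tests)   # TRUE FALSE FALSE FALSE
print(three_tests) # TRUE TRUE FALSE TRUE
```

With "AND", a row is flagged only when every test flags it, which is the more conservative choice; "OR" flags a row as soon as any test does.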

Writing custom outlier tests

You have the freedom to provide your own functions as outlier tests. For this purpose, the functions provided must respect these guidelines:

  • Must take as input the whole metadata df

  • Must return a df containing AT LEAST the pcr_id_col and a logical column "to_remove" that contains the flag

  • The pcr_id_col must contain all the values originally present in the metadata df
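As an illustration of these guidelines, the following base R sketch defines a custom test that flags rows with a low read count. The function name low_reads_test, the column names "pcr_id" and "reads", and the threshold are all hypothetical. The signature (metadata first, pcr_id_col by name, extra arguments absorbed by ...) is an assumption about how the filter might call a test, not a documented contract.

```r
# Hypothetical custom outlier test: flags PCR replicates whose read count
# falls below a fixed threshold. Column names are illustrative assumptions.
low_reads_test <- function(metadata,
                           pcr_id_col = "pcr_id",
                           reads_col = "reads",
                           min_reads = 100,
                           ...) {
    stopifnot(is.data.frame(metadata))
    stopifnot(all(c(pcr_id_col, reads_col) %in% colnames(metadata)))
    flags <- metadata[[reads_col]] < min_reads
    # Treat missing read counts as "not flagged" so the column has no NA
    flags[is.na(flags)] <- FALSE
    # Return every original PCR id together with its logical flag
    out <- data.frame(metadata[[pcr_id_col]], flags, stringsAsFactors = FALSE)
    colnames(out) <- c(pcr_id_col, "to_remove")
    out
}

# Toy metadata, not taken from the package
toy_meta <- data.frame(
    pcr_id = c("A", "B", "C", "D"),
    reads = c(500, 20, NA, 150),
    stringsAsFactors = FALSE
)
test_output <- low_reads_test(toy_meta)
print(test_output)
```

The returned data frame keeps one row per PCR identifier and contains a logical "to_remove" column, so it satisfies the three requirements listed above. A function of this shape could then be supplied through outlier_test, or its output through outlier_test_outputs.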

Value

A data frame of metadata with the same number of rows as the input, or fewer

See Also

Other Data cleaning and pre-processing: aggregate_metadata(), aggregate_values_by_key(), compute_near_integrations(), default_meta_agg(), outliers_by_pool_fragments(), purity_filter(), realign_after_collisions(), remove_collisions(), threshold_filter()

Examples

data("association_file", package = "ISAnalytics")
filtered_af <- outlier_filter(association_file,
    key = "BARCODE_MUX",
    report_path = NULL
)
head(filtered_af)

calabrialab/ISAnalytics documentation built on Dec. 10, 2024, 10:50 p.m.