remove_collisions: Identifies and removes collisions.

View source: R/collision-removal.R

remove_collisions    R Documentation

Description

[Stable] A collision is an integration (that is, a unique combination of the variables returned by mandatory_IS_vars()) that is observed in more than one independent sample. The function tries to decide which independent sample each such integration should be assigned to; if no decision can be made, the integration is removed from the data frame entirely. For more details, refer to the vignette "Collision removal functionality": vignette("workflow_start", package = "ISAnalytics")
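To make the definition concrete, here is a minimal base R sketch that flags collisions in a toy table. It is not the package's implementation. The column names chr, integration_locus and strand are assumed stand-ins for whatever mandatory_IS_vars() returns, and the data are invented for illustration.

```r
# Toy observations: one row per integration seen in a sample.
# Column names are assumptions, not the package's required schema.
obs <- data.frame(
  chr = c("1", "1", "2", "2", "3"),
  integration_locus = c(100, 100, 250, 250, 900),
  strand = c("+", "+", "-", "-", "+"),
  ProjectID = c("P1", "P1", "P1", "P1", "P1"),
  SubjectID = c("S1", "S2", "S1", "S1", "S2"),
  stringsAsFactors = FALSE
)

# One key per integration, one key per independent sample
is_key <- paste(obs$chr, obs$integration_locus, obs$strand, sep = "_")
sample_key <- paste(obs$ProjectID, obs$SubjectID, sep = "_")

# Count the distinct independent samples in which each integration appears
n_samples <- tapply(sample_key, is_key, function(s) length(unique(s)))

# An integration seen in more than one independent sample is a collision
collisions <- names(n_samples)[n_samples > 1]
print(collisions)
```

In this toy table only the integration at chr 1, locus 100, strand + is shared between two subjects, so it is the only one flagged.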

Usage

remove_collisions(
  x,
  association_file,
  independent_sample_id = c("ProjectID", "SubjectID"),
  date_col = "SequencingDate",
  reads_ratio = 10,
  quant_cols = c(seqCount = "seqCount", fragmentEstimate = "fragmentEstimate"),
  report_path = default_report_path(),
  max_workers = NULL
)

Arguments

x

Either a multi-quantification matrix (recommended) or a named list of matrices (names must be quantification types)

association_file

The association file imported via import_association_file()

independent_sample_id

A character vector of column names that identify independent samples

date_col

The date column that should be considered.

reads_ratio

A single numeric value representing the ratio to consider when deciding between seqCount values.
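As an illustration only, the sketch below shows one way a reads ratio could settle a collision. It assumes the top per-sample seqCount total must exceed the runner-up by more than reads_ratio times. The helper decide_by_ratio() is hypothetical, and the exact comparison used by the package may differ.

```r
# Hypothetical helper, not part of ISAnalytics.
# seq_counts: named numeric vector, one seqCount total per independent sample.
# Returns the winning sample name, or NA if the ratio does not allow a decision.
decide_by_ratio <- function(seq_counts, reads_ratio = 10) {
  ranked <- sort(seq_counts, decreasing = TRUE)
  if (length(ranked) < 2) {
    return(names(ranked)[1])
  }
  # Assumption: strict "greater than" between the top two samples
  if (ranked[[1]] > reads_ratio * ranked[[2]]) {
    names(ranked)[1]
  } else {
    NA_character_
  }
}

clear_winner <- decide_by_ratio(c(P1_S1 = 500, P1_S2 = 20))
undecided <- decide_by_ratio(c(P1_S1 = 50, P1_S2 = 20))
```

Under this assumption, 500 against 20 reads is decided in favour of P1_S1, while 50 against 20 stays undecided with the default ratio of 10.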

quant_cols

A named character vector where names are quantification types and values are the names of the corresponding columns. The quantification seqCount MUST be included in the vector.
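A short sketch of how such a vector might be built and checked before calling the function. The column names "reads" and "fragments" and the helper check_quant_cols() are hypothetical and used only for illustration; the package performs its own validation.

```r
# Names are quantification types, values are the column names in the matrix.
# "reads" and "fragments" are invented column names for this example.
custom_quant_cols <- c(seqCount = "reads", fragmentEstimate = "fragments")

# Hypothetical pre-flight check, not part of ISAnalytics
check_quant_cols <- function(quant_cols, available_cols) {
  if (is.null(names(quant_cols)) || !"seqCount" %in% names(quant_cols)) {
    stop("quant_cols must include the seqCount quantification")
  }
  missing_cols <- setdiff(unname(quant_cols), available_cols)
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  TRUE
}

ok <- check_quant_cols(
  custom_quant_cols,
  available_cols = c("chr", "integration_locus", "strand", "reads", "fragments")
)
```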

report_path

The path where the report file should be saved. Can be a folder or NULL if no report should be produced. Defaults to {user_home}/ISAnalytics_reports.

max_workers

Maximum number of parallel workers used to distribute the workload. If NULL (default), the maximum number of workers allowed is used; otherwise a numeric value is required. WARNING: a higher number of workers speeds up computation at the cost of higher memory consumption. Tune this parameter accordingly.

Value

Either a multi-quantification matrix or a list of data frames

Required tags

The function will explicitly check for the presence of these tags:

  • project_id

  • pool_id

  • pcr_replicate

See Also

Other Data cleaning and pre-processing: aggregate_metadata(), aggregate_values_by_key(), compute_near_integrations(), default_meta_agg(), outlier_filter(), outliers_by_pool_fragments(), purity_filter(), realign_after_collisions(), threshold_filter()

Examples

data("integration_matrices", package = "ISAnalytics")
data("association_file", package = "ISAnalytics")
no_coll <- remove_collisions(
    x = integration_matrices,
    association_file = association_file,
    report_path = NULL
)
head(no_coll)

calabrialab/ISAnalytics documentation built on Dec. 10, 2024, 10:50 p.m.