duplicates_filter: Duplicates filtering.

View source: R/duplicates_filter.R

duplicates_filter    R Documentation

Duplicates filtering.

Description

This function provides multiple options for removing duplicated reads: when two or more reads are marked as duplicates, all but one of them are discarded.

Usage

duplicates_filter(
  data,
  sample = NULL,
  extremity = "both",
  keep = "shortest",
  output_class = "datatable",
  txt = FALSE,
  txt_file = NULL
)

Arguments

data

Either a list of data tables or a GRangesList object, as produced by bamtolist, bedtolist or length_filter.

sample

A character string or character vector specifying the name(s) of the sample(s) to process. Default is NULL, i.e. all samples are processed.
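As an illustration of how a NULL default can select every sample in a named list, here is a minimal base R sketch. The helper select_samples is hypothetical and not part of riboWaltz; stopping on unknown sample names is an assumption of this sketch, not documented package behaviour.

```r
# Hypothetical helper, not part of riboWaltz: pick the samples to process
# from a named list, where NULL selects every sample.
select_samples <- function(data, sample = NULL) {
  if (is.null(sample)) {
    sample <- names(data)
  }
  # Assumption of this sketch: unknown names are treated as an error.
  unknown <- setdiff(sample, names(data))
  if (length(unknown) > 0) {
    stop("unknown sample name(s): ", paste(unknown, collapse = ", "))
  }
  data[sample]
}

reads_list <- list(Samp_A = data.frame(end5 = 1), Samp_B = data.frame(end5 = 2))
names(select_samples(reads_list))            # "Samp_A" "Samp_B"
names(select_samples(reads_list, "Samp_B"))  # "Samp_B"
```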

extremity

Either "both", "5end" or "3end". It specifies the criterion used to define which reads are considered duplicates. Reads mapping on the same transcript are marked as duplicates if they share both the 5' and the 3' extremity ("both"), the 5' extremity regardless of the 3' one ("5end"), or the 3' extremity regardless of the 5' one ("3end"). With "5end" and "3end", reads of different lengths can be marked as duplicates; use keep to choose which one is retained. Default is "both".
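The three criteria amount to choosing which columns must match for two reads to be duplicates. The base R sketch below shows that idea on the toy reads from the Examples section; the helpers duplicate_keys and is_duplicate are hypothetical and do not reproduce riboWaltz's internal code. It only flags the reads that repeat an earlier read's key; which member of a group survives is what keep decides.

```r
# Hypothetical helper, not part of riboWaltz: map each 'extremity' value
# to the columns whose shared values make two reads duplicates.
duplicate_keys <- function(extremity = c("both", "5end", "3end")) {
  extremity <- match.arg(extremity)
  switch(extremity,
         "both" = c("transcript", "end5", "end3"),
         "5end" = c("transcript", "end5"),
         "3end" = c("transcript", "end3"))
}

# TRUE for every read whose key matches an earlier read, i.e. the reads
# that would be discarded if the first read of each group were kept.
is_duplicate <- function(reads, extremity) {
  duplicated(reads[, duplicate_keys(extremity), drop = FALSE])
}

# Same toy reads as in the Examples section (base data.frame instead of data.table).
reads <- data.frame(transcript = rep("ENSMUST00000000001.4", 6),
                    end5 = c(92, 92, 92, 94, 94, 95),
                    end3 = c(119, 119, 122, 122, 123, 123))

is_duplicate(reads, "both")  # FALSE  TRUE FALSE FALSE FALSE FALSE
is_duplicate(reads, "5end")  # FALSE  TRUE  TRUE FALSE  TRUE FALSE
is_duplicate(reads, "3end")  # FALSE  TRUE FALSE  TRUE FALSE  TRUE
```

With "both", only the first two reads collide, so five reads remain; with either "5end" or "3end", three groups (and therefore three reads) remain.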

keep

Either "shortest" or "longest". It specifies whether to keep the shortest or the longest read when duplicates have different lengths. This parameter is considered only if extremity is set to "5end" or "3end". Default is "shortest".
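One way to picture the keep rule is to sort the reads by length so that the preferred read comes first, then discard every later member of each duplicate group. The base R sketch below does this for the "5end" criterion on the toy reads from the Examples section. The helper keep_one_per_group is hypothetical and is not riboWaltz's implementation; when duplicates also have identical lengths, this sketch simply keeps whichever comes first in the sorted order.

```r
# Hypothetical sketch, not riboWaltz's implementation: keep one read per
# duplicate group, preferring the shortest or the longest member.
keep_one_per_group <- function(reads, key_cols, keep = c("shortest", "longest")) {
  keep <- match.arg(keep)
  # Read length, computed as in the Examples section.
  reads$length <- reads$end3 - reads$end5 + 1
  # Order rows so that the preferred read comes first in each group.
  ord <- order(reads$length, decreasing = (keep == "longest"))
  # duplicated() is FALSE only for the first row of each group in this order.
  first_in_group <- !duplicated(reads[ord, key_cols, drop = FALSE])
  # Return the kept reads in their original order.
  reads[sort(ord[first_in_group]), , drop = FALSE]
}

reads <- data.frame(transcript = rep("ENSMUST00000000001.4", 6),
                    end5 = c(92, 92, 92, 94, 94, 95),
                    end3 = c(119, 119, 122, 122, 123, 123))

# extremity = "5end": duplicates share the transcript and the 5' end.
keep_one_per_group(reads, c("transcript", "end5"), keep = "shortest")$length  # 28 29 29
keep_one_per_group(reads, c("transcript", "end5"), keep = "longest")$length   # 31 30 29
```

Passing c("transcript", "end3") instead applies the same rule to the "3end" criterion.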

output_class

Either "datatable" or "granges". It specifies the format of the output, i.e. a list of data tables or a GRangesList object. Default is "datatable".

txt

Logical value specifying whether to write statistics on the filtering step to a txt file. Similar information is displayed by default in the console. Default is FALSE.

txt_file

Character string specifying the path, name and extension (e.g. "PATH/NAME.extension") of the plain text file where statistics on the filtering step should be written. If the specified folder doesn't exist, it is created automatically. If NULL (the default), the information is written to "duplicates_filtering.txt" in the working directory. This parameter is considered only if txt is TRUE.
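The file-handling rules above (a default file name in the working directory, and automatic creation of a missing folder) can be sketched in base R as follows. The helper write_filter_stats is hypothetical and not part of riboWaltz, and the text written in the usage line is a placeholder, since the actual layout of the statistics is not documented on this page.

```r
# Hypothetical helper, not riboWaltz's code: write filtering statistics
# following the txt_file rules described above.
write_filter_stats <- function(stats_lines, txt_file = NULL) {
  # NULL means the default file name in the current working directory.
  if (is.null(txt_file)) {
    txt_file <- file.path(getwd(), "duplicates_filtering.txt")
  }
  # Create the destination folder (and any missing parents) if needed.
  out_dir <- dirname(txt_file)
  if (!dir.exists(out_dir)) {
    dir.create(out_dir, recursive = TRUE)
  }
  writeLines(stats_lines, txt_file)
  invisible(txt_file)
}

# Usage with a placeholder line written into a folder that does not exist yet.
out <- write_filter_stats("placeholder statistics line",
                          file.path(tempdir(), "new_folder", "stats.txt"))
readLines(out)  # "placeholder statistics line"
```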

Value

A list of data tables or a GRangesList object, according to output_class.

Examples

## Generate an ad hoc dataset:
library(data.table)
dt <- data.table(transcript = rep("ENSMUST00000000001.4", 6),
                 end5 = c(92, 92, 92, 94, 94, 95),
                 end3 = c(119, 119, 122, 122, 123, 123)
                 )[, length := end3 - end5 + 1
                   ][, cds_start := 14
                    ][, cds_stop := 1206]
example_reads_list <- list()
example_reads_list[["Samp_example"]] <- dt

## Reads are duplicates if they share both the 5' extremity and the
## 3' extremity:
filtered_list <- duplicates_filter(example_reads_list,
                                   extremity = "both")

## Reads are duplicates if they share the 5' extremity. Among duplicated
## reads we keep the shortest one:
filtered_list <- duplicates_filter(example_reads_list,
                                   extremity = "5end",
                                   keep = "shortest")

LabTranslationalArchitectomics/riboWaltz documentation built on Jan. 17, 2024, 12:18 p.m.