View source: R/duplicates_filter.R
duplicates_filter    R Documentation
This function provides multiple options for removing duplicated reads: when two or more reads are marked as duplicates, all of them except one are discarded.
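As a rough illustration of the idea, and not the package's own implementation, the sketch below uses data.table to collapse reads that map on the same transcript and share both extremities into a single read. This mirrors the default extremity = "both" rule. The object names `reads` and `dedup` are hypothetical, and the columns (transcript, end5, end3) are assumed to follow the layout used in the Examples section.

```r
library(data.table)

# Hypothetical toy input: one row per read, same columns as in the Examples.
reads <- data.table(transcript = rep("tx1", 4),
                    end5 = c(10, 10, 10, 12),
                    end3 = c(40, 40, 43, 43))

# Reads on the same transcript with identical 5' and 3' extremities are
# duplicates: unique() keeps the first read of each group and drops the rest.
dedup <- unique(reads, by = c("transcript", "end5", "end3"))
nrow(dedup)  # 3
```

Only the two identical reads (10, 40) are collapsed; the other two reads differ in at least one extremity and are kept.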
duplicates_filter(
data,
sample = NULL,
extremity = "both",
keep = "shortest",
output_class = "datatable",
txt = FALSE,
txt_file = NULL
)
data | Either a list of data tables or a GRangesList object from |
sample | Character string or character string vector specifying the name of the sample(s) to process. Default is NULL, i.e. all samples are processed. |
extremity | Either "both", "5end" or "3end". It specifies the criterion used to define which reads are considered duplicates. Reads are marked as duplicates if they map on the same transcript and share both the 5' extremity and the 3' extremity ("both"), only the 5' extremity ("5end"), or only the 3' extremity ("3end"). For "5end" and "3end", reads of different lengths can be marked as duplicates; which of them is retained is set by keep. Default is "both". |
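keep | Either "shortest" or "longest". It specifies whether to keep the shortest or the longest read when duplicates display different lengths. This parameter is considered only if extremity is "5end" or "3end", since reads sharing both extremities always have the same length. Default is "shortest". |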
output_class | Either "datatable" or "granges". It specifies the format of the output, i.e. a list of data tables or a GRangesList object. Default is "datatable". |
txt | Logical value specifying whether to write statistics on the filtering step to a txt file. Similar information is displayed by default in the console. Default is FALSE. |
txt_file | Character string specifying the path, name and extension (e.g. "PATH/NAME.extension") of the plain text file where statistics on the filtering step should be written. If the specified folder doesn't exist, it is automatically created. If NULL (the default), the information is written in "duplicates_filtering.txt", saved in the working directory. This parameter is considered only if txt is TRUE. |
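The extremity and keep rules can be sketched on a single data table as follows. This is a hypothetical helper, `dedup_sketch()`, written under stated assumptions and not the package's code: it assumes one data table with columns transcript, end5, end3 and length (as built in the Examples), it ignores the sample, output_class and txt options, and ties between reads of equal length are resolved by input order, which the package may handle differently.

```r
library(data.table)

# Hypothetical helper (NOT part of the package): sketches the duplicate rule
# described by the `extremity` and `keep` arguments on one data table.
dedup_sketch <- function(dt, extremity = "both", keep = "shortest") {
  extremity <- match.arg(extremity, c("both", "5end", "3end"))
  keep <- match.arg(keep, c("shortest", "longest"))

  # Columns that must match for two reads to be considered duplicates
  key_cols <- switch(extremity,
                     both   = c("transcript", "end5", "end3"),
                     "5end" = c("transcript", "end5"),
                     "3end" = c("transcript", "end3"))

  # Sort reads so that the one to retain comes first within each group:
  # ascending length for "shortest", descending length for "longest"
  idx <- order(dt[["length"]], decreasing = (keep == "longest"))
  sorted_dt <- dt[idx]

  # unique() keeps the first row of each group of duplicates
  unique(sorted_dt, by = key_cols)
}
```

Applied to the six reads of dt from the Examples, this sketch keeps five reads with extremity = "both" and three reads with either "5end" or "3end". With "5end" and keep = "shortest", the read 92 to 119 is retained over 92 to 122.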
A list of data tables or a GRangesList object.
## Generate an ad hoc dataset:
library(data.table)
dt <- data.table(transcript = rep("ENSMUST00000000001.4", 6),
end5 = c(92, 92, 92, 94, 94, 95),
end3 = c(119, 119, 122, 122, 123, 123)
)[, length := end3 - end5 + 1
][, cds_start := 14
][, cds_stop := 1206]
example_reads_list <- list()
example_reads_list[["Samp_example"]] <- dt
## Reads are duplicates if they share both the 5' extremity and the
## 3' extremity:
filtered_list <- duplicates_filter(example_reads_list,
extremity = "both")
## Reads are duplicates if they only share the 5' extremity. Among duplicated
## reads we keep the shortest one:
filtered_list <- duplicates_filter(example_reads_list,
extremity = "5end",
keep = "shortest")