strandInvaders: Detect and remove strand invasion artefacts

Strand invaders    R Documentation

Detect and remove strand invasion artefacts

Description

findStrandInvaders detects strand invasion artefacts in the CTSS data. removeStrandInvaders removes them.

Strand invaders are artefacts produced by the template-switching reactions used in methods such as nanoCAGE and its derivatives (C1 CAGE, ...). They are described in detail in Tang et al., 2013. Briefly, these artefacts create CAGE-like signal downstream of genome sequences that are highly similar to the tail of the template-switching oligonucleotide, which is TATAGGG in recent (2017) nanoCAGE protocols. Because these artefacts represent truncated cDNAs, they do not indicate promoter regions, and it is therefore advisable to remove them. Moreover, when a sample barcode is near the linker sequence (which is not the case in recent nanoCAGE protocols), strand-invasion artefacts can produce sample-specific biases, which can be confounded with biological effects depending on how the barcode sequences were chosen. A barcode parameter is provided to incorporate this information.

Usage

findStrandInvaders(object, distance = 1, barcode = NULL, linker = "TATAGGG")

removeStrandInvaders(object, distance = 1, barcode = NULL, linker = "TATAGGG")

## S4 method for signature 'CAGEexp'
findStrandInvaders(object, distance = 1, barcode = NULL, linker = "TATAGGG")

## S4 method for signature 'CAGEexp'
removeStrandInvaders(object, distance = 1, barcode = NULL, linker = "TATAGGG")

## S4 method for signature 'CTSS'
findStrandInvaders(object, distance = 1, barcode = NULL, linker = "TATAGGG")

## S4 method for signature 'CTSS'
removeStrandInvaders(object, distance = 1, barcode = NULL, linker = "TATAGGG")

Arguments

object

A CAGEexp object (or a CTSS object, see Usage) containing CTSS data and the name of a reference genome.

distance

The maximal edit distance between the genome and linker sequences. Regardless of this parameter, only a single mismatch is allowed in the last three bases of the linker.

barcode

A vector of sample barcode sequences, or the name of a metadata column of the CAGEexp object containing this information. (Not implemented yet.)

linker

The sequence of the tail of the template-switching oligonucleotide, which will be matched with the genome sequence (defaults to TATAGGG).
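The interplay of distance and linker can be illustrated with a minimal base-R sketch. It is not the CAGEr implementation: isLikelyInvader is a hypothetical helper, the upstream sequence is assumed to be already extracted from the genome with the same length and orientation as the linker, and the way mismatches in the last three bases are counted is an assumption made for illustration.

```r
# Hypothetical helper, NOT part of CAGEr: decides whether the genome sequence
# found immediately upstream of a CTSS looks like the linker tail.
isLikelyInvader <- function(upstream, linker = "TATAGGG", distance = 1) {
  # Assumption: both sequences have the same length and orientation.
  stopifnot(nchar(upstream) == nchar(linker))

  # Edit (Levenshtein) distance over the whole linker, using utils::adist().
  d_full <- as.integer(utils::adist(upstream, linker))

  # Position-wise mismatches in the last three bases only.
  n <- nchar(linker)
  tail_up <- strsplit(substr(upstream, n - 2, n), "")[[1]]
  tail_ln <- strsplit(substr(linker,   n - 2, n), "")[[1]]
  tail_mismatches <- sum(tail_up != tail_ln)

  # Flag only if the whole sequence is close enough AND the tail has at most
  # one mismatch, whatever the value of 'distance'.
  d_full <= distance && tail_mismatches <= 1
}

isLikelyInvader("TATAGGG")                # TRUE: exact match
isLikelyInvader("TCTAGGG")                # TRUE: one mismatch outside the tail
isLikelyInvader("TATAGCC", distance = 2)  # FALSE: two mismatches in the tail
isLikelyInvader("ACGTACG")                # FALSE: too different from the linker
```

The third call shows why the tail rule matters: raising distance to 2 would otherwise accept a sequence whose last three bases barely resemble the linker.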

Value

findStrandInvaders returns a logical-Rle vector indicating the position of the strand invaders in the input ranges.

With CTSS objects as input, removeStrandInvaders returns the object after removing the CTSS positions identified as strand invaders. In the case of CAGEexp objects, a modified object is returned. Its sample metadata is also updated by creating a new strandInvaders column that indicates the number of molecule counts removed. This value is subtracted from the counts column so that the total number of tags is still equal to librarySizes.
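The bookkeeping described above can be sketched with plain base-R vectors. The numbers are invented toy data for a single sample, and a plain logical vector stands in for the logical-Rle returned by findStrandInvaders; nothing here calls CAGEr itself.

```r
# Toy data, invented for illustration: tag counts at five CTSS positions.
counts   <- c(10, 3, 7, 1, 4)
# Stand-in for the output of findStrandInvaders(): TRUE marks a strand invader.
invaders <- c(FALSE, TRUE, FALSE, FALSE, TRUE)

librarySize    <- sum(counts)            # total number of tags before removal
strandInvaders <- sum(counts[invaders])  # molecule counts that get removed
kept           <- counts[!invaders]      # counts at the remaining positions

# Invariant from the text: remaining counts plus the removed counts still add
# up to the library size.
stopifnot(sum(kept) + strandInvaders == librarySize)
```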

References

Tang et al., “Suppression of artifacts and barcode bias in high-throughput transcriptome analyses utilizing template switching.” Nucleic Acids Res. 2013 Feb 1;41(3):e44. PubMed ID: 23180801, DOI: 10.1093/nar/gks112

Examples

# Note that these examples do not do much on the example data, since it was
# not produced with a protocol based on the template-switching method.

findStrandInvaders(exampleCAGEexp)
removeStrandInvaders(exampleCAGEexp)


charles-plessy/CAGEr documentation built on Nov. 4, 2023, 11:57 a.m.