G_correction: Correct G artifact

G_correction    R Documentation

Correct G artifact

Description

Correct overrepresentation of 5' G bases added during reverse transcription.

Usage

G_correction(experiment, assembly = NULL)

Arguments

experiment

TSRexploreR object.

assembly

Genome assembly in FASTA or BSgenome format.

Details

Most TSS mapping methods share a common artifact: an extra G base upstream of the true TSS, presumably templated by the 5' cap during reverse transcription. When the extra G does not match the genome, it is soft-clipped during alignment and can be removed by soft-clipping analysis. When it happens to match the genome, it aligns normally and cannot be distinguished from a true TSS. To account for this, TSRexploreR first determines the frequency of reads with a soft-clipped G in a given sample. Then, for each read with a non-soft-clipped G at its 5' end, it performs a Bernoulli trial in which that frequency is the probability of "success" (removal of the 5' G).
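
The procedure described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the `reads` data frame, its column names, and the `corrected_tss` column are hypothetical, the frequency is assumed to be computed over all reads in the sample, and every read is assumed to lie on the plus strand, so removing a 5' G moves the TSS one base downstream.

```r
# Illustrative sketch only; the data and column names are hypothetical.
set.seed(1)  # make the random draws reproducible

# One row per read: whether a 5' G was soft-clipped, the first aligned base,
# and the TSS position (plus strand assumed for every read).
reads <- data.frame(
  softclipped_g = c(TRUE, TRUE, FALSE, FALSE, FALSE,
                    FALSE, FALSE, FALSE, FALSE, FALSE),
  first_base    = c("A", "C", "G", "G", "G", "A", "T", "G", "C", "G"),
  tss           = rep(100L, 10),
  stringsAsFactors = FALSE
)

# Step 1: frequency of reads with a soft-clipped G in the sample
# (assumption: the denominator is all reads).
p_remove <- mean(reads$softclipped_g)

# Step 2: one Bernoulli trial per read whose 5' G aligned to the genome.
candidate <- !reads$softclipped_g & reads$first_base == "G"
remove_g <- rep(FALSE, nrow(reads))
remove_g[candidate] <- rbinom(sum(candidate), size = 1, prob = p_remove) == 1

# Step 3: on "success", drop the 5' G by shifting the TSS one base downstream.
reads$corrected_tss <- reads$tss + as.integer(remove_g)
```

Because the trials are random, repeated runs without a fixed seed can remove the G from different reads; only the expected fraction of removals is determined by the soft-clipped G frequency.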

Value

TSRexploreR object with G-corrected TSS GRanges.

See Also

import_bams to import BAMs.

Examples

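library(TSRexploreR)
library(magrittr)  # provides %>%

# Example BAM and genome assembly shipped with the package
bam_file <- system.file("extdata", "S288C.bam", package="TSRexploreR")
assembly <- system.file("extdata", "S288C_Assembly.fasta", package="TSRexploreR")
samples <- data.frame(sample_name="S288C", file_1=bam_file, file_2=NA)

# Build the TSRexploreR object and import the alignments
exp <- tsr_explorer(sample_sheet=samples, genome_assembly=assembly) %>%
  import_bams(paired=TRUE)

# Correct for 5' G bases added during reverse transcription
exp <- G_correction(exp, assembly=assembly)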


zentnerlab/TSRexploreR documentation built on Dec. 30, 2022, 10:27 p.m.