Mappability-methods: Calculate low mappability genomic regions

Mappability-methodsR Documentation

Calculate low mappability genomic regions

Description

These functions empirically calculate low-mappability (Mappability Exclusion) regions using the given genome FASTA file. A splice-aware alignment software capable of aligning reads to the genome is required. See details and examples below.

Usage

Mappability_GenReads(
  reference_path,
  read_len = 70,
  read_stride = 10,
  error_pos = 35,
  verbose = TRUE,
  alt_fasta_file
)

Mappability_CalculateExclusions(
  reference_path,
  aligned_bam = file.path(reference_path, "Mappability", "Aligned.out.bam"),
  threshold = 4,
  n_threads = 1
)

Arguments

reference_path

The directory of the reference prepared by GetReferenceResource()

read_len

The nucleotide length of the synthetic reads

read_stride

The nucleotide distance between adjacent synthetic reads

error_pos

The position of the procedurally-generated nucleotide error from the start of each synthetic reads

verbose

Whether additional status messages are shown

alt_fasta_file

(Optional) The path to the user-supplied genome fasta file, if different to that found inside the resource subdirectory of the reference_path. If GetReferenceResource() has already been run, this parameter should be omitted.

aligned_bam

The BAM file of alignment of the synthetic reads generated by Mappability_GenReads(). Users should use a genome splice-aware aligner, preferably the same aligner used to align the samples in their experiment.

threshold

Genomic regions with this alignment read depth (or below) in the aligned synthetic read BAM are defined as low mappability regions.

n_threads

The number of threads used to calculate mappability exclusion regions from aligned bam file of synthetic reads.

Details

Creating a Mappability Exclusion BED file is a three-step process.

  • First, using Mappability_GenReads(), synthetic reads are systematically generated using the given genome contained within reference_path.

  • Second, an aligner such as STAR (preferably the same aligner used for the subsequent RNA-seq experiment) is required to align these reads to the source genome. Poorly mapped regions of the genome will be reflected by regions of low coverage depth.

  • Finally, the BAM file containing the aligned reads is analysed using Mappability_CalculateExclusions(), to identify low-mappability regions to compile the Mappability Exclusion BED file.

It is recommended to leave all parameters to their default settings. Regular users should only specify reference_path, aligned_bam and n_threads, as required.

NB: STAR_Mappability runs all 3 steps required, using the STAR aligner. This only works in systems where STAR is installed.

NB2: In systems where STAR is not available, consider using HISAT2 or Rsubread. A working example using Rsubread is shown below.

Value

  • For Mappability_GenReads: writes Reads.fa to the Mappability subdirectory inside the given reference_path.

  • For Mappability_CalculateExclusions: writes a gzipped BED file named MappabilityExclusion.bed.gz to the Mappability subdirectory inside reference_path. This BED file is automatically used by BuildReference() if MappabilityRef is not specified.

Functions

  • Mappability_GenReads: Generates synthetic reads from a genome FASTA file, for mappability calculations.

  • Mappability_CalculateExclusions: Generate a BED file defining low mappability regions, using reads generated by Mappability_GenReads(), aligned to the genome.

See Also

BuildReference

Examples


# (1a) Creates genome resource files

ref_path <- file.path(tempdir(), "Reference")

GetReferenceResource(
    reference_path = ref_path,
    fasta = chrZ_genome(),
    gtf = chrZ_gtf()
)

# (1b) Systematically generate reads based on the NxtIRF example genome:

Mappability_GenReads(
    reference_path = ref_path
)
## Not run: 

# (2) Align the generated reads using Rsubread:

# (2a) Build the Rsubread genome index:

setwd(ref_path)
Rsubread::buildindex(basename = "./reference_index",
    reference = chrZ_genome())

# (2b) Align the synthetic reads using Rsubread::subjunc()

Rsubread::subjunc(
    index = "./reference_index",
    readfile1 = file.path(ref_path, "Mappability", "Reads.fa"),
    output_file = file.path(ref_path, "Mappability", "AlignedReads.bam"),
    useAnnotation = TRUE,
    annot.ext = chrZ_gtf(),
    isGTF = TRUE
)

# (3) Analyse the aligned reads in the BAM file for low-mappability regions:

Mappability_CalculateExclusions(
    reference_path = ref_path,
    aligned_bam = file.path(ref_path, "Mappability", "AlignedReads.bam")
)

# (4) Build the NxtIRF reference using the calculated Mappability Exclusions

BuildReference(ref_path)

# NB the default is to search for the BED file generated by
# `Mappability_CalculateExclusions()` in the given reference_path

## End(Not run)

alexchwong/NxtIRFcore documentation built on Oct. 31, 2022, 9:14 a.m.