Mappability-methods | R Documentation |
These functions empirically calculate low-mappability (Mappability Exclusion) regions using the given reference. A splice-aware alignment software capable of aligning reads to the genome is required. See details and examples below.
generateSyntheticReads(
reference_path,
read_len = 70,
read_stride = 10,
error_pos = 35,
verbose = TRUE,
alt_fasta_file
)
calculateMappability(
reference_path,
aligned_bam = file.path(reference_path, "Mappability", "Aligned.out.bam"),
threshold = 4,
n_threads = 1
)
reference_path |
The directory of the reference prepared by getResources |
read_len |
The nucleotide length of the synthetic reads |
read_stride |
The nucleotide distance between adjacent synthetic reads |
error_pos |
The position of the procedurally-generated nucleotide error from the start of each synthetic reads |
verbose |
Whether additional status messages are shown |
alt_fasta_file |
(Optional) The path to the user-supplied genome fasta
file, if different to that found inside the |
aligned_bam |
The BAM file of alignment of the synthetic reads generated
by |
threshold |
Genomic regions with this alignment read depth (or below) in the aligned synthetic read BAM are defined as low mappability regions. |
n_threads |
The number of threads used to calculate mappability exclusion regions from aligned bam file of synthetic reads. |
Creating a Mappability Exclusion BED file is a three-step process.
First, using generateSyntheticReads()
, synthetic reads are systematically
generated using the given genome contained within reference_path
, prepared
via getResources. Alternatively, use alt_fasta_file
to set the genome
sequence if this is different to that prepared by getResources
or if
getResources
is not yet run.
Second, an aligner such as STAR (preferably the same aligner used for the subsequent RNA-seq experiment) is required to align these reads to the source genome. Poorly mapped regions of the genome will be reflected by regions of low coverage depth.
Finally, the BAM file containing the aligned reads is analysed
using calculateMappability()
, to identify
low-mappability regions to compile the Mappability Exclusion BED file.
It is recommended to leave all parameters to their default settings. Regular
users should only specify reference_path
, aligned_bam
and n_threads
,
as required.
NB: STAR_mappability runs all 3 steps required, using the STAR
aligner.
This only works in systems where STAR
is installed.
NB2: buildFullRef builds the STAR reference, then calculates mappability. It then uses the calculated mappability regions to build the SpliceWiz reference.
NB3: In systems where STAR
is not available, consider using HISAT2 or
Rsubread. A working example using Rsubread is shown below.
For generateSyntheticReads
: writes Reads.fa
to the Mappability
subdirectory inside the given reference_path
.
For calculateMappability
: writes a gzipped BED file
named MappabilityExclusion.bed.gz
to the Mappability
subdirectory
inside reference_path
.
This BED file is automatically used by buildRef if its
MappabilityRef
parameter is not specified.
generateSyntheticReads()
: Generates synthetic reads from a
genome FASTA file, for mappability calculations.
calculateMappability()
: Generate a BED file defining
low mappability regions, using reads generated by
generateSyntheticReads()
, aligned to the genome.
Build-Reference-methods
# (1a) Creates genome resource files
ref_path <- file.path(tempdir(), "refWithMapExcl")
getResources(
reference_path = ref_path,
fasta = chrZ_genome(),
gtf = chrZ_gtf()
)
# (1b) Systematically generate reads based on the example genome:
generateSyntheticReads(
reference_path = ref_path
)
## Not run:
# (2) Align the generated reads using Rsubread:
# (2a) Build the Rsubread genome index:
subreadIndexPath <- file.path(ref_path, "Rsubread")
if(!dir.exists(subreadIndexPath)) dir.create(subreadIndexPath)
Rsubread::buildindex(
basename = file.path(subreadIndexPath, "reference_index"),
reference = chrZ_genome()
)
# (2b) Align the synthetic reads using Rsubread::subjunc()
Rsubread::subjunc(
index = file.path(subreadIndexPath, "reference_index"),
readfile1 = file.path(ref_path, "Mappability", "Reads.fa"),
output_file = file.path(ref_path, "Mappability", "AlignedReads.bam"),
useAnnotation = TRUE,
annot.ext = chrZ_gtf(),
isGTF = TRUE
)
# (3) Analyse the aligned reads in the BAM file for low-mappability regions:
calculateMappability(
reference_path = ref_path,
aligned_bam = file.path(ref_path, "Mappability", "AlignedReads.bam")
)
# (4) Build the example reference using the calculated Mappability Exclusions
buildRef(ref_path)
# NB the default is to search for the BED file generated by
# `calculateMappability()` in the given reference_path
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.