In alexchwong/NxtIRFcore: Core Engine for NxtIRF: a User-Friendly Intron Retention and Alternative Splicing Analysis using the IRFinder Engine

STAR-methods

R Documentation

STAR wrappers for building a STAR reference and aligning RNA-sequencing reads

Description

These functions run the STAR aligner to build a STAR genome reference, calculate mappability exclusion regions using STAR, and align one or more FASTQ files (single or paired) to the generated genome. These functions only work on Linux-based systems with STAR installed. STAR must be accessible via $PATH. See Details and Examples.
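
The two requirements above (a Linux host and a STAR executable on $PATH) can be checked from R before calling any of these functions. The following is a minimal sketch using only base R. `star_available()` is a hypothetical helper name, not part of NxtIRFcore; the package's own `STAR_version()` remains the intended check.

```r
# Hypothetical helper (not part of NxtIRFcore): returns TRUE only if the
# operating system is Linux and an executable named "STAR" is found on $PATH.
star_available <- function() {
    is_linux <- identical(Sys.info()[["sysname"]], "Linux")
    # Sys.which() returns "" when the executable cannot be located
    star_path <- unname(Sys.which("STAR"))
    is_linux && nzchar(star_path)
}

if (!star_available()) {
    message("STAR was not found on $PATH, or this is not a Linux system")
}
```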

Usage

STAR_version()

STAR_buildRef(
  reference_path,
  STAR_ref_path = file.path(reference_path, "STAR"),
  also_generate_mappability = TRUE,
  map_depth_threshold = 4,
  sjdbOverhang = 149,
  n_threads = 4,
  additional_args = NULL,
  ...
)

STAR_Mappability(
  reference_path,
  STAR_ref_path = file.path(reference_path, "STAR"),
  map_depth_threshold = 4,
  n_threads = 4,
  ...
)

STAR_align_experiment(
  Experiment,
  STAR_ref_path,
  BAM_output_path,
  trim_adaptor = "AGATCGGAAG",
  two_pass = FALSE,
  n_threads = 4
)

STAR_align_fastq(
  fastq_1 = c("./sample_1.fastq"),
  fastq_2 = NULL,
  STAR_ref_path,
  BAM_output_path,
  two_pass = FALSE,
  trim_adaptor = "AGATCGGAAG",
  memory_mode = "NoSharedMemory",
  additional_args = NULL,
  n_threads = 4
)

Arguments

`reference_path`	The path to the reference. `GetReferenceResource()` must first be run using this path as its `reference_path`.
`STAR_ref_path`	(Default - the "STAR" subdirectory under `reference_path`) The directory containing the STAR reference to be used or to contain the newly-generated STAR reference
`also_generate_mappability`	Whether `STAR_buildRef()` also calculates Mappability Exclusion regions.
`map_depth_threshold`	(Default 4) The depth of mapped reads threshold at or below which Mappability exclusion regions are defined. See Mappability-methods. Ignored if `also_generate_mappability = FALSE`
`sjdbOverhang`	(Default = 149) A STAR setting indicating the length of the donor / acceptor sequence on each side of the junctions. Ideally equal to (mate_length - 1). As the most common read length is 150, the default of this function is 149. See the STAR aligner manual for details.
`n_threads`	The number of threads to run the STAR aligner.
`additional_args`	A character vector of additional arguments to be passed to STAR. See examples below.
`...`	Additional arguments to be passed to `Mappability_GenReads()`. See Mappability-methods.
`Experiment`	A two- or three-column data frame with the columns denoting sample names, forward-FASTQ and reverse-FASTQ files. This can be conveniently generated using Find_FASTQ.
`BAM_output_path`	The path under which STAR outputs the aligned BAM files. In `STAR_align_experiment()`, STAR will output aligned BAMS inside subdirectories of this folder, named by sample names. In `STAR_align_fastq()`, STAR will output directly into this path.
`trim_adaptor`	The sequence of the Illumina adaptor to trim via STAR's `--clip3pAdapterSeq` option
`two_pass`	Whether to use two-pass mapping. In `STAR_align_experiment()`, STAR will first align every sample and generate a list of splice junctions but not BAM files. The junctions are then given to STAR to generate a temporary genome (contained within the `_STARgenome` subdirectory within that of the first sample), using these junctions to improve novel junction detection. In `STAR_align_fastq()`, STAR will run `--twopassMode Basic`.
`fastq_1, fastq_2`	In STAR_align_fastq: character vectors giving the path(s) of one or more FASTQ (or FASTA) files to be aligned. If single reads are to be aligned, omit `fastq_2`
`memory_mode`	The parameter to be passed to `--genomeLoad`; either `NoSharedMemory` or `LoadAndKeep` is used.
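
As noted for `sjdbOverhang`, the ideal value is (mate_length - 1). Where the read length of a data set is not known in advance, it can be estimated from the FASTQ file itself. The sketch below uses base R only. `estimate_sjdbOverhang()` is a hypothetical helper, not part of NxtIRFcore, and it assumes an uncompressed FASTQ file with four lines per record; taking the longest read among the sampled records is also an assumption of this sketch.

```r
# Hypothetical helper (not part of NxtIRFcore): estimates the read length
# from the first `n_reads` records of an uncompressed FASTQ file and returns
# the corresponding sjdbOverhang value (longest sampled read length - 1).
estimate_sjdbOverhang <- function(fastq_file, n_reads = 1000) {
    lines <- readLines(fastq_file, n = 4 * n_reads)
    stopifnot(length(lines) >= 4)
    # Each FASTQ record spans 4 lines; the sequence is the 2nd line of each
    seqs <- lines[seq(2, length(lines), by = 4)]
    max(nchar(seqs)) - 1
}

# Demonstration on a one-record FASTQ file holding a 10-nucleotide read
tmp_fastq <- tempfile(fileext = ".fastq")
writeLines(c("@read1", "ACGTACGTAC", "+", "IIIIIIIIII"), tmp_fastq)
overhang <- estimate_sjdbOverhang(tmp_fastq)
```

The resulting value could then be supplied as the `sjdbOverhang` argument of `STAR_buildRef()`.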

Details

Pre-requisites

STAR_buildRef requires GetReferenceResource to be run to fetch the required genome and gene annotation files.

STAR_Mappability, STAR_align_experiment and STAR_align_fastq require a STAR genome, which can be built using STAR_buildRef.

Function Description

For STAR_buildRef: this function will create a STAR genome reference in the directory given by STAR_ref_path (by default, the STAR subdirectory of reference_path). Optionally, it will run STAR_Mappability if also_generate_mappability is set to TRUE.

For STAR_Mappability: this function will first run Mappability_GenReads, then use the given STAR genome to align the synthetic reads using STAR. The aligned BAM file will then be processed using Mappability_CalculateExclusions to calculate the lowly-mappable genomic regions, producing the MappabilityExclusion.bed.gz output file.

For STAR_align_fastq: aligns a single FASTQ file, or a pair of FASTQ files, to the given STAR genome using the STAR aligner.

For STAR_align_experiment: aligns a set of single or paired FASTQ files to the given STAR genome using the STAR aligner. A data.frame specifying sample names and corresponding FASTQ files is required.
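
The data.frame expected by STAR_align_experiment can also be assembled by hand. The sketch below is a base R illustration only: `build_experiment()` is a hypothetical helper (the package's own Find_FASTQ is the documented route), and it assumes that all files sit in one directory and are named `<sample>_1.fastq` and `<sample>_2.fastq`. The column names follow those used in the Examples section.

```r
# Hypothetical helper (not part of NxtIRFcore): builds a three-column
# data frame of sample names, forward FASTQ and reverse FASTQ paths.
build_experiment <- function(fastq_dir) {
    forward <- list.files(fastq_dir, pattern = "_1\\.fastq$",
        full.names = TRUE)
    samples <- sub("_1\\.fastq$", "", basename(forward))
    reverse <- file.path(fastq_dir, paste0(samples, "_2.fastq"))
    # Every forward file must have a matching reverse file
    stopifnot(all(file.exists(reverse)))
    data.frame(sample = samples, forward = forward, reverse = reverse,
        stringsAsFactors = FALSE)
}

# Demonstration using empty placeholder files in a temporary directory
demo_dir <- file.path(tempdir(), "raw_data_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c(
    "sample_A_1.fastq", "sample_A_2.fastq",
    "sample_B_1.fastq", "sample_B_2.fastq"
))))
Experiment <- build_experiment(demo_dir)

# As described under BAM_output_path, STAR_align_experiment() writes into
# subdirectories named by sample; for BAM_output_path = "./bams" these are:
bam_dirs <- file.path("./bams", Experiment$sample)
```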

Value

None. STAR will output files into the given output directories.

Functions

STAR_version: Checks whether STAR is installed, and its version
STAR_buildRef: Creates a STAR genome reference.
STAR_Mappability: Calculates lowly-mappable genomic regions using STAR
STAR_align_experiment: Aligns multiple sets of FASTQ files, belonging to multiple samples
STAR_align_fastq: Aligns a single sample (with single or paired FASTQ or FASTA files)

Examples

# 0) Check that STAR is installed and compatible with NxtIRF

STAR_version()
## Not run: 

# The below workflow illustrates
# 1) Getting the reference resource
# 2) Building the STAR Reference, including Mappability Exclusion calculation
# 3) Building the NxtIRF Reference, using the Mappability Exclusion file
# 4) Aligning (a) one or (b) multiple raw sequencing samples.


# 1) Reference generation from Ensembl's FTP links

FTP <- "ftp://ftp.ensembl.org/pub/release-94/"

GetReferenceResource(
    reference_path = "Reference_FTP",
    fasta = paste0(FTP, "fasta/homo_sapiens/dna/",
        "Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"),
    gtf = paste0(FTP, "gtf/homo_sapiens/",
        "Homo_sapiens.GRCh38.94.chr.gtf.gz")
)

# 2) Generates STAR genome within the NxtIRF reference. Also generates
# mappability exclusion gzipped BED file inside the "Mappability/" sub-folder

STAR_buildRef(
    reference_path = "Reference_FTP",
    n_threads = 8,
    also_generate_mappability = TRUE
)

# 2 alt) Generates STAR genome of the example NxtIRF genome.
#     This demonstrates using custom STAR parameters, as the example NxtIRF
#     genome is ~100k in length, so --genomeSAindexNbases needs to be
#     adjusted to be min(14, log2(GenomeLength)/2 - 1)

GetReferenceResource(
    reference_path = "Reference_chrZ",
    fasta = chrZ_genome(),
    gtf = chrZ_gtf()
)

STAR_buildRef(
    reference_path = "Reference_chrZ",
    n_threads = 8,
    additional_args = c("--genomeSAindexNbases", "7"),
    also_generate_mappability = TRUE
)

# 3) Build NxtIRF reference using the newly-generated Mappability exclusions

# NB: also specifies to use the hg38 nonPolyA resource

BuildReference(reference_path = "Reference_FTP", genome_type = "hg38")

# 4a) Align a single sample using the STAR reference

STAR_align_fastq(
    STAR_ref_path = file.path("Reference_FTP", "STAR"),
    BAM_output_path = "./bams/sample1",
    fastq_1 = "sample1_1.fastq", fastq_2 = "sample1_2.fastq",
    n_threads = 8
)

# 4b) Align multiple samples, using two-pass alignment

Experiment <- data.frame(
    sample = c("sample_A", "sample_B"),
    forward = file.path("raw_data", c("sample_A", "sample_B"),
        c("sample_A_1.fastq", "sample_B_1.fastq")),
    reverse = file.path("raw_data", c("sample_A", "sample_B"),
        c("sample_A_2.fastq", "sample_B_2.fastq"))
)

STAR_align_experiment(
    Experiment = Experiment,
    STAR_ref_path = file.path("Reference_FTP", "STAR"),
    BAM_output_path = "./bams",
    two_pass = TRUE,
    n_threads = 8
)

## End(Not run)
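
Example "2 alt" above sets `--genomeSAindexNbases` to 7 for a genome of roughly 100k nucleotides, following the rule min(14, log2(GenomeLength)/2 - 1). The sketch below shows how that value could be derived rather than hard-coded. `calc_genomeSAindexNbases()` is a hypothetical helper, not part of NxtIRFcore or STAR, and rounding down to a whole number is an assumption of this sketch.

```r
# Hypothetical helper (not part of NxtIRFcore or STAR): applies the rule
# min(14, log2(GenomeLength) / 2 - 1) quoted in example "2 alt".
calc_genomeSAindexNbases <- function(genome_length) {
    # Rounding down to an integer is an assumption of this sketch
    as.integer(min(14, floor(log2(genome_length) / 2 - 1)))
}

small_genome <- calc_genomeSAindexNbases(100000)  # ~100k genome, as in "2 alt"
large_genome <- calc_genomeSAindexNbases(3.1e9)   # roughly human-sized genome

# Formatted as a character vector suitable for `additional_args`
additional_args <- c("--genomeSAindexNbases", as.character(small_genome))
```

For the ~100k example genome this gives 7, matching the value used in example "2 alt"; for a genome of about 3.1 billion nucleotides the value is capped at 14.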

alexchwong/NxtIRFcore documentation built on Oct. 31, 2022, 9:14 a.m.