STAR-methods: STAR wrapper for building reference for STAR, and aligning...

STAR-methods    R Documentation

STAR wrappers for building a reference for STAR, and aligning RNA-sequencing reads

Description

These functions run the STAR aligner to build a STAR genome reference, calculate mappability exclusion regions using STAR, and align one or more FASTQ files (single or paired) to the generated genome. They work only on Linux-based systems with STAR installed, and STAR must be accessible via $PATH. See Details and Examples.
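
The requirements above can be checked before calling any of these functions. The following is a minimal sketch in base R, not the way STAR_version() is implemented; the helper name star_is_available is hypothetical, and it only tests the operating system and whether an executable named STAR can be found via $PATH.

```r
# Hypothetical helper (not part of NxtIRFcore): TRUE only when running on
# Linux and an executable called `exe` can be located via $PATH.
star_is_available <- function(exe = "STAR") {
    on_linux <- identical(Sys.info()[["sysname"]], "Linux")
    # Sys.which() returns "" when the executable cannot be found
    on_path <- nzchar(unname(Sys.which(exe)))
    on_linux && on_path
}

star_is_available()
```

A FALSE result suggests that the functions documented here will not run on the current system; it says nothing about whether the installed STAR version is compatible, which is what STAR_version() is for.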

Usage

STAR_version()

STAR_buildRef(
  reference_path,
  STAR_ref_path = file.path(reference_path, "STAR"),
  also_generate_mappability = TRUE,
  map_depth_threshold = 4,
  sjdbOverhang = 149,
  n_threads = 4,
  additional_args = NULL,
  ...
)

STAR_Mappability(
  reference_path,
  STAR_ref_path = file.path(reference_path, "STAR"),
  map_depth_threshold = 4,
  n_threads = 4,
  ...
)

STAR_align_experiment(
  Experiment,
  STAR_ref_path,
  BAM_output_path,
  trim_adaptor = "AGATCGGAAG",
  two_pass = FALSE,
  n_threads = 4
)

STAR_align_fastq(
  fastq_1 = c("./sample_1.fastq"),
  fastq_2 = NULL,
  STAR_ref_path,
  BAM_output_path,
  two_pass = FALSE,
  trim_adaptor = "AGATCGGAAG",
  memory_mode = "NoSharedMemory",
  additional_args = NULL,
  n_threads = 4
)

Arguments

reference_path

The path to the reference. GetReferenceResource must first be run using this path as its reference_path

STAR_ref_path

(Default - the "STAR" subdirectory under reference_path) The directory containing the STAR reference to be used or to contain the newly-generated STAR reference

also_generate_mappability

Whether STAR_buildRef() also calculates Mappability Exclusion regions.

map_depth_threshold

(Default 4) The read-depth threshold at or below which Mappability exclusion regions are defined. See Mappability-methods. Ignored if also_generate_mappability = FALSE.

sjdbOverhang

(Default = 149) A STAR setting indicating the length of the donor / acceptor sequence on each side of the junctions. Ideally equal to (mate_length - 1). As the most common read length is 150, the default of this function is 149. See the STAR aligner manual for details.

n_threads

The number of threads with which to run the STAR aligner.

additional_args

A character vector of additional arguments to be passed to STAR. See examples below.

...

Additional arguments to be passed to Mappability_GenReads(). See Mappability-methods.

Experiment

A two- or three-column data frame with the columns denoting sample names, forward-FASTQ and reverse-FASTQ files. This can be conveniently generated using Find_FASTQ.

BAM_output_path

The path under which STAR outputs the aligned BAM files. In STAR_align_experiment(), STAR will output aligned BAM files inside subdirectories of this folder, named by sample names. In STAR_align_fastq(), STAR will output directly into this path.

trim_adaptor

The sequence of the Illumina adaptor to trim via STAR's --clip3pAdapterSeq option

two_pass

Whether to use two-pass mapping. In STAR_align_experiment(), STAR will first align every sample and generate a list of splice junctions, but no BAM files. The junctions are then given to STAR to generate a temporary genome (contained within the _STARgenome subdirectory inside that of the first sample), which uses these junctions to improve novel junction detection. In STAR_align_fastq(), STAR will run --twopassMode Basic.

fastq_1, fastq_2

In STAR_align_fastq: character vectors giving the path(s) of one or more FASTQ (or FASTA) files to be aligned. If single reads are to be aligned, omit fastq_2

memory_mode

The value to be passed to STAR's --genomeLoad option; either "NoSharedMemory" or "LoadAndKeep".
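
To make the relationship between the arguments above and the STAR command line concrete, the sketch below assembles a STAR argument vector from values shaped like those of STAR_align_fastq(). The options --clip3pAdapterSeq, --genomeLoad and --twopassMode Basic are the ones named in this page; pairing STAR_ref_path with --genomeDir, the FASTQ paths with --readFilesIn, n_threads with --runThreadN and BAM_output_path with --outFileNamePrefix is an assumption about how such a wrapper could work, not a description of the package's actual call. The function name star_align_args is hypothetical.

```r
# Hypothetical helper (not part of NxtIRFcore): builds a character vector of
# STAR command-line arguments from STAR_align_fastq()-style inputs.
star_align_args <- function(
        fastq_1, fastq_2 = NULL, STAR_ref_path, BAM_output_path,
        two_pass = FALSE, trim_adaptor = "AGATCGGAAG",
        memory_mode = "NoSharedMemory", n_threads = 4
) {
    args <- c(
        "--genomeDir", STAR_ref_path,
        "--genomeLoad", memory_mode,
        "--runThreadN", as.character(n_threads),
        # Trailing "/" so that STAR writes its output inside this directory
        "--outFileNamePrefix", file.path(BAM_output_path, ""),
        # Multiple files for the same mate are given as a comma-separated list
        "--readFilesIn", paste(fastq_1, collapse = ",")
    )
    # For paired reads, the second mate follows as a separate argument
    if (!is.null(fastq_2)) args <- c(args, paste(fastq_2, collapse = ","))
    if (nzchar(trim_adaptor)) args <- c(args, "--clip3pAdapterSeq", trim_adaptor)
    if (two_pass) args <- c(args, "--twopassMode", "Basic")
    args
}

args <- star_align_args(
    fastq_1 = "sample1_1.fastq", fastq_2 = "sample1_2.fastq",
    STAR_ref_path = file.path("Reference_FTP", "STAR"),
    BAM_output_path = "./bams/sample1",
    two_pass = TRUE, n_threads = 8
)
args
```

A vector built this way could be handed to system2("STAR", args) on a system where STAR is installed; the output-format options that a real alignment would also need are deliberately left out of this sketch.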

Details

Pre-requisites

STAR_buildRef requires GetReferenceResource to be run to fetch the required genome and gene annotation files.

STAR_Mappability, STAR_align_experiment and STAR_align_fastq require a STAR genome, which can be built using STAR_buildRef.

Function Description

For STAR_buildRef: this function will create a STAR genome reference in the directory given by STAR_ref_path (by default, the STAR subdirectory of reference_path). Optionally, it will run STAR_Mappability if also_generate_mappability is set to TRUE.
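
When building the reference, sjdbOverhang is ideally set to (mate_length - 1). If the read length of a dataset is not known in advance, it can be estimated from the FASTQ file itself. The sketch below is one way to do so in base R; the helper name estimate_sjdbOverhang is hypothetical, and it assumes an uncompressed FASTQ file with four lines per record.

```r
# Hypothetical helper (not part of NxtIRFcore): longest read length among the
# first `n_reads` records of an uncompressed FASTQ file, minus one.
estimate_sjdbOverhang <- function(fastq_file, n_reads = 1000) {
    lines <- readLines(fastq_file, n = 4 * n_reads)
    if (length(lines) < 2) stop("No FASTQ records found in ", fastq_file)
    # In a 4-line FASTQ record, the sequence is the second line
    seqs <- lines[seq(2, length(lines), by = 4)]
    max(nchar(seqs)) - 1L
}

# Toy FASTQ file with reads of length 10 and 12
fq <- tempfile(fileext = ".fastq")
writeLines(c(
    "@r1", "ACGTACGTAC", "+", "IIIIIIIIII",
    "@r2", "ACGTACGTACGT", "+", "IIIIIIIIIIII"
), fq)

estimate_sjdbOverhang(fq)  # 11
```

The result can then be supplied as the sjdbOverhang argument of STAR_buildRef() in place of the default of 149.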

For STAR_Mappability: this function will first run Mappability_GenReads, then use the given STAR genome to align the synthetic reads using STAR. The aligned BAM file will then be processed using Mappability_CalculateExclusions to calculate the lowly-mappable genomic regions, producing the MappabilityExclusion.bed.gz output file.

For STAR_align_fastq: aligns a single FASTQ file, or a pair of FASTQ files, to the given STAR genome using the STAR aligner.

For STAR_align_experiment: aligns a set of single or paired FASTQ files to the given STAR genome using the STAR aligner. A data.frame specifying sample names and corresponding FASTQ files is required.
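
The data.frame expected by STAR_align_experiment has one row per sample, with columns for the sample name, the forward FASTQ file and the reverse FASTQ file. The sketch below builds such a table from a directory tree using base R only. It is not the implementation of Find_FASTQ; the helper name build_experiment and the "_1" / "_2" file-naming convention are assumptions made for illustration.

```r
# Hypothetical helper (not part of NxtIRFcore): pairs "<sample>_1<suffix>"
# with "<sample>_2<suffix>" for every forward FASTQ found under `fastq_dir`.
build_experiment <- function(fastq_dir, suffix = ".fastq") {
    files <- list.files(fastq_dir, recursive = TRUE, full.names = TRUE)
    fwd_tail <- paste0("_1", suffix)
    forward <- files[endsWith(files, fwd_tail)]
    # Strip the trailing "_1<suffix>" to recover each sample's path stem
    stem <- substr(forward, 1, nchar(forward) - nchar(fwd_tail))
    reverse <- paste0(stem, "_2", suffix)
    missing_mate <- !file.exists(reverse)
    if (any(missing_mate)) {
        stop("No reverse FASTQ found for: ",
            paste(forward[missing_mate], collapse = ", "))
    }
    data.frame(
        sample = basename(stem), forward = forward, reverse = reverse,
        stringsAsFactors = FALSE
    )
}

# Demonstration on empty placeholder files in a temporary directory
root <- tempfile("raw_data")
for (s in c("sample_A", "sample_B")) {
    dir.create(file.path(root, s), recursive = TRUE)
    file.create(file.path(root, s, paste0(s, c("_1", "_2"), ".fastq")))
}

Experiment <- build_experiment(root)
Experiment$sample
```

For single-end data, the same idea reduces to a two-column table of sample names and forward FASTQ files.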

Value

None. STAR will output files into the given output directories.

Functions

  • STAR_version: Checks whether STAR is installed and reports its version

  • STAR_buildRef: Creates a STAR genome reference.

  • STAR_Mappability: Calculates lowly-mappable genomic regions using STAR

  • STAR_align_experiment: Aligns multiple sets of FASTQ files, belonging to multiple samples

  • STAR_align_fastq: Aligns a single sample (with single or paired FASTQ or FASTA files)

See Also

BuildReference, Find_Samples, Mappability-methods

The latest STAR documentation

Examples

# 0) Check that STAR is installed and compatible with NxtIRF

STAR_version()
## Not run: 

# The below workflow illustrates
# 1) Getting the reference resource
# 2) Building the STAR Reference, including Mappability Exclusion calculation
# 3) Building the NxtIRF Reference, using the Mappability Exclusion file
# 4) Aligning (a) one or (b) multiple raw sequencing samples.


# 1) Reference generation from Ensembl's FTP links

FTP <- "ftp://ftp.ensembl.org/pub/release-94/"

GetReferenceResource(
    reference_path = "Reference_FTP",
    fasta = paste0(FTP, "fasta/homo_sapiens/dna/",
        "Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"),
    gtf = paste0(FTP, "gtf/homo_sapiens/",
        "Homo_sapiens.GRCh38.94.chr.gtf.gz")
)

# 2) Generates STAR genome within the NxtIRF reference. Also generates
# mappability exclusion gzipped BED file inside the "Mappability/" sub-folder

STAR_buildRef(
    reference_path = "Reference_FTP",
    n_threads = 8,
    also_generate_mappability = TRUE
)

# 2 alt) Generates STAR genome of the example NxtIRF genome.
#     This demonstrates using custom STAR parameters, as the example NxtIRF
#     genome is ~100k in length, so --genomeSAindexNbases needs to be
#     adjusted to be min(14, log2(GenomeLength)/2 - 1)

GetReferenceResource(
    reference_path = "Reference_chrZ",
    fasta = chrZ_genome(),
    gtf = chrZ_gtf()
)

STAR_buildRef(
    reference_path = "Reference_chrZ",
    n_threads = 8,
    additional_args = c("--genomeSAindexNbases", "7"),
    also_generate_mappability = TRUE
)

# 3) Build NxtIRF reference using the newly-generated Mappability exclusions

# NB: genome_type = "hg38" also specifies use of the hg38 nonPolyA resource

BuildReference(reference_path = "Reference_FTP", genome_type = "hg38")

# 4a) Align a single sample using the STAR reference

STAR_align_fastq(
    STAR_ref_path = file.path("Reference_FTP", "STAR"),
    BAM_output_path = "./bams/sample1",
    fastq_1 = "sample1_1.fastq", fastq_2 = "sample1_2.fastq",
    n_threads = 8
)

# 4b) Align multiple samples, using two-pass alignment

Experiment <- data.frame(
    sample = c("sample_A", "sample_B"),
    forward = file.path("raw_data", c("sample_A", "sample_B"),
        c("sample_A_1.fastq", "sample_B_1.fastq")),
    reverse = file.path("raw_data", c("sample_A", "sample_B"),
        c("sample_A_2.fastq", "sample_B_2.fastq"))
)

STAR_align_experiment(
    Experiment = Experiment,
    STAR_ref_path = file.path("Reference_FTP", "STAR"),
    BAM_output_path = "./bams",
    two_pass = TRUE,
    n_threads = 8
)

## End(Not run)
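
The value "7" passed to --genomeSAindexNbases in example 2 alt) follows from the formula min(14, log2(GenomeLength)/2 - 1) quoted in its comments. The sketch below evaluates that formula in R. The helper name suggest_genomeSAindexNbases is hypothetical, and rounding to the nearest integer is an assumption, since the formula itself does not say how to round.

```r
# Hypothetical helper (not part of NxtIRFcore): evaluates
# min(14, log2(GenomeLength)/2 - 1), rounded to the nearest integer.
suggest_genomeSAindexNbases <- function(genome_length) {
    as.integer(min(14, round(log2(genome_length) / 2 - 1)))
}

# A genome of ~100k bases, as in the chrZ example
suggest_genomeSAindexNbases(1e5)  # 7

# The result can be converted to the form expected by additional_args
additional_args <- c(
    "--genomeSAindexNbases",
    as.character(suggest_genomeSAindexNbases(1e5))
)
```

For genomes of several gigabases the formula exceeds 14, so the cap of 14 applies.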

alexchwong/NxtIRFcore documentation built on Oct. 31, 2022, 9:14 a.m.