| STAR-methods | R Documentation |
These STAR helper / wrapper functions allow users to (1) create a STAR genome reference (with or without GTF), (2) align one or more RNA-seq samples, and (3) calculate regions of low mappability. STAR references can be created using one-step (genome and GTF), or two-step (genome first, then on-the-fly with injected GTF) approaches.
STAR_version()
STAR_buildRef(
reference_path,
STAR_ref_path = file.path(reference_path, "STAR"),
n_threads = 4,
overwrite = FALSE,
sjdbOverhang = 100,
sparsity = 1,
also_generate_mappability = FALSE,
map_depth_threshold = 4,
additional_args = NULL,
...
)
STAR_alignExperiment(
Experiment,
STAR_ref_path,
BAM_output_path,
n_threads = 4,
overwrite = FALSE,
two_pass = FALSE,
trim_adaptor = "AGATCGGAAG",
additional_args = NULL
)
STAR_alignReads(
fastq_1 = c("./sample_1.fastq"),
fastq_2 = NULL,
STAR_ref_path,
BAM_output_path,
n_threads = 4,
overwrite = FALSE,
two_pass = FALSE,
trim_adaptor = "AGATCGGAAG",
memory_mode = "NoSharedMemory",
additional_args = NULL
)
STAR_buildGenome(
reference_path,
STAR_ref_path = file.path(reference_path, "STAR"),
n_threads = 4,
overwrite = FALSE,
sparsity = 1,
also_generate_mappability = FALSE,
map_depth_threshold = 4,
additional_args = NULL,
...
)
STAR_loadGenomeGTF(
reference_path,
STAR_ref_path,
STARgenome_output = file.path(tempdir(), "STAR"),
n_threads = 4,
overwrite = FALSE,
sjdbOverhang = 100,
extraFASTA = "",
additional_args = NULL
)
STAR_mappability(
reference_path,
STAR_ref_path = file.path(reference_path, "STAR"),
map_depth_threshold = 4,
n_threads = 4,
...
)
reference_path |
The path to the reference.
getResources must first be run using this path
as its |
STAR_ref_path |
(Default - the "STAR" subdirectory under
|
n_threads |
The number of threads to run the STAR aligner. |
overwrite |
(default |
sjdbOverhang |
(Default = 100) A STAR setting indicating the length of the donor / acceptor sequence on each side of the junctions. Ideally equal to (mate_length - 1). See the STAR aligner manual for details. |
sparsity |
(default |
also_generate_mappability |
Whether |
map_depth_threshold |
(Default |
additional_args |
A character vector of additional arguments to be parsed into STAR. See examples below. |
... |
Additional arguments to be parsed into
|
Experiment |
A two or three-column data frame with the columns denoting sample names, forward-FASTQ and reverse-FASTQ files. This can be conveniently generated using findFASTQ |
BAM_output_path |
The path under which STAR outputs the aligned BAM
files. In |
two_pass |
Whether to use two-pass mapping. In
|
trim_adaptor |
The sequence of the Illumina adaptor to trim via STAR's
|
fastq_1, fastq_2 |
In STAR_alignReads: character vectors giving the
path(s) of one or more FASTQ (or FASTA) files to be aligned.
If single reads are to be aligned, omit |
memory_mode |
The parameter to be parsed to |
STARgenome_output |
The output path of the created on-the-fly genome |
extraFASTA |
(default |
Pre-requisites
STAR_buildRef() and STAR_buildGenome() require prepared genome
and gene annotation reference retrieved using getResources, which is run
internally by buildRef
STAR_loadGenomeGTF() requires the above, and additionally a STAR genome
created using STAR_buildGenome()
STAR_alignExperiment(), STAR_alignReads(), and STAR_mappability():
requires a STAR genome, which can be built using STAR_buildRef() or
STAR_buildGenome() followed by STAR_loadGenomeGTF()
Function Description
For STAR_buildRef: this function will create a STAR genome reference
using the same genome FASTA and gene annotation GTF used to create
the SpliceWiz reference. Optionally, it will run STAR_mappability
if also_generate_mappability is set to TRUE
For STAR_alignExperiment: aligns a set of FASTQ or paired FASTQ files
using the given
STAR genome using the STAR aligner.
A data.frame specifying sample names and corresponding FASTQ files are
required
For STAR_alignReads: aligns a single or pair of FASTQ files to the given
STAR genome using the STAR aligner.
For STAR_buildGenome: Creates a STAR genome reference, using ONLY the
FASTA file used to create the SpliceWiz reference. This allows users to
create a single STAR reference for use with multiple transcriptome (GTF)
references (on different occasions). Optionally, it will run
STAR_mappability if also_generate_mappability is set to TRUE
For STAR_loadGenomeGTF: Creates an "on-the-fly" STAR genome, injecting GTF
from the given SpliceWiz reference_path, setting sjdbOverhang setting,
and (optionally) any spike-ins via the extraFASTA parameter.
This allows users to create a single STAR reference for use with multiple
transcriptome (GTF) references, with different sjdbOverhang settings,
and/or spike-ins (on different occasions or for different projects).
For STAR_mappability: this function will first
will run generateSyntheticReads, then use the given STAR genome to
align the synthetic reads using STAR. The aligned BAM file will then be
processed using calculateMappability to calculate the
lowly-mappable genomic regions,
producing the MappabilityExclusion.bed.gz output file.
For STAR_version(): The STAR version
For STAR_buildRef(): None
For STAR_alignExperiment(): None
For STAR_alignReads(): None
For STAR_buildGenome(): None
For STAR_loadGenomeGTF():
The path of the on-the-fly STAR genome, typically in the subdirectory
"_STARgenome" within the given STARgenome_output directory
For STAR_mappability(): None
STAR_version(): Checks whether STAR is installed, and its version
STAR_buildRef(): Creates a STAR genome reference, using both FASTA
and GTF files used to create the SpliceWiz reference
STAR_alignExperiment(): Aligns multiple sets of FASTQ files, belonging to
multiple samples
STAR_alignReads(): Aligns a single sample (with single or paired FASTQ
or FASTA files)
STAR_buildGenome(): Creates a STAR genome reference, using ONLY the
FASTA file used to create the SpliceWiz reference
STAR_loadGenomeGTF(): Creates an "on-the-fly" STAR genome, injecting GTF
from the given SpliceWiz reference_path, setting sjdbOverhang setting,
and (optionally) any spike-ins as extraFASTA
STAR_mappability(): Calculates lowly-mappable genomic regions using STAR
Build-Reference-methods findSamples Mappability-methods
The latest STAR documentation
# 0) Check that STAR is installed and compatible with SpliceWiz
STAR_version()
## Not run:
# The below workflow illustrates
# 1) Getting the reference resource
# 2) Building the STAR Reference, including Mappability Exclusion calculation
# 3) Building the SpliceWiz Reference, using the Mappability Exclusion file
# 4) Aligning (a) one or (b) multiple raw sequencing samples.
# 1) Reference generation from Ensembl's FTP links
FTP <- "ftp://ftp.ensembl.org/pub/release-94/"
getResources(
reference_path = "Reference_FTP",
fasta = paste0(FTP, "fasta/homo_sapiens/dna/",
"Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"),
gtf = paste0(FTP, "gtf/homo_sapiens/",
"Homo_sapiens.GRCh38.94.chr.gtf.gz")
)
# 2) Generates STAR genome within the SpliceWiz reference. Also generates
# mappability exclusion gzipped BED file inside the "Mappability/" sub-folder
STAR_buildRef(
reference_path = "Reference_FTP",
STAR_ref_path = file.path("Reference_FTP", "STAR"),
n_threads = 8,
also_generate_mappability = TRUE
)
# 2a) Generates STAR genome of the example SpliceWiz genome.
# This demonstrates using custom STAR parameters, as the example
# SpliceWiz genome is ~100k in length,
# so --genomeSAindexNbases needs to be
# adjusted to be min(14, log2(GenomeLength)/2 - 1)
getResources(
reference_path = "Reference_chrZ",
fasta = chrZ_genome(),
gtf = chrZ_gtf()
)
STAR_buildRef(
reference_path = "Reference_chrZ",
STAR_ref_path = file.path("Reference_chrZ", "STAR"),
n_threads = 8,
additional_args = c("--genomeSAindexNbases", "7"),
also_generate_mappability = TRUE
)
# 3) Build SpliceWiz reference using the newly-generated
# Mappability exclusions
#' NB: also specifies to use the hg38 nonPolyA resource
buildRef(reference_path = "Reference_FTP", genome_type = "hg38")
# 4a) Align a single sample using the STAR reference
STAR_alignReads(
fastq_1 = "sample1_1.fastq", fastq_2 = "sample1_2.fastq",
STAR_ref_path = file.path("Reference_FTP", "STAR"),
BAM_output_path = "./bams/sample1",
n_threads = 8
)
# 4b) Align multiple samples, using two-pass alignment
Experiment <- data.frame(
sample = c("sample_A", "sample_B"),
forward = file.path("raw_data", c("sample_A", "sample_B"),
c("sample_A_1.fastq", "sample_B_1.fastq")),
reverse = file.path("raw_data", c("sample_A", "sample_B"),
c("sample_A_2.fastq", "sample_B_2.fastq"))
)
STAR_alignExperiment(
Experiment = Experiment,
STAR_ref_path = file.path("Reference_FTP", "STAR"),
BAM_output_path = "./bams",
n_threads = 8,
two_pass = TRUE
)
# - Building a STAR genome (only) reference, and injecting GTF as a
# subsequent step
#
# This is useful for users who want to create a single STAR genome, for
# experimentation with different GTF files.
# It is important to note that the chromosome names of the genome (FASTA)
# file and the GTF file needs to be identical. Thus, Ensembl and Gencode
# GTF files should not be mixed (unless the chromosome GTF names have
# been fixed)
# - also set sparsity = 2 to build human genome so that it will fit in
# 16 Gb RAM. NB: this step's RAM usage can be set using the
# `--limitGenomeGenerateRAM` parameter
STAR_buildGenome(
reference_path = "Reference_FTP",
STAR_ref_path = file.path("Reference_FTP", "STAR_genomeOnly"),
n_threads = 8, sparsity = 2,
additional_args = c("--limitGenomeGenerateRAM", "16000000000")
)
# - Injecting a GTF into a genome-only STAR reference
#
# This creates an on-the-fly STAR genome, using a GTF file
# (derived from a SpliceWiz reference) into a new location.
# This allows a single STAR reference to use multiple GTFs
# on different occasions.
STAR_new_ref <- STAR_loadGenomeGTF(
reference_path = "Reference_FTP",
STAR_ref_path = file.path("Reference_FTP", "STAR_genomeOnly"),
STARgenome_output = file.path(tempdir(), "STAR"),
n_threads = 4,
sjdbOverhang = 100
)
# This new reference can then be used to align your experiment:
STAR_alignExperiment(
Experiment = Experiment,
STAR_ref_path = STAR_new_ref,
BAM_output_path = "./bams",
n_threads = 8,
two_pass = TRUE
)
# Typically, one should `clean up` the on-the-fly STAR reference (as it is
# large!). If it is in a temporary directory, it will be cleaned up
# when the current R session ends; otherwise this needs to be done manually:
unlink(file.path(tempdir(), "STAR"), recursive = TRUE)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.