STAR-methods | R Documentation |
These STAR helper / wrapper functions allow users to (1) create a STAR genome reference (with or without GTF), (2) align one or more RNA-seq samples, and (3) calculate regions of low mappability. STAR references can be created using one-step (genome and GTF), or two-step (genome first, then on-the-fly with injected GTF) approaches.
STAR_version()
STAR_buildRef(
reference_path,
STAR_ref_path = file.path(reference_path, "STAR"),
n_threads = 4,
overwrite = FALSE,
sjdbOverhang = 100,
sparsity = 1,
also_generate_mappability = FALSE,
map_depth_threshold = 4,
additional_args = NULL,
...
)
STAR_alignExperiment(
Experiment,
STAR_ref_path,
BAM_output_path,
n_threads = 4,
overwrite = FALSE,
two_pass = FALSE,
trim_adaptor = "AGATCGGAAG",
additional_args = NULL
)
STAR_alignReads(
fastq_1 = c("./sample_1.fastq"),
fastq_2 = NULL,
STAR_ref_path,
BAM_output_path,
n_threads = 4,
overwrite = FALSE,
two_pass = FALSE,
trim_adaptor = "AGATCGGAAG",
memory_mode = "NoSharedMemory",
additional_args = NULL
)
STAR_buildGenome(
reference_path,
STAR_ref_path = file.path(reference_path, "STAR"),
n_threads = 4,
overwrite = FALSE,
sparsity = 1,
also_generate_mappability = FALSE,
map_depth_threshold = 4,
additional_args = NULL,
...
)
STAR_loadGenomeGTF(
reference_path,
STAR_ref_path,
STARgenome_output = file.path(tempdir(), "STAR"),
n_threads = 4,
overwrite = FALSE,
sjdbOverhang = 100,
extraFASTA = "",
additional_args = NULL
)
STAR_mappability(
reference_path,
STAR_ref_path = file.path(reference_path, "STAR"),
map_depth_threshold = 4,
n_threads = 4,
...
)
reference_path |
The path to the reference.
getResources must first be run using this path
as its |
STAR_ref_path |
(Default - the "STAR" subdirectory under
|
n_threads |
The number of threads to run the STAR aligner. |
overwrite |
(default |
sjdbOverhang |
(Default = 100) A STAR setting indicating the length of the donor / acceptor sequence on each side of the junctions. Ideally equal to (mate_length - 1). See the STAR aligner manual for details. |
sparsity |
(default |
also_generate_mappability |
Whether |
map_depth_threshold |
(Default |
additional_args |
A character vector of additional arguments to be parsed into STAR. See examples below. |
... |
Additional arguments to be parsed into
|
Experiment |
A two or three-column data frame with the columns denoting sample names, forward-FASTQ and reverse-FASTQ files. This can be conveniently generated using findFASTQ |
BAM_output_path |
The path under which STAR outputs the aligned BAM
files. In |
two_pass |
Whether to use two-pass mapping. In
|
trim_adaptor |
The sequence of the Illumina adaptor to trim via STAR's
|
fastq_1 , fastq_2 |
In STAR_alignReads: character vectors giving the
path(s) of one or more FASTQ (or FASTA) files to be aligned.
If single reads are to be aligned, omit |
memory_mode |
The parameter to be parsed to |
STARgenome_output |
The output path of the created on-the-fly genome |
extraFASTA |
(default |
Pre-requisites
STAR_buildRef()
and STAR_buildGenome()
require prepared genome
and gene annotation reference retrieved using getResources, which is run
internally by buildRef
STAR_loadGenomeGTF()
requires the above, and additionally a STAR genome
created using STAR_buildGenome()
STAR_alignExperiment()
, STAR_alignReads()
, and STAR_mappability()
:
requires a STAR
genome, which can be built using STAR_buildRef()
or
STAR_buildGenome()
followed by STAR_loadGenomeGTF()
Function Description
For STAR_buildRef
: this function will create a STAR
genome reference
using the same genome FASTA and gene annotation GTF used to create
the SpliceWiz reference. Optionally, it will run STAR_mappability
if also_generate_mappability
is set to TRUE
For STAR_alignExperiment
: aligns a set of FASTQ or paired FASTQ files
using the given
STAR
genome using the STAR
aligner.
A data.frame specifying sample names and corresponding FASTQ files are
required
For STAR_alignReads
: aligns a single or pair of FASTQ files to the given
STAR
genome using the STAR
aligner.
For STAR_buildGenome
: Creates a STAR genome reference, using ONLY the
FASTA file used to create the SpliceWiz reference. This allows users to
create a single STAR reference for use with multiple transcriptome (GTF)
references (on different occasions). Optionally, it will run
STAR_mappability
if also_generate_mappability
is set to TRUE
For STAR_loadGenomeGTF
: Creates an "on-the-fly" STAR genome, injecting GTF
from the given SpliceWiz reference_path
, setting sjdbOverhang
setting,
and (optionally) any spike-ins via the extraFASTA
parameter.
This allows users to create a single STAR reference for use with multiple
transcriptome (GTF) references, with different sjdbOverhang settings,
and/or spike-ins (on different occasions or for different projects).
For STAR_mappability
: this function will first
will run generateSyntheticReads, then use the given STAR
genome to
align the synthetic reads using STAR
. The aligned BAM file will then be
processed using calculateMappability to calculate the
lowly-mappable genomic regions,
producing the MappabilityExclusion.bed.gz
output file.
For STAR_version()
: The STAR version
For STAR_buildRef()
: None
For STAR_alignExperiment()
: None
For STAR_alignReads()
: None
For STAR_buildGenome()
: None
For STAR_loadGenomeGTF()
:
The path of the on-the-fly STAR genome, typically in the subdirectory
"_STARgenome" within the given STARgenome_output
directory
For STAR_mappability()
: None
STAR_version()
: Checks whether STAR is installed, and its version
STAR_buildRef()
: Creates a STAR genome reference, using both FASTA
and GTF files used to create the SpliceWiz reference
STAR_alignExperiment()
: Aligns multiple sets of FASTQ files, belonging to
multiple samples
STAR_alignReads()
: Aligns a single sample (with single or paired FASTQ
or FASTA files)
STAR_buildGenome()
: Creates a STAR genome reference, using ONLY the
FASTA file used to create the SpliceWiz reference
STAR_loadGenomeGTF()
: Creates an "on-the-fly" STAR genome, injecting GTF
from the given SpliceWiz reference_path
, setting sjdbOverhang
setting,
and (optionally) any spike-ins as extraFASTA
STAR_mappability()
: Calculates lowly-mappable genomic regions using STAR
Build-Reference-methods findSamples Mappability-methods
The latest STAR documentation
# 0) Check that STAR is installed and compatible with SpliceWiz
STAR_version()
## Not run:
# The below workflow illustrates
# 1) Getting the reference resource
# 2) Building the STAR Reference, including Mappability Exclusion calculation
# 3) Building the SpliceWiz Reference, using the Mappability Exclusion file
# 4) Aligning (a) one or (b) multiple raw sequencing samples.
# 1) Reference generation from Ensembl's FTP links
FTP <- "ftp://ftp.ensembl.org/pub/release-94/"
getResources(
reference_path = "Reference_FTP",
fasta = paste0(FTP, "fasta/homo_sapiens/dna/",
"Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"),
gtf = paste0(FTP, "gtf/homo_sapiens/",
"Homo_sapiens.GRCh38.94.chr.gtf.gz")
)
# 2) Generates STAR genome within the SpliceWiz reference. Also generates
# mappability exclusion gzipped BED file inside the "Mappability/" sub-folder
STAR_buildRef(
reference_path = "Reference_FTP",
STAR_ref_path = file.path("Reference_FTP", "STAR"),
n_threads = 8,
also_generate_mappability = TRUE
)
# 2a) Generates STAR genome of the example SpliceWiz genome.
# This demonstrates using custom STAR parameters, as the example
# SpliceWiz genome is ~100k in length,
# so --genomeSAindexNbases needs to be
# adjusted to be min(14, log2(GenomeLength)/2 - 1)
getResources(
reference_path = "Reference_chrZ",
fasta = chrZ_genome(),
gtf = chrZ_gtf()
)
STAR_buildRef(
reference_path = "Reference_chrZ",
STAR_ref_path = file.path("Reference_chrZ", "STAR"),
n_threads = 8,
additional_args = c("--genomeSAindexNbases", "7"),
also_generate_mappability = TRUE
)
# 3) Build SpliceWiz reference using the newly-generated
# Mappability exclusions
#' NB: also specifies to use the hg38 nonPolyA resource
buildRef(reference_path = "Reference_FTP", genome_type = "hg38")
# 4a) Align a single sample using the STAR reference
STAR_alignReads(
fastq_1 = "sample1_1.fastq", fastq_2 = "sample1_2.fastq",
STAR_ref_path = file.path("Reference_FTP", "STAR"),
BAM_output_path = "./bams/sample1",
n_threads = 8
)
# 4b) Align multiple samples, using two-pass alignment
Experiment <- data.frame(
sample = c("sample_A", "sample_B"),
forward = file.path("raw_data", c("sample_A", "sample_B"),
c("sample_A_1.fastq", "sample_B_1.fastq")),
reverse = file.path("raw_data", c("sample_A", "sample_B"),
c("sample_A_2.fastq", "sample_B_2.fastq"))
)
STAR_alignExperiment(
Experiment = Experiment,
STAR_ref_path = file.path("Reference_FTP", "STAR"),
BAM_output_path = "./bams",
n_threads = 8,
two_pass = TRUE
)
# - Building a STAR genome (only) reference, and injecting GTF as a
# subsequent step
#
# This is useful for users who want to create a single STAR genome, for
# experimentation with different GTF files.
# It is important to note that the chromosome names of the genome (FASTA)
# file and the GTF file needs to be identical. Thus, Ensembl and Gencode
# GTF files should not be mixed (unless the chromosome GTF names have
# been fixed)
# - also set sparsity = 2 to build human genome so that it will fit in
# 16 Gb RAM. NB: this step's RAM usage can be set using the
# `--limitGenomeGenerateRAM` parameter
STAR_buildGenome(
reference_path = "Reference_FTP",
STAR_ref_path = file.path("Reference_FTP", "STAR_genomeOnly"),
n_threads = 8, sparsity = 2,
additional_args = c("--limitGenomeGenerateRAM", "16000000000")
)
# - Injecting a GTF into a genome-only STAR reference
#
# This creates an on-the-fly STAR genome, using a GTF file
# (derived from a SpliceWiz reference) into a new location.
# This allows a single STAR reference to use multiple GTFs
# on different occasions.
STAR_new_ref <- STAR_loadGenomeGTF(
reference_path = "Reference_FTP",
STAR_ref_path = file.path("Reference_FTP", "STAR_genomeOnly"),
STARgenome_output = file.path(tempdir(), "STAR"),
n_threads = 4,
sjdbOverhang = 100
)
# This new reference can then be used to align your experiment:
STAR_alignExperiment(
Experiment = Experiment,
STAR_ref_path = STAR_new_ref,
BAM_output_path = "./bams",
n_threads = 8,
two_pass = TRUE
)
# Typically, one should `clean up` the on-the-fly STAR reference (as it is
# large!). If it is in a temporary directory, it will be cleaned up
# when the current R session ends; otherwise this needs to be done manually:
unlink(file.path(tempdir(), "STAR"), recursive = TRUE)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.