run_star: Run the STAR/STARsolo program

View source: R/run_star.R

run_star    R Documentation

Run the STAR/STARsolo program

Description

Runs the STAR alignment program, optionally in parallel on multiple cores. STARsolo is also implemented.

Usage

run_star(
  input1 = NULL,
  input2 = NULL,
  genome.dir = NULL,
  sample.name = NULL,
  out.dir = NULL,
  out.format = NULL,
  unmapped = NULL,
  sam.attributes = NULL,
  quant.mode = NULL,
  compressed = NULL,
  filter.type = NULL,
  filter.multi = NULL,
  filter.mismatch = NULL,
  filter.mismatch.pair = NULL,
  intron.min = NULL,
  intron.max = NULL,
  mate.gap = NULL,
  min.overhang.annotated = NULL,
  min.overhang.unannotated = NULL,
  solo.type = NULL,
  solo.cell.filtering = NULL,
  white.list = NULL,
  solo.cb.start = NULL,
  solo.cb.len = NULL,
  solo.umi.start = NULL,
  solo.umi.len = NULL,
  solo.barcode.read.length = NULL,
  solo.strand = NULL,
  solo.features = NULL,
  solo.multi.mappers = NULL,
  solo.umi.dedup = NULL,
  solo.umi.filter = NULL,
  solo.cb.wl.match = NULL,
  solo.out.filenames = NULL,
  threads = 10,
  parallel = FALSE,
  cores = 4,
  execute = TRUE,
  star = NULL,
  version = FALSE
)

Arguments

input1

List of paths to the files containing the forward reads

input2

List of paths to the files containing the reverse reads

genome.dir

Path to the directory where genome files are stored

sample.name

List of the sample names

out.dir

Name of the directory for the STAR output

out.format

Format of output file. Can select "BAM SortedByCoordinate", "BAM Unsorted" or "BAM Unsorted SortedByCoordinate"

unmapped

If set to "Fastx", unmapped and partially mapped reads (i.e. only one mate of a paired-end read mapped) are written to separate file(s) Unmapped.out.mate1(2), formatted the same way as the input read files.

sam.attributes

Alignment attributes for the SAM/BAM file, default set to "Standard"

quant.mode

Type of quantification required, recommend set to "GeneCounts"

compressed

Compression mode for input reads files, recommend set to "zcat" for gzipped files, can use "bzcat" for bz2 files

filter.type

Type of read filtering: "Normal" (default) or "BySJout", which reduces the number of spurious junctions

filter.multi

Maximum number of multiple alignments allowed for a read; if exceeded, the read is considered unmapped

filter.mismatch

Maximum number of mismatches per pair, default 10. A large number (e.g. 999) switches off this filter

filter.mismatch.pair

Max number of mismatches per pair relative to read length

intron.min

Minimum intron length, default 21

intron.max

Maximum intron length, default 0

mate.gap

Maximum gap between read pairs, default 0

min.overhang.annotated

Minimum overhang for annotated junctions, default 3

min.overhang.unannotated

Minimum overhang for unannotated junctions, default 5

solo.type

Type of single-cell RNA-seq, for 10x Chromium or Drop-seq use "CB_UMI_Simple"

solo.cell.filtering

Cell filtering type and parameters

white.list

Path to the file with the whitelist of cell barcodes

solo.cb.start

Cell barcode start base

solo.cb.len

Cell barcode length

solo.umi.start

UMI start base

solo.umi.len

UMI length

solo.barcode.read.length

Length of the barcode read. Set to 1 to require a length equal to the sum of soloCBlen + soloUMIlen, or to 0 to skip the check

solo.strand

Strandedness of the scRNA libraries

solo.features

Genomic features for which the UMI counts per Cell Barcode are collected

solo.multi.mappers

Counting method for reads mapping to multiple genes. Set to Unique, Uniform, Rescue, PropUnique or EM

solo.umi.dedup

Type of UMI deduplication (collapsing) algorithm. 1MM_All - all UMIs with 1 mismatch distance to each other are collapsed. 1MM_Directional - follows the "directional" method from the UMI-tools by Smith, Heger and Sudbery (Genome Research 2017). 1MM_NotCollapsed - UMIs with 1 mismatch distance to others are not collapsed (i.e. all counted)

solo.umi.filter

Type of UMI filtering

solo.cb.wl.match

Matching the Cell Barcodes to the WhiteList

solo.out.filenames

File names for STARsolo output

threads

Number of threads

parallel

Run in parallel, default set to FALSE

cores

Number of cores/threads to use for parallel processing, default set to 4

execute

Whether to execute the commands or not, default set to TRUE

star

Path to the STAR program

version

Returns the version number
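
As a minimal sketch of how the STARsolo barcode arguments relate to each other, assume a 10x Chromium v2-style layout (a 16-base cell barcode followed by a 10-base UMI). These values are illustrative assumptions, not defaults confirmed for this package. The UMI then starts directly after the cell barcode, and the barcode read length checked by solo.barcode.read.length = 1 is the sum of the two lengths:

```r
# Hypothetical barcode layout (assumed 10x Chromium v2-style; not a package default)
solo.cb.start <- 1    # cell barcode starts at base 1
solo.cb.len   <- 16   # cell barcode length
solo.umi.len  <- 10   # UMI length

# The UMI begins directly after the cell barcode
solo.umi.start <- solo.cb.start + solo.cb.len

# Length the barcode read is expected to have when solo.barcode.read.length = 1
expected.barcode.read.length <- solo.cb.len + solo.umi.len

print(c(umi.start = solo.umi.start,
        barcode.read.length = expected.barcode.read.length))
```

Other chemistries use different lengths, so these numbers should be checked against the library preparation protocol.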

Value

A list with the STAR commands
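
The exact content of the returned commands is not shown on this page. As a rough illustration only, the sketch below assembles a STAR command line by hand from a few of the arguments. The helper build_star_cmd is hypothetical, and the mapping of arguments to STAR options (--runThreadN, --genomeDir, --readFilesIn, --outFileNamePrefix, --quantMode) is an assumption, not the package's actual implementation.

```r
# Hypothetical helper: NOT part of pipelineTools, for illustration only
build_star_cmd <- function(star, genome.dir, input1, input2 = NULL,
                           out.prefix, threads = 10, quant.mode = NULL) {
  # c() drops NULL values, so input2 disappears for single-end data
  args <- c(star,
            "--runThreadN", threads,
            "--genomeDir", genome.dir,
            "--readFilesIn", input1, input2,
            "--outFileNamePrefix", out.prefix)
  # Optional arguments are appended only when set (NULL means "not used")
  if (!is.null(quant.mode)) {
    args <- c(args, "--quantMode", quant.mode)
  }
  paste(args, collapse = " ")
}

# Invented file names and paths, used only to show the shape of the command
cmd <- build_star_cmd(star = "/full/path/to/program",
                      genome.dir = "/full/path/to/genome",
                      input1 = "sample1_R1.fastq.gz",
                      input2 = "sample1_R2.fastq.gz",
                      out.prefix = "results/sample1_",
                      quant.mode = "GeneCounts")
print(cmd)
```

Calling run_star with execute = FALSE and printing the result is probably the most direct way to see the real commands before anything is run.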

Examples

## Not run: 
path <- "/full/path/to/program"
genome <- "/full/path/to/genome"

# Placeholders: define these before running
# mate1.trim   <- list of paths to trimmed forward reads for alignment
# mate2.trim   <- list of paths to trimmed reverse reads for alignment
# sample.names <- list of sample names
# results.dir  <- path to the output directory

cmds <- run_star(input1 = mate1.trim,
                 input2 = mate2.trim,
                 genome.dir = genome,
                 sample.name = sample.names,
                 out.dir = results.dir,
                 unmapped = "Within",
                 sam.attributes = "Standard",
                 quant.mode = "GeneCounts",
                 parallel = TRUE,
                 cores = 4,
                 star = path)

# Version number
version <- run_star(star = path,
                    version = TRUE)

## End(Not run)
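
The input lists used in the example can be built in several ways. One possible sketch, assuming the trimmed files follow a hypothetical <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming scheme (the file names below are invented for illustration), derives matching lists of forward reads, reverse reads and sample names:

```r
# Hypothetical file names; in practice these might come from list.files()
fastq.files <- c("trimmed/sampleA_R1.fastq.gz", "trimmed/sampleA_R2.fastq.gz",
                 "trimmed/sampleB_R1.fastq.gz", "trimmed/sampleB_R2.fastq.gz")

# Split into forward (R1) and reverse (R2) reads, sorted so the pairs line up
mate1.trim <- as.list(sort(grep("_R1\\.fastq\\.gz$", fastq.files, value = TRUE)))
mate2.trim <- as.list(sort(grep("_R2\\.fastq\\.gz$", fastq.files, value = TRUE)))

# Sample names: file name without the directory and the read suffix
sample.names <- as.list(sub("_R1\\.fastq\\.gz$", "",
                            basename(unlist(mate1.trim))))

# Sanity check: every forward file has a reverse partner
stopifnot(length(mate1.trim) == length(mate2.trim))
print(unlist(sample.names))
```

The lists are kept in the same order so that element i of each refers to the same sample.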


GrahamHamilton/pipelineTools documentation built on Aug. 4, 2024, 3:18 a.m.