pipelineTools: Pipeline Tools for NGS Sequence Analysis Pipelines

run_star

R Documentation

Run the STAR/STARsolo program

Description

Runs the STAR alignment program, optionally in parallel on multiple cores. STARsolo is also implemented.

Usage

run_star(
  input1 = NULL,
  input2 = NULL,
  genome.dir = NULL,
  sample.name = NULL,
  out.dir = NULL,
  out.format = NULL,
  unmapped = NULL,
  sam.attributes = NULL,
  quant.mode = NULL,
  compressed = NULL,
  filter.type = NULL,
  filter.multi = NULL,
  filter.mismatch = NULL,
  filter.mismatch.pair = NULL,
  intron.min = NULL,
  intron.max = NULL,
  mate.gap = NULL,
  min.overhang.annotated = NULL,
  min.overhang.unannotated = NULL,
  solo.type = NULL,
  solo.cell.filtering = NULL,
  white.list = NULL,
  solo.cb.start = NULL,
  solo.cb.len = NULL,
  solo.umi.start = NULL,
  solo.umi.len = NULL,
  solo.barcode.read.length = NULL,
  solo.strand = NULL,
  solo.features = NULL,
  solo.multi.mappers = NULL,
  solo.umi.dedup = NULL,
  solo.umi.filter = NULL,
  solo.cb.wl.match = NULL,
  solo.out.filenames = NULL,
  threads = 10,
  parallel = FALSE,
  cores = 4,
  execute = TRUE,
  star = NULL,
  version = FALSE
)

Arguments

`input1`	List of the paths to files containing the forward reads
`input2`	List of the paths to files containing the reverse reads
`genome.dir`	Path to the directory where genome files are stored
`sample.name`	List of the sample names
`out.dir`	Name of the directory for the STAR output
`out.format`	Format of the output file. One of "BAM SortedByCoordinate", "BAM Unsorted" or "BAM Unsorted SortedByCoordinate"
`unmapped`	Output of unmapped reads. "Fastx" writes unmapped and partially mapped reads (i.e. only one mate of a paired-end read mapped) to separate file(s) Unmapped.out.mate1(2), formatted the same way as the input read files.
`sam.attributes`	Alignment attributes for the SAM/BAM file, default set to "Standard"
`quant.mode`	Type of quantification required, recommended setting is "GeneCounts"
`compressed`	Decompression command for the input read files, recommended setting is "zcat" for gzipped files, use "bzcat" for bz2 files
`filter.type`	Type of filtering used to reduce the number of spurious junctions, either "Normal" (the default) or "BySJout"
`filter.multi`	Maximum number of multiple alignments allowed for a read; if exceeded, the read is considered unmapped
`filter.mismatch`	Maximum number of mismatches per pair. Default 10; a large number, e.g. 999, switches off this filter
`filter.mismatch.pair`	Max number of mismatches per pair relative to read length
`intron.min`	Minimum intron length, default 21
`intron.max`	Maximum intron length, default 0
`mate.gap`	Maximum gap between read pairs, default 0
`min.overhang.annotated`	Minimum overhang for annotated junctions, default 3
`min.overhang.unannotated`	Minimum overhang for unannotated junctions, default 5
`solo.type`	Type of single-cell RNASeq, for 10x Chromium or DropSeq use "CB_UMI_Simple"
`solo.cell.filtering`	Cell filtering type and parameters
`white.list`	Path to the file with the whitelist of cell barcodes
`solo.cb.start`	Cell barcode start base
`solo.cb.len`	Cell barcode length
`solo.umi.start`	UMI start base
`solo.umi.len`	UMI length
`solo.barcode.read.length`	Length of the barcode read. Set to 1 to require a length equal to the sum of soloCBlen + soloUMIlen; set to 0 to skip the check
`solo.strand`	Strandedness of the scRNA libraries
`solo.features`	Genomic features for which the UMI counts per Cell Barcode are collected
`solo.multi.mappers`	Counting method for reads mapping to multiple genes. Set to Unique, Uniform, Rescue, PropUnique or EM
`solo.umi.dedup`	Type of UMI deduplication (collapsing) algorithm. 1MM_All - all UMIs with 1 mismatch distance to each other are collapsed. 1MM_Directional - follows the "directional" method from the UMI-tools by Smith, Heger and Sudbery (Genome Research 2017). 1MM_NotCollapsed - UMIs with 1 mismatch distance to others are not collapsed (i.e. all counted)
`solo.umi.filter`	Type of UMI filtering
`solo.cb.wl.match`	Matching the Cell Barcodes to the WhiteList
`solo.out.filenames`	File names for STARsolo output
`threads`	Number of threads
`parallel`	Run in parallel, default set to FALSE
`cores`	Number of cores/threads to use for parallel processing, default set to 4
`execute`	Whether to execute the commands or not, default set to TRUE
`star`	Path to the STAR program
`version`	Returns the version number
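
The STARsolo barcode arguments (`solo.cb.start`, `solo.cb.len`, `solo.umi.start`, `solo.umi.len`) describe where the cell barcode and the UMI sit on the barcode read. As a minimal sketch, the R code below summarises that layout and computes the barcode read length as the sum of the two lengths. The helper `check_solo_geometry` is hypothetical and not part of pipelineTools; the values shown assume the 10x Chromium v3 layout (16-base cell barcode followed by a 12-base UMI).

```r
# Hypothetical helper, not part of pipelineTools: summarises the barcode
# layout implied by the solo.cb.* and solo.umi.* arguments.
check_solo_geometry <- function(cb.start, cb.len, umi.start, umi.len) {
  cb.end  <- cb.start + cb.len - 1     # last base of the cell barcode (1-based)
  umi.end <- umi.start + umi.len - 1   # last base of the UMI (1-based)
  # Two intervals overlap if each one starts before the other ends
  overlap <- cb.start <= umi.end && umi.start <= cb.end
  list(
    cb.range = c(cb.start, cb.end),
    umi.range = c(umi.start, umi.end),
    overlap = overlap,
    barcode.read.length = cb.len + umi.len
  )
}

# Assumed layout: 10x Chromium v3, 16-base cell barcode then 12-base UMI
geometry <- check_solo_geometry(cb.start = 1, cb.len = 16,
                                umi.start = 17, umi.len = 12)
print(geometry$barcode.read.length)
print(geometry$overlap)
```

A check like this can catch a mistyped start position before an alignment is launched; an overlapping cell barcode and UMI range usually points to an error in the start or length values.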

Value

A list with the STAR commands
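
To illustrate what such a list of commands could look like, the sketch below assembles one STAR command string per sample from a handful of the arguments. It is not the pipelineTools implementation: `build_star_cmds` is a hypothetical helper, the file names are placeholders, and only a few standard STAR command-line options (`--runThreadN`, `--genomeDir`, `--readFilesIn`, `--outFileNamePrefix`, `--outSAMtype`, `--readFilesCommand`, `--quantMode`) are covered. How `run_star` itself names its output files is not specified here, so the output prefix is an assumption.

```r
# Hypothetical helper, not the pipelineTools implementation: builds one
# STAR command string per sample from a subset of the arguments.
build_star_cmds <- function(input1, input2 = NULL, genome.dir, sample.name,
                            out.dir, out.format = "BAM SortedByCoordinate",
                            quant.mode = NULL, compressed = NULL,
                            threads = 10, star = "STAR") {
  lapply(seq_along(input1), function(i) {
    # Single-end: forward reads only; paired-end: forward then reverse reads
    reads <- if (is.null(input2)) input1[[i]] else paste(input1[[i]], input2[[i]])
    args <- c(
      star,
      "--runThreadN", threads,
      "--genomeDir", genome.dir,
      "--readFilesIn", reads,
      # Assumed naming scheme: <out.dir>/<sample.name>_
      "--outFileNamePrefix", file.path(out.dir, paste0(sample.name[[i]], "_")),
      "--outSAMtype", out.format
    )
    # Optional arguments are only added when they are set
    if (!is.null(compressed)) args <- c(args, "--readFilesCommand", compressed)
    if (!is.null(quant.mode)) args <- c(args, "--quantMode", quant.mode)
    paste(args, collapse = " ")
  })
}

# Placeholder inputs for a single paired-end sample
cmds <- build_star_cmds(
  input1 = list("s1_R1.fastq.gz"), input2 = list("s1_R2.fastq.gz"),
  genome.dir = "/full/path/to/genome", sample.name = list("s1"),
  out.dir = "results", quant.mode = "GeneCounts", compressed = "zcat"
)
cat(cmds[[1]], "\n")
```

Returning the commands as a list, one entry per sample, makes it possible to inspect them before anything is run, which is presumably the purpose of `execute = FALSE`.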

Examples

## Not run: 
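path <- "/full/path/to/program"
genome <- "/full/path/to/genome"
results.dir <- "/full/path/to/results"

# Placeholder values, replace with your own files and sample names
mate1.trim <- list("/full/path/to/sample1_R1.fastq.gz")  # trimmed forward reads for alignment
mate2.trim <- list("/full/path/to/sample1_R2.fastq.gz")  # trimmed reverse reads for alignment
sample.names <- list("sample1")                          # sample names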

cmds <- run_star(input1 = mate1.trim,
                 input2 = mate2.trim,
                 genome.dir = genome,
                 sample.name = sample.names,
                 out.dir = results.dir,
                 unmapped = "Within",
                 sam.attributes = "Standard",
                 quant.mode = "GeneCounts",
                 parallel = TRUE,
                 cores = 4,
                 star = path)

# Version number
version <- run_star(star = path,
                    version = TRUE)

## End(Not run)

GrahamHamilton/pipelineTools documentation built on Jan. 14, 2025, 10:13 p.m.