Description
Given a single NGS fastq/fasta library, or a paired-end setup of 2 mated libraries, run alignment and optionally remove contaminants.
Usage
STAR.align.single(
  file1,
  file2 = NULL,
  output.dir,
  index.dir,
  star.path = STAR.install(),
  fastp = install.fastp(),
  steps = "tr-ge",
  adapter.sequence = "auto",
  min.length = 20,
  mismatches = 3,
  trim.front = 0,
  max.multimap = 10,
  alignment.type = "Local",
  max.cpus = min(90, detectCores() - 1),
  wait = TRUE,
  resume = NULL,
  script.single = system.file("STAR_Aligner", "RNA_Align_pipeline.sh", package = "ORFik")
)
Arguments
file1
library file; if paired-end, this must be the R1 file. Allowed formats are .fasta, .fastq, .fq and .fa, with or without .gz compression. This filename usually contains the suffix .1
file2
default NULL. If paired-end, set this to the R2 file. Allowed formats are .fasta, .fastq, .fq and .fa, with or without .gz compression. This filename usually contains the suffix .2
output.dir
directory to save the output of the pipeline (for example the aligned files). This argument has no default and must be specified.
index.dir
path to the STAR index folder, as returned by the ORFik function STAR.index() when you created the index.
star.path
path to STAR, default: STAR.install(). If STAR is not installed at the default location, it will be installed there. If you already have STAR, set this to the path of a runnable STAR binary.
fastp
path to the fastp trimmer, default: install.fastp(). If you already have fastp installed somewhere else, give that path. Only works on Unix (Linux or macOS); if you are not on Unix, use your favorite trimmer and give the output files from that trimmer as file1/file2 here.
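steps
a character, default: "tr-ge" (trimming, then genome alignment). If not "all", a subset of these steps joined by "-": "tr-co-ph-rR-nc-tR-ge" (tr: trimming, co: contaminant depletion, ph: phiX depletion, rR: rRNA depletion, nc: ncRNA depletion, tR: tRNA depletion, ge: genome alignment).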
adapter.sequence
character, default: "auto". Auto-detect the adapter using fastp adapter auto-detection, which checks the first 1.5M reads. Adapter auto-detection is not very reliable for Ribo-seq, so for Ribo-seq you must specify the adapter manually, otherwise alignment will most likely fail! If the reads are already trimmed, or trimming is not wanted, use adapter.sequence = "disable". You can assign an adapter manually, like "ATCTCGTATGCCGTCTTCTGCTTG" or "AAAAAAAAAAAAA". You can also specify one of the three presets:
min.length
numeric, default 20. Minimum length of an aligned read without mismatches to pass the filter.
mismatches
numeric, default 3. Maximum number of mismatched bases. Soft-clipped bases are not counted; this only filters reads that have mismatches as defined by STAR. Only applies to the genome alignment step.
trim.front
numeric, default 0. Number of bases to trim from the 5' end. For Ribo-seq, use 0. Ignored if "tr" (trim) is not one of the steps in steps.
max.multimap
numeric, default 10. If a read maps to more locations than specified, the read is skipped. Set to 1 to only keep uniquely mapping reads. Only applies to the genome alignment step; the depletion steps allow multimapping.
alignment.type
default: "Local", standard local alignment with soft-clipping allowed. "EndToEnd" (global): force end-to-end read alignment, with no soft-clipping.
max.cpus
integer, default: min(90, detectCores() - 1). Number of threads to use: the minimum of 90 and the number of available cores minus 1. So if you have 8 cores, it will use 7.
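wait
a logical (not NA), default TRUE. Whether R should wait for the alignment command to finish, or run it asynchronously.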
resume
default: NULL. The step to continue from. Say steps are "tr-ph-ge" (trim, phiX depletion, genome alignment) and resume is "ge": the data is then assumed to be already trimmed and phiX-depleted, and the run starts at genome alignment. This is useful if something crashed, for example if you specified the wrong STAR version but the trimming step had completed. Resume mode can only run 1 step at a time.
script.single
location of the STAR single-file alignment script, default: the internal ORFik script. You can edit a copy and give your own if you need special alignments.
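The steps and resume arguments are plain strings, so their relationship can be illustrated in a few lines of base R. This is a minimal sketch, not ORFik's implementation: parse_steps() and resume_plan() are hypothetical helpers, the allowed abbreviations are the ones in the steps description ("tr-co-ph-rR-nc-tR-ge"), and expanding "all" to every step is an assumption.

```r
# Hypothetical helpers, not part of ORFik.
allowed_steps <- c("tr", "co", "ph", "rR", "nc", "tR", "ge")

# Split a steps string such as "tr-ph-ge" and check every part is allowed
parse_steps <- function(steps) {
  # Assumption: "all" expands to every allowed step, in pipeline order
  if (identical(steps, "all")) return(allowed_steps)
  parts <- strsplit(steps, "-", fixed = TRUE)[[1]]
  unknown <- setdiff(parts, allowed_steps)
  if (length(unknown) > 0) {
    stop("Unknown step(s): ", paste(unknown, collapse = ", "))
  }
  parts
}

# Given steps and a single resume step, report which earlier steps are
# assumed already done and which one step will be run
resume_plan <- function(steps, resume) {
  parts <- parse_steps(steps)
  if (length(resume) != 1 || !(resume %in% parts)) {
    stop("resume must be exactly one of: ", paste(parts, collapse = ", "))
  }
  pos <- match(resume, parts)
  list(assumed_done = parts[seq_len(pos - 1)], run = resume)
}

parse_steps("tr-ge")           # "tr" "ge"
resume_plan("tr-ph-ge", "ge")  # assumed_done: "tr" "ph"; run: "ge"
```

Validating the string up front means a typo such as "tr-gx" fails immediately in R instead of partway through a long shell pipeline.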
Details
Can only run on Unix systems (Linux and macOS), and requires at least 30 GB of memory for genomes like human, rat, zebrafish etc.
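Before starting a long run, a quick pre-flight check can confirm the platform and show how many threads the documented max.cpus default will use. A minimal sketch in base R; is_unix() and default_max_cpus() are hypothetical names, not ORFik functions, and the sketch does not check available memory.

```r
# Hypothetical pre-flight helpers, not part of ORFik.
library(parallel)  # provides detectCores()

# TRUE on Linux and macOS, where .Platform$OS.type is "unix"
is_unix <- function() .Platform$OS.type == "unix"

# The documented default for max.cpus: min(90, detectCores() - 1).
# Note: detectCores() can return NA on some systems, giving NA here.
default_max_cpus <- function(cores = detectCores()) {
  min(90, cores - 1)
}

if (!is_unix()) {
  message("STAR.align.single() needs Linux or macOS.")
}
default_max_cpus(8)    # 7
default_max_cpus(128)  # 90
```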
If for some reason the internal STAR alignment bash script does not work for you
(for example if you have a very small genome), you can copy the internal alignment
script, edit it, and give it as script.single to this function.
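Copying the internal script to a writable location is a one-liner with system.file() and file.copy(). A minimal sketch; copy_align_script() is a hypothetical helper, and the source path is the default of script.single shown under Usage.

```r
# Hypothetical helper, not part of ORFik: copy the internal alignment
# script so it can be edited and passed back through script.single.
copy_align_script <- function(dest_dir,
                              src = system.file("STAR_Aligner",
                                                "RNA_Align_pipeline.sh",
                                                package = "ORFik")) {
  # system.file() returns "" when the package or file is not found
  if (!nzchar(src) || !file.exists(src)) stop("Alignment script not found")
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dest_dir, basename(src))
  if (!file.copy(src, dest, overwrite = TRUE)) stop("Copy failed")
  dest
}

# Usage (assumes ORFik is installed):
# my_script <- copy_align_script("~/my_scripts")
# ... edit my_script in a text editor, then:
# STAR.align.single(file1, output.dir = bam.dir, index.dir = index,
#                   script.single = my_script)
```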
The trimmer used is fastp (the fastest I could find), which works on macOS and Linux.
If you want to use your own trimmer, set file1/file2 to the location of
the trimmed files from your program.
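Whichever trimmer produced them, file1 and file2 must still be in one of the allowed formats. A minimal sketch of a filename check in base R; is_allowed_read_file() and check_inputs() are hypothetical helpers, and matching only on the file extension (case-sensitive) is an assumption.

```r
# Hypothetical helpers, not part of ORFik: allowed formats are .fasta,
# .fastq, .fq or .fa, optionally gzip-compressed (.gz).
is_allowed_read_file <- function(path) {
  grepl("\\.(fasta|fastq|fq|fa)(\\.gz)?$", path)
}

# file2 is NULL for single-end data, so c(file1, NULL) is just file1
check_inputs <- function(file1, file2 = NULL) {
  files <- c(file1, file2)
  bad <- files[!is_allowed_read_file(files)]
  if (length(bad) > 0) stop("Unsupported format: ", paste(bad, collapse = ", "))
  invisible(TRUE)
}

is_allowed_read_file(c("sample_1.fastq.gz", "sample_2.fq", "sample.bam"))
# TRUE TRUE FALSE
```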
A note on adapter trimming from the creator of STAR:
"adapter trimming [is] definitely needed for short RNA sequencing.
For long RNA-seq, I would agree with Devon that in most cases adapter trimming
is not advantageous, since, by default, STAR performs local (not end-to-end) alignment,
i.e. it auto-trims." So trimming can be skipped for longer reads.
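When deciding whether to keep the "tr" step, or which adapter.sequence to set manually for Ribo-seq, it can help to see how many reads actually contain a candidate adapter. A rough heuristic sketch in base R: adapter_fraction() is a hypothetical helper, it is not how fastp detects adapters, and probing with only the first 10 bases of the adapter is an assumption.

```r
# Hypothetical helper, not part of ORFik or fastp: fraction of reads,
# among the first n_reads records of a FASTQ file (plain or .gz), that
# contain the first `prefix` bases of the adapter.
adapter_fraction <- function(fastq, adapter, n_reads = 100000, prefix = 10) {
  con <- gzfile(fastq, "r")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con, n = 4 * n_reads)
  if (length(lines) < 2) return(NA_real_)
  seqs <- lines[seq(2, length(lines), by = 4)]  # sequence = line 2 of each record
  probe <- substr(adapter, 1, prefix)           # first bases of the adapter
  mean(grepl(probe, seqs, fixed = TRUE))
}

# Usage (hypothetical file):
# adapter_fraction("data/raw_data/sample.fastq.gz", "ATCTCGTATGCCGTCTTCTGCTTG")
```

A high fraction means many reads run into the adapter, which is typical for short Ribo-seq reads and suggests trimming is worthwhile.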
Value
output.dir, which can be used as input to ORFik::create.experiment
See Also
Other STAR: STAR.align.folder(), STAR.allsteps.multiQC(), STAR.index(), STAR.install(), STAR.multiQC(), STAR.remove.crashed.genome(), getGenomeAndAnnotation(), install.fastp()
Examples
## Specify output directories:
# library(ORFik)
output.dir <- "/Bio_data/references/Human"
bam.dir <- "data/processed/human_rna_seq"
# arguments <- getGenomeAndAnnotation("Homo sapiens", output.dir)
# index <- STAR.index(arguments, output.dir)
# Input must be fasta/fastq (optionally .gz); name the later arguments,
# since the second positional argument is file2:
# STAR.align.single("data/raw_data/human_rna_seq/file1.fastq",
#                   output.dir = bam.dir, index.dir = index)