Map reads to a genome using STAR. Most parameter descriptions are taken from the STAR manual.
readFilesIn | Character - files with sequences to map (if paired-end data, the R1 files)
genomeDir | String - Path to directory where genome indices were generated by STAR
starPath | String - Path to directory with STAR executable
outDest | String - Directory where output files should be saved
outSuffix | String - will be appended to original filename
outTrimString | String - will be trimmed from output filenames - don't include input file extension, as this is trimmed from basename(readFilesIn)
runThreadN | Numeric - How many cores to use
readFilesInR2 | Character - If data are paired, the R2 files
settings | String - Use ENCODE settings for long or short RNA ("ENCODE_long" or "ENCODE_short")
settingsOverride | String - Additional inputs to STAR; anything here that is also in ENCODE settings will override the ENCODE value
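The interaction between outDest, outTrimString and outSuffix can be easier to see in code. The following is only a sketch under assumptions: make_out_prefix is a hypothetical helper, not a function from this package, and the real STAR_run may assemble its output names differently.

```r
# Hypothetical helper (NOT part of the package): illustrates how an output
# prefix could be derived from an input file name.
make_out_prefix <- function(readFile, outDest, outTrimString = "", outSuffix = "") {
  # Drop the directory and the file extension (also handles .gz/.bz2/.xz)
  stem <- tools::file_path_sans_ext(basename(readFile), compression = TRUE)
  # Remove the literal outTrimString, if one was given
  if (nzchar(outTrimString)) {
    stem <- sub(outTrimString, "", stem, fixed = TRUE)
  }
  # Append the suffix and place the result in the output directory
  file.path(outDest, paste0(stem, outSuffix))
}

make_out_prefix("trimmed/sampleA_R1_001_trimmed.fastq",
                outDest = "sandbox/packtest",
                outTrimString = "_R1_001_trimmed",
                outSuffix = "_mapped")
# "sandbox/packtest/sampleA_mapped"
```

Because the extension is already removed from the base name before trimming, outTrimString in this sketch must not contain it, which matches the note in the argument description.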
Take a list of FASTQ files containing reads, and get alignments with STAR. STAR's defaults are the defaults here. To use ENCODE settings for long or short RNA instead, set "settings" to "ENCODE_long" or "ENCODE_short". To override specific parameter values in the ENCODE settings, or to set any other parameters you want, use settingsOverride. This argument is a string that is appended, as-is, to the command issued to the command line.

If an input file doesn't exist, you'll get an error. If an output file (at least a Log.out file) already exists, the corresponding input file is skipped.

For paired-end data, readFilesIn are the R1 files and readFilesInR2 are the R2 files. The filenames need to contain R1 / R2 or r1 / r2, which appears to be the usual convention.

TIME: ~10m per 5G fastq. 3-6 hours for an entire total-RNA dataset (~25-30 samples). Nuc-seq took only ~30m.

COMMAND LINE EXAMPLE, using ENCODE settings for long RNA (if you want to play with the parameters while looking at just one file, this might be easiest):

/opt/STAR/bin/MacOSX_x86_64/STAR --genomeDir $genomeDir --readFilesIn $pathTrimmed$sn$trimmedSuffix --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --outSAMtype 'BAM SortedByCoordinate' --outFileNamePrefix $pathMapped$sn$mappedSuffix
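The override behaviour described above (a value given by the user replaces the ENCODE value for the same flag, and any other flag is added) can be sketched as follows. This is an illustration under assumptions, not the package's implementation: merge_star_settings is a hypothetical helper, and it represents the settings as a named character vector, whereas the package documents settingsOverride as a string appended to the command. The ENCODE long-RNA values are copied from the command-line example above.

```r
# ENCODE long-RNA values, copied from the command-line example above.
encode_long <- c(
  "--outFilterType"             = "BySJout",
  "--outFilterMultimapNmax"     = "20",
  "--alignSJoverhangMin"        = "8",
  "--alignSJDBoverhangMin"      = "1",
  "--outFilterMismatchNmax"     = "999",
  "--outFilterMismatchNoverLmax" = "0.04",
  "--alignIntronMin"            = "20",
  "--alignIntronMax"            = "1000000",
  "--alignMatesGapMax"          = "1000000"
)

# Hypothetical helper (NOT part of the package): flags already present in
# `base` are replaced, flags that are new are appended.
merge_star_settings <- function(base, override) {
  base[names(override)] <- override
  base
}

params <- merge_star_settings(
  encode_long,
  c("--outFilterMultimapNmax" = "10",        # replaces the ENCODE value
    "--outSAMstrandField"     = "intronMotif") # not in ENCODE set, appended
)

# Flatten to the argument string that would follow the STAR executable
star_args <- paste(names(params), params, collapse = " ")
```

Merging by flag name guarantees that each flag appears once in the final command, so the result does not depend on how STAR treats repeated options.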
Emma Myers
# Example using paired-end data.
gdir = '/Volumes/CodingCLub1/STAR_stuff/indexes/refGene_gtf_maxLen75/'
fastqs=dir(paste(dataPath,'trimmed',sep=''), pattern='fastq', full.names=TRUE)
r1=fastqs[which(regexpr("R1",fastqs)>0)]
r2=fastqs[which(regexpr("R2",fastqs)>0)]
# Use ENCODE settings for short RNA, except for outFilterMatchNmin
STAR_run(r1,genomeDir=gdir,outDest='sandbox/packtest/', outTrimString='_R1_001_trimmed', readFilesInR2=r2, runThreadN=8, settings="ENCODE_short", settingsOverride=c("--outFilterMatchNmin", "8") )
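Since the R1 and R2 vectors are passed separately, it may be worth checking that they line up file by file before mapping. The sketch below is one simple way to do that; pair_reads is a hypothetical helper, not part of this package, and it assumes the only difference between mates is the R1 / R2 (or r1 / r2) tag in the filename.

```r
# Hypothetical helper (NOT part of the package): split a vector of FASTQ
# paths into mates and verify that every R1 file has a matching R2 file.
pair_reads <- function(fastqs) {
  r1 <- sort(fastqs[grepl("R1|r1", basename(fastqs))])
  r2 <- sort(fastqs[grepl("R2|r2", basename(fastqs))])
  # Names with the mate tags removed should be identical within a pair
  key <- function(x) gsub("[Rr][12]", "", basename(x))
  if (length(r1) != length(r2) || !all(key(r1) == key(r2))) {
    stop("R1 and R2 files do not pair up")
  }
  data.frame(r1 = r1, r2 = r2, stringsAsFactors = FALSE)
}

# Made-up file names, for illustration only
fq <- c("d/b_R2.fastq", "d/a_R1.fastq", "d/a_R2.fastq", "d/b_R1.fastq")
pairs <- pair_reads(fq)
```

The resulting columns pairs$r1 and pairs$r2 could then be passed as readFilesIn and readFilesInR2. Sample names that themselves contain R1 or R2 would confuse this simple pattern match.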