ppRawData | R Documentation
Manual function to map reads to the reference genome, given SRA or fastq files. It also sorts and indexes the mapped reads for further processing. Reads mapped by ppRawData can be summarized for genes, exons and introns using ppSumEIG. ppAuto is not required if ppRawData has been called.
System requirements for ppRawData include:
fastq-dump, from the SRA Toolkit (only needed if files='SRA')
tophat2
samtools
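Before calling ppRawData it can help to confirm that the external tools listed above are on the PATH. The following is a minimal sketch in base R. It assumes the tools are invoked by the executable names `tophat2`, `samtools` and `fastq-dump`. The `files` variable simply mirrors the ppRawData argument of the same name.

```r
# Mirror the ppRawData 'files' argument (default "fastq")
files <- "fastq"

# Assumption: the tools are called by these executable names
required_tools <- c("tophat2", "samtools")
if (tolower(files) == "sra") {
  # fastq-dump is only needed when converting SRA files
  required_tools <- c("fastq-dump", required_tools)
}

check_tools <- function(tools) {
  # Sys.which() returns the full path of each executable, or "" if not found
  paths <- Sys.which(tools)
  found <- nzchar(paths)
  names(found) <- tools
  found
}

tool_status <- check_tools(required_tools)
print(tool_status)
if (!all(tool_status)) {
  message("Missing tools: ",
          paste(names(tool_status)[!tool_status], collapse = ", "))
}
```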
ppRawData(
folderSRA = FALSE,
srlist = NULL,
pairedend = FALSE,
genomeBI,
files = "fastq",
p = 1,
N = 6,
r = 44,
mate_std_dev = 30,
read_edit_dist = 6,
max_intron_length = 10000,
min_intron_length = 50,
segment_length = NULL,
...
)
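The example call below is a minimal sketch. It assumes that the package providing ppRawData is attached and that tophat2 and samtools are installed. The directory and bowtie2 index paths are hypothetical placeholders. A guard skips the call when the function, the tools or the input directory is unavailable.

```r
# Hypothetical inputs: replace with your own directory and bowtie2 index prefix
sample_dir    <- "/path/to/fastq_files"
bowtie2_index <- "/path/to/bowtie2_index/genome"

# Only attempt the run when the external tools can be found on the PATH
tools_ok <- all(nzchar(Sys.which(c("tophat2", "samtools"))))

if (exists("ppRawData", mode = "function") && tools_ok && dir.exists(sample_dir)) {
  ppRawData(
    folderSRA    = sample_dir,
    pairedend    = TRUE,          # all samples are paired-end
    genomeBI     = bowtie2_index,
    files        = "fastq",
    p            = 4,             # number of threads
    N            = 6,
    r            = 44,
    mate_std_dev = 30
  )
} else {
  message("Skipping ppRawData(): function, tools or input directory not available.")
}
```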
| Argument | Description |
|---|---|
| folderSRA | path of the directory containing the fastq or SRA files. (default=current directory) |
| srlist | list of unique sample names of the fastq/SRA files, created by default within the function. Please follow the naming convention for the sample files: |
| pairedend | boolean: TRUE if reads are paired-end, FALSE if reads are single-end. All files must be either single-end or paired-end. (default=FALSE) |
| genomeBI | path of the genome index of the organism, built with the bowtie2-build command. |
| files | type of raw read file: fastq or SRA (downloaded from NCBI). All files must be in the same format and have the same read length. (default=fastq) |
| p | number of threads used by samtools and the Rsubread package. (default=1) |
| N | number of accepted read mismatches. Reads with more than N mismatches are discarded. (default=6) [tophat2 parameter] |
| r | expected inner distance between mate pairs. (default=44) [tophat2 parameter] |
| mate_std_dev | standard deviation of the distribution of inner distances between mate pairs. (default=30) [tophat2 parameter] |
| read_edit_dist | final read alignments with an edit distance greater than this value are discarded. (default=6) [tophat2 parameter] |
| max_intron_length | when searching for junctions ab initio, TopHat2 ignores donor/acceptor pairs farther than this many bases apart, unless the pair is supported by a split segment alignment of a long read. (default=10000) [tophat2 parameter] |
| min_intron_length | TopHat2 ignores donor/acceptor pairs closer than this many bases apart. (default=50) [tophat2 parameter] |
| segment_length | each read is split into segments of this length, which are mapped independently to find junctions. (default=NULL) [tophat2 parameter] |
| ... | other parameters passed to tophat2. |
| gtf | intron-parsed gtf file of the organism. Please check |
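Several arguments above are marked as tophat2 parameters. The sketch below shows how they correspond to TopHat2 command-line flags. `build_tophat2_args` is a hypothetical helper and is not claimed to be how ppRawData builds its command internally. Two design choices are assumptions: passing `p` to TopHat2's `-p`, and adding mate-pair flags only for paired-end data.

```r
# Hypothetical helper, not part of ppRawData: maps the documented arguments
# onto TopHat2 command-line flags and returns them as a character vector.
build_tophat2_args <- function(genomeBI, reads, pairedend = FALSE, p = 1,
                               N = 6, r = 44, mate_std_dev = 30,
                               read_edit_dist = 6, max_intron_length = 10000,
                               min_intron_length = 50, segment_length = NULL,
                               output_dir = "tophat_out") {
  # Format whole numbers without scientific notation (avoids "1e+05")
  int_arg <- function(x) as.character(as.integer(x))

  args <- c(
    "-p", int_arg(p),                            # --num-threads (assumption)
    "-N", int_arg(N),                            # --read-mismatches
    "--read-edit-dist", int_arg(read_edit_dist),
    "-I", int_arg(max_intron_length),            # --max-intron-length
    "-i", int_arg(min_intron_length),            # --min-intron-length
    "-o", output_dir                             # --output-dir
  )
  # Mate-pair settings only matter for paired-end libraries
  if (pairedend) {
    args <- c(args, "-r", int_arg(r), "--mate-std-dev", int_arg(mate_std_dev))
  }
  # segment_length = NULL means the flag is omitted entirely
  if (!is.null(segment_length)) {
    args <- c(args, "--segment-length", int_arg(segment_length))
  }
  # Positional arguments come last: the bowtie2 index prefix, then read file(s)
  c(args, genomeBI, reads)
}

args <- build_tophat2_args("genome/hg19", c("s1_1.fastq", "s1_2.fastq"),
                           pairedend = TRUE, max_intron_length = 100000)
print(args)
# The vector could then be passed to the tool with: system2("tophat2", args)
```

Keeping the arguments as a character vector, rather than one pasted string, lets `system2()` handle quoting of each element.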
Mapped, sorted and indexed BAM files. (These can also be produced separately using tophat2 and samtools, or with the automatic wrapper function ppAuto.)
Junction matrix: a matrix of junction read counts. (This can also be produced separately using getJunctionCountMatrix or the wrapper function ppAuto.)
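The sorting and indexing step can also be run separately with samtools, as noted above. The sketch below is hypothetical and is not the code used inside ppRawData. The input path assumes TopHat2's usual `accepted_hits.bam` output. With `run = FALSE` it only builds the commands. With `run = TRUE` it calls `samtools sort` and `samtools index` through `system2()`.

```r
# Hypothetical helper: sort and index one BAM file with samtools
sort_and_index_bam <- function(in_bam, threads = 1, run = TRUE) {
  sorted_bam <- sub("\\.bam$", ".sorted.bam", in_bam)
  # -@ sets additional sorting/compression threads; -o names the output file
  sort_args  <- c("sort", "-@", as.character(threads), "-o", sorted_bam, in_bam)
  index_args <- c("index", sorted_bam)   # writes <sorted_bam>.bai

  if (run) {
    if (!nzchar(Sys.which("samtools"))) stop("samtools not found on PATH")
    if (system2("samtools", sort_args) != 0) stop("samtools sort failed")
    if (system2("samtools", index_args) != 0) stop("samtools index failed")
  }
  invisible(list(sorted = sorted_bam, sort_args = sort_args,
                 index_args = index_args))
}

# Build (but do not run) the commands for a hypothetical TopHat2 output file
cmds <- sort_and_index_bam("tophat_out/accepted_hits.bam", threads = 4, run = FALSE)
str(cmds)
```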
https://CRAN.R-project.org/view=HighPerformanceComputing
Sequence Read Archive Submissions Staff. Using the SRA Toolkit to convert .sra files into other formats. In: SRA Knowledge Base [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2011-. Available from: https://www.ncbi.nlm.nih.gov/books/NBK158900/.
https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013 Apr 25;14(4):R36. http://ccb.jhu.edu/software/tophat.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-9.