ppRawData: RNA Sequencing raw data preprocessing
In harshsharma-cb/FASE: Analysis of RNA-Sequencing data using FASE (Finding Alternative Splicing Events).

Description

Manual function that maps reads to the reference genome, given SRA or fastq files. It also sorts and indexes the mapped reads for further processing. The reads produced by ppRawData can be summarized for genes, exons and introns using ppSumEIG. ppAuto does not need to be called if ppRawData has already been called.

System requirements for ppRawData include:

fastq-dump (if files='SRA')
tophat2
samtools
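Before calling ppRawData, it can help to confirm that these tools can be found on the PATH. The following is a minimal sketch using base R's `Sys.which()`. The helper name `check_pp_tools` is hypothetical and not part of FASE. It only checks that the executables can be located, not their versions.

```r
# Hypothetical helper (not part of FASE): check that the external tools
# ppRawData relies on can be found on the PATH.
check_pp_tools <- function(files = "fastq") {
  tools <- c("tophat2", "samtools")
  # fastq-dump is only needed when the raw reads are SRA files
  if (tolower(files) == "sra") tools <- c("fastq-dump", tools)
  paths <- Sys.which(tools)   # empty string for tools that are not found
  found <- nzchar(paths)
  names(found) <- tools
  found
}

found <- check_pp_tools(files = "fastq")
if (!all(found)) {
  message("Missing tools: ", paste(names(found)[!found], collapse = ", "))
}
```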

Usage

ppRawData(
  folderSRA = FALSE,
  srlist = NULL,
  pairedend = FALSE,
  genomeBI,
  files = "fastq",
  p = 1,
  N = 6,
  r = 44,
  mate_std_dev = 30,
  read_edit_dist = 6,
  max_intron_length = 10000,
  min_intron_length = 50,
  segment_length = NULL,
  ...
)

Arguments

`folderSRA`	path of the directory containing the fastq or SRA files. (default=current directory)
`srlist`	list of unique sample names of the fastq/SRA files. By default (NULL), the function creates this list itself. Please follow this naming convention for the sample files: for SRA files, "Sample-S1_1" and "Sample-S1_2" (paired-end reads) or "Sample-S1" (single-end reads); for fastq files, "Sample-S1_1.fastq" and "Sample-S1_2.fastq" (paired-end reads) or "Sample-S1.fastq" (single-end reads).
`pairedend`	logical: TRUE if the reads are paired-end, FALSE if they are single-end. All files must be either single-end or paired-end. (default=FALSE)
`genomeBI`	path of the genome index of the organism, built with the bowtie2-build command.
`files`	type of raw read file: fastq or sra (downloaded from NCBI). All files should be in the same format and have the same read length. (default=fastq)
`p`	number of threads used by samtools and the Rsubread package. (default=1)
`N`	maximum number of accepted read mismatches; reads with more than N mismatches are discarded. (default=6) [tophat2 parameter]
`r`	expected inner distance between mate pairs. (default=44) [tophat2 parameter]
`mate_std_dev`	standard deviation of the distribution of inner distances between mate pairs. (default=30) [tophat2 parameter]
`read_edit_dist`	final read alignments with an edit distance greater than this value are discarded. (default=6) [tophat2 parameter]
`max_intron_length`	when searching for junctions ab initio, TopHat2 ignores donor/acceptor pairs farther than this many bases apart, except when such a pair is supported by a split segment alignment of a long read. (default=10000) [tophat2 parameter]
`min_intron_length`	TopHat2 ignores donor/acceptor pairs closer than this many bases apart. (default=50) [tophat2 parameter]
`segment_length`	length of the segments into which each read is divided; the segments are mapped independently to find junctions. [tophat2 parameter]
`...`	other parameters passed to tophat2.
`gtf`	intron-parsed GTF file of the organism, used to generate intron read counts. See `intronGTFparser` to generate the intron-parsed GTF file.
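Two of the arguments above are easy to get wrong: the file names from which `srlist` is derived, and the inner distance `r`. The sketch below illustrates both with hypothetical helpers that are not part of FASE. `sample_names()` assumes the naming convention described for `srlist` and only strips a `.fastq` extension and, for paired-end data, the `_1`/`_2` mate suffix; it is not necessarily how ppRawData builds `srlist` internally. `inner_distance()` follows the TopHat manual's description of the inner distance as the fragment length minus the lengths of both reads (for example, 300 bp fragments sequenced as 2 x 50 bp reads give `r = 200`).

```r
# Hypothetical helpers (not part of FASE).

# Derive unique sample names from file names that follow the documented
# convention, e.g. "Sample-S1_1.fastq" / "Sample-S1_2.fastq" -> "Sample-S1".
sample_names <- function(filenames, pairedend = FALSE) {
  base <- sub("\\.fastq$", "", basename(filenames))  # drop a .fastq extension
  if (pairedend) {
    if (!all(grepl("_[12]$", base))) {
      stop("paired-end file names must end in _1 or _2")
    }
    base <- sub("_[12]$", "", base)                   # drop the mate suffix
  }
  unique(base)
}

# Inner distance between mates (the value given to tophat2 as r):
# fragment length minus the lengths of the two reads.
inner_distance <- function(fragment_length, read_length) {
  fragment_length - 2 * read_length
}

files <- c("Sample-S1_1.fastq", "Sample-S1_2.fastq",
           "Sample-S2_1.fastq", "Sample-S2_2.fastq")
sample_names(files, pairedend = TRUE)  # "Sample-S1" "Sample-S2"
inner_distance(300, 50)                # 200
```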

Value

Mapped, sorted and indexed BAM files. (These steps can also be run separately with tophat2 and samtools, or with the automatic wrapper function ppAuto.)
Junction matrix: matrix of junction read counts. (Can also be generated separately with getJunctionCountMatrix or with the wrapper function ppAuto.)
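Since the mapping, sorting and indexing steps can also be run separately with tophat2 and samtools, the sketch below builds (but does not execute) roughly equivalent command lines for a single fastq sample, using ppRawData's default values for the tophat2 parameters. The function name `build_commands`, the output directory name and the sorted BAM file name are hypothetical, and this is not necessarily the exact command sequence ppRawData runs. It assumes samtools 1.3 or later (where `samtools sort` takes `-o`) and leaves out `segment_length` and other options for brevity.

```r
# Hypothetical sketch (not FASE's internal code): build, but do not run,
# tophat2 / samtools command lines for one fastq sample.
build_commands <- function(sample, genomeBI, pairedend = FALSE, p = 1, N = 6,
                           r = 44, mate_std_dev = 30, read_edit_dist = 6,
                           max_intron_length = 10000, min_intron_length = 50) {
  out_dir <- paste0(sample, "_tophat")
  reads <- if (pairedend) {
    paste0(sample, c("_1", "_2"), ".fastq", collapse = " ")
  } else {
    paste0(sample, ".fastq")
  }
  # The mate inner-distance options only apply to paired-end reads
  mate_opts <- if (pairedend) {
    sprintf("-r %d --mate-std-dev %d ", as.integer(r), as.integer(mate_std_dev))
  } else {
    ""
  }
  tophat <- sprintf(
    "tophat2 -p %d -N %d --read-edit-dist %d %s-I %d -i %d -o %s %s %s",
    as.integer(p), as.integer(N), as.integer(read_edit_dist), mate_opts,
    as.integer(max_intron_length), as.integer(min_intron_length),
    out_dir, genomeBI, reads
  )
  sorted_bam <- paste0(sample, ".sorted.bam")
  # TopHat2 writes its alignments to <output dir>/accepted_hits.bam
  sort_cmd <- sprintf("samtools sort -@ %d -o %s %s/accepted_hits.bam",
                      as.integer(p), sorted_bam, out_dir)
  index_cmd <- paste("samtools index", sorted_bam)
  c(tophat, sort_cmd, index_cmd)
}

cmds <- build_commands("Sample-S1", "genome/organism_index", pairedend = TRUE, p = 4)
writeLines(cmds)
# Each command could then be run with, for example, system(cmds[1]).
```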

References

https://CRAN.R-project.org/view=HighPerformanceComputing
Sequence Read Archive Submissions Staff. Using the SRA Toolkit to convert .sra files into other formats. In: SRA Knowledge Base [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2011-. Available from: https://www.ncbi.nlm.nih.gov/books/NBK158900/.
https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013 Apr 25;14(4):R36. http://ccb.jhu.edu/software/tophat.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, and 1000 Genome Project Data Processing Subgroup. The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics. 2009;25(16):2078-9.

harshsharma-cb/FASE documentation built on Aug. 6, 2023, 1:37 a.m.