View source: R/process_geo_rnaseq.R
process_geo_rnaseq
downloads and processes GEO RNA-seq data
for a given GEO series accession ID. It filters metadata for RNA-seq
samples only.
We use SRA toolkit for downloading SRA data, Trimmomatic for read
trimming (optional), and Salmon for read mapping.
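Because the pipeline relies on these external programs, it can help to confirm they are visible on the PATH before starting a long run. The sketch below is not part of the package: `check_tools` is a hypothetical helper, and the executable names (`prefetch`, `fastq-dump`, `salmon`, `trimmomatic`) are assumptions that may differ on your system (Trimmomatic, for instance, is often run as a Java jar rather than a wrapper script).

```r
# Hypothetical helper, not part of the package: report which external
# tools are visible on the PATH. The executable names are assumptions.
check_tools <- function(tools = c("prefetch", "fastq-dump", "salmon", "trimmomatic")) {
  paths <- Sys.which(tools)   # returns "" for tools that are not found
  found <- nzchar(paths)
  names(found) <- tools
  found
}

status <- check_tools()
missing_tools <- names(status)[!status]
if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
}
```

A tool reported as missing here may still be usable if the function is given an explicit path to it, so treat the result as a hint rather than a hard failure.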
process_geo_rnaseq(geo_series_acc, destdir, download_method = "auto",
  ascp = TRUE, prefetch_workspace, ascp_path, use_sra_file = FALSE,
  trim_fastq = FALSE, index_dir, other_opts = NULL,
  species = c("human", "mouse", "rat"),
  countsFromAbundance = c("no", "scaledTPM", "lengthScaledTPM"), n_thread)
geo_series_acc
    GEO series accession ID.
destdir
    directory where all the results will be saved.
download_method
    download method for GEOquery.
ascp
    logical, whether to use Aspera connect to download SRA run files. If FALSE, then wget will be used to download files which might be slower than
prefetch_workspace
    directory where SRA run files will be downloaded. This parameter is needed when
ascp_path
    path to the Aspera software.
use_sra_file
    logical, whether to download SRA file first and get fastq files afterwards.
trim_fastq
    logical, whether to trim fastq file.
index_dir
    directory of the indexing files needed for read mapping using Salmon. See function
other_opts
    options other than default to use for read mapping. See Salmon documentation for the available options.
species
    name of the species. Only
countsFromAbundance
    whether to generate counts based on abundance. Available options are:
n_thread
    number of cores to use.
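The `countsFromAbundance` choices in the usage above appear to mirror the argument of the same name in tximport. As a rough illustration of what the two non-default settings mean, the sketch below turns toy TPM values into count-like values: `scaledTPM` scales abundances up to each sample's library size, while `lengthScaledTPM` first weights them by the average transcript length across samples. The helper name `counts_from_abundance` and all matrices are invented for illustration; this restates the idea in simplified form and is not the code the package or tximport actually runs.

```r
# Hypothetical helper illustrating the scaledTPM / lengthScaledTPM idea.
counts_from_abundance <- function(counts, abundance, lengths,
                                  method = c("scaledTPM", "lengthScaledTPM")) {
  method <- match.arg(method)
  new_counts <- if (method == "lengthScaledTPM") {
    abundance * rowMeans(lengths)   # weight by average transcript length
  } else {
    abundance
  }
  # Rescale each sample (column) so it sums to the original library size.
  scale <- colSums(counts) / colSums(new_counts)
  sweep(new_counts, 2, scale, `*`)
}

# Toy data: 3 transcripts x 2 samples (all values invented).
counts    <- matrix(c(10, 20, 70, 30, 30, 40), nrow = 3)
abundance <- matrix(c(2e5, 3e5, 5e5, 4e5, 4e5, 2e5), nrow = 3)
lengths   <- matrix(c(1000, 2000, 1500, 1100, 1900, 1500), nrow = 3)

scaled        <- counts_from_abundance(counts, abundance, lengths, "scaledTPM")
length_scaled <- counts_from_abundance(counts, abundance, lengths, "lengthScaledTPM")

# Both results keep the per-sample totals of the original counts.
colSums(scaled)
colSums(length_scaled)
```

With `"no"`, the estimated counts from the quantifier are used as they are, so no rescaling of this kind takes place.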
A list of metadata from GEO and SRA is saved in the destdir. Another list of gene and transcript level estimated counts summarized by Bioconductor package 'tximport' is also saved in the destdir.
Rob Patro, Geet Duggal, Michael I. Love, Rafael A. Irizarry, and Carl Kingsford (2017): Salmon provides fast and bias-aware quantification of transcript expression. Nature methods, 14(4), 417. https://www.nature.com/articles/nmeth.4197
Charlotte Soneson, Michael I. Love, Mark D. Robinson (2015): Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research. http://dx.doi.org/10.12688/f1000research.7563.1
Philip Ewels, Mans Magnusson, Sverker Lundin, and Max Kaller (2016): MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047-3048. https://doi.org/10.1093/bioinformatics/btw354
geo_series_acc = "GSE102170"
# You must build the index before running this function.
build_index(species = "human", kmer = 31, ens_release = 92,
            destdir = tempdir())
process_geo_rnaseq(geo_series_acc = geo_series_acc, destdir = tempdir(),
                   download_method = "auto",
                   ascp = FALSE, prefetch_workspace = NULL,
                   ascp_path = NULL, use_sra_file = FALSE, trim_fastq = FALSE,
                   index_dir = tempdir(), species = "human",
                   countsFromAbundance = "lengthScaledTPM", n_thread = 1)
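GEO series accessions have the form `GSE` followed by digits, so a cheap format check before calling the function can catch a typo before any download starts. The helper name `is_geo_series_acc` is hypothetical and not part of the package.

```r
# Hypothetical helper: TRUE when x looks like a GEO series accession.
is_geo_series_acc <- function(x) {
  is.character(x) && length(x) == 1 && grepl("^GSE[0-9]+$", x)
}

is_geo_series_acc("GSE102170")
is_geo_series_acc("GSM102170")  # a sample accession, not a series
```

This only checks the shape of the ID; it does not confirm that the series exists or that it contains RNA-seq samples.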