View source: R/process_geo_rnaseq.R
process_geo_rnaseq downloads and processes GEO RNA-seq data
for a given GEO series accession ID. It filters metadata for RNA-seq
samples only.
The SRA Toolkit is used to download SRA data, Trimmomatic for
(optional) read trimming, and Salmon for read mapping.
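Because the function relies on external command-line tools, it can help to confirm they are reachable before starting a long run. The following is a minimal sketch using base R's Sys.which(). The helper check_tools is hypothetical (not part of the package), and the executable names are assumptions about a typical installation that may differ on your system.

```r
# Hypothetical helper: report which external executables are found on the PATH.
# The default tool names are assumptions, not names confirmed by the package.
check_tools <- function(tools = c("prefetch", "fastq-dump", "trimmomatic", "salmon")) {
  paths <- Sys.which(tools)   # "" is returned for tools that are not found
  found <- nzchar(paths)
  names(found) <- tools
  found
}

status <- check_tools()
missing_tools <- names(status)[!status]
if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
}
```

A tool reported as missing here may still be usable if the package is given an explicit path, so treat the result as a hint rather than a hard requirement.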
process_geo_rnaseq(geo_series_acc, destdir, download_method = "auto",
  ascp = TRUE, prefetch_workspace, ascp_path, use_sra_file = FALSE,
  trim_fastq = FALSE, index_dir, other_opts = NULL,
  species = c("human", "mouse", "rat"),
  countsFromAbundance = c("no", "scaledTPM", "lengthScaledTPM"),
  n_thread)
geo_series_acc |
GEO series accession ID. |
destdir |
directory where all the results will be saved. |
download_method |
download method for GEOquery. |
ascp |
logical, whether to use Aspera Connect to download SRA
run files. If FALSE, wget is used to download the files, which
might be slower than ascp. |
prefetch_workspace |
directory where SRA run files will be
downloaded. This parameter is needed when ascp = TRUE. |
ascp_path |
path to the Aspera software. |
use_sra_file |
logical, whether to download SRA file first and get fastq files afterwards. |
trim_fastq |
logical, whether to trim fastq file. |
index_dir |
directory of the indexing files needed for read
mapping using Salmon. See function build_index. |
other_opts |
options other than default to use for read mapping. See Salmon documentation for the available options. |
species |
name of the species. Only "human", "mouse", and "rat" are supported. |
countsFromAbundance |
whether to generate counts based on
abundance. Available options are: "no", "scaledTPM", and "lengthScaledTPM". |
n_thread |
number of cores to use. |
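The usage signature lists species and countsFromAbundance with vector-valued defaults. In R this pattern is commonly paired with match.arg(), which picks the first element when the argument is omitted and rejects values outside the listed choices. The sketch below illustrates that idiom only. Whether process_geo_rnaseq resolves its arguments this way internally is an assumption, and resolve_choices is a hypothetical wrapper.

```r
# Hypothetical wrapper showing how vector-valued defaults resolve with match.arg().
resolve_choices <- function(species = c("human", "mouse", "rat"),
                            countsFromAbundance = c("no", "scaledTPM",
                                                    "lengthScaledTPM")) {
  species <- match.arg(species)                          # errors on unlisted values
  countsFromAbundance <- match.arg(countsFromAbundance)
  list(species = species, countsFromAbundance = countsFromAbundance)
}

# With no arguments, the first choice of each vector is selected.
defaults <- resolve_choices()

# Explicit values must be one of the listed choices.
custom <- resolve_choices(species = "mouse",
                          countsFromAbundance = "lengthScaledTPM")
```

Passing the choice explicitly, as the example for this function does, avoids depending on which element happens to come first.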
A list of metadata from GEO and SRA, saved in destdir.
A second list of gene- and transcript-level estimated counts, summarized
by the Bioconductor package 'tximport', is also saved in
destdir.
Rob Patro, Geet Duggal, Michael I. Love, Rafael A. Irizarry, and Carl Kingsford (2017): Salmon provides fast and bias-aware quantification of transcript expression. Nature methods, 14(4), 417. https://www.nature.com/articles/nmeth.4197
Charlotte Soneson, Michael I. Love, Mark D. Robinson (2015): Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research. http://dx.doi.org/10.12688/f1000research.7563.1
Philip Ewels, Mans Magnusson, Sverker Lundin, and Max Kaller (2016): MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047-3048. https://doi.org/10.1093/bioinformatics/btw354
geo_series_acc = "GSE102170"
# You will have to build the index before running this function.
build_index(species = "human", kmer = 31, ens_release = 92,
  destdir = tempdir())
process_geo_rnaseq(geo_series_acc = geo_series_acc, destdir = tempdir(),
  download_method = "auto",
  ascp = FALSE, prefetch_workspace = NULL,
  ascp_path = NULL, use_sra_file = FALSE, trim_fastq = FALSE,
  index_dir = tempdir(), species = "human",
  countsFromAbundance = "lengthScaledTPM", n_thread = 1)
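Since a run involves large downloads, a cheap sanity check on the accession string before calling the function can catch typos early. This is a minimal sketch, assuming a GEO series accession has the form "GSE" followed by digits, as in the example's GSE102170. The helper is_geo_series_acc is hypothetical and not part of the package.

```r
# Hypothetical helper: TRUE when x is a single string shaped like a GEO series
# accession ("GSE" followed by one or more digits).
is_geo_series_acc <- function(x) {
  is.character(x) && length(x) == 1 && grepl("^GSE[0-9]+$", x)
}

ok  <- is_geo_series_acc("GSE102170")   # series accession
bad <- is_geo_series_acc("GSM102170")   # sample accession, not a series
```

The check only looks at the shape of the string and does not confirm that the series exists in GEO or contains RNA-seq samples.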