Description Usage Arguments Value Exon filters Gene filters Normalization parameters Statistics parameters Presets Note Author(s) Examples
View source: R/metaseqr.main.R
This function is the main metaseqr workhorse and implements the main metaseqr workflow which performs data read, filtering, normalization and statistical selection, creates diagnostic plots and exports the results and a report if requested. The metaseqr function is responsible for assembling all the steps of the metaseqr pipeline which i) reads the input gene or exon read count table ii) performs prelimininary filtering of data by removing chrM and other non-essential information for a typical differential gene expression analysis as well as a preliminary expression filtering based on the exon counts, if an exon read count file is provided. iii) performs data normalization with one of currently widely used algorithms, including EDASeq (Risso et al., 2011), DESeq (Anders and Huber, 2010), edgeR (Robinson et al., 2010), NOISeq (Tarazona et al., 2012) or no normalization iv) performs a second stage of filtering based on the normalized gene expression according to several gene filters v) performs statistical testing with one or more of currently widely used algorithms, including DESeq (Anders and Huber, 2010), edgeR (Robinson et al., 2010), NOISeq (Tarazona et al., 2012), limma (Smyth et al., 2005) for RNA-Seq data, baySeq (Hardcastle et al., 2012) vi) in the case of multiple statistical testing algorithms, performs meta-analysis using one of five available methods (see the meta.p argument) vii) exports the resulting differentially expressed gene list in text tab-delimited format viii) creates a set of diagnostic plots either available in the aforementioned packages or metaseqr specific ones and ix) creates a comprehensive HTML report which summarizes the run information, the results and the diagnostic plots. Certain diagnostic plots (e.g. the volcano plot) can be interactive with the use of the external Highcharts (http://www.highcharts.com) JavaScript library for interactive graphs. Although the inputs to the metaseqr workflow are many, in practice, setting only very few of them and accepting the defaults as the rest can result in quite comprehensible results for mainstream organisms like mouse, human, fly and rat.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | metaseqr(counts, sample.list, exclude.list = NULL,
file.type = c("auto", "sam", "bam", "bed"),
path = NULL, contrast = NULL, libsize.list = NULL,
id.col = 4, gc.col = NA, name.col = NA, bt.col = NA,
annotation = c("download", "embedded"), gene.file = NULL,
org = c("hg18", "hg19", "hg38", "mm9", "mm10", "rn5", "rn6",
"dm3", "dm6", "danrer7", "pantro4", "susscr3", "tair10",
"equcab2","custom"),
refdb = c("ensembl", "ucsc", "refseq"),
trans.level = c("gene", "transcript", "exon"),
count.type = c("gene", "exon","utr"),
utr.flank = 500,
exon.filters = list(min.active.exons = list(exons.per.gene = 5,
min.exons = 2, frac = 1/5)),
gene.filters = list(length = list(length = 500),
avg.reads = list(average.per.bp = 100, quantile = 0.25),
expression = list(median = TRUE, mean = FALSE, quantile = NA,
known = NA, custom = NA),
biotype = get.defaults("biotype.filter", org[1]),
presence = list(frac = 0.25, min.count = 10,
per.condition = FALSE)),
when.apply.filter = c("postnorm", "prenorm"),
normalization = c("deseq", "edaseq", "edger", "noiseq", "nbpseq",
"each", "none"),
norm.args = NULL,
statistics = c("deseq", "edger", "noiseq", "bayseq", "limma",
"nbpseq"),
stat.args = NULL,
adjust.method = sort(c(p.adjust.methods, "qvalue")),
meta.p = if (length(statistics) > 1) c("simes", "bonferroni", "fisher",
"dperm.min", "dperm.max", "dperm.weight", "fperm", "whitlock",
"minp", "maxp", "weight", "pandora", "none") else "none",
weight = rep(1/length(statistics), length(statistics)),
nperm = 10000, reprod=TRUE, pcut = NA, log.offset = 1,
preset = NULL,
qc.plots = c("mds", "biodetection", "countsbio", "saturation",
"readnoise", "filtered", "correl", "pairwise", "boxplot",
"gcbias", "lengthbias", "meandiff", "meanvar", "rnacomp",
"deheatmap", "volcano", "biodist"),
fig.format = c("png", "jpg", "tiff", "bmp", "pdf", "ps"),
out.list = FALSE, export.where = NA,
export.what = c("annotation", "p.value", "adj.p.value",
"meta.p.value", "adj.meta.p.value", "fold.change",
"stats", "counts","flags"),
export.scale = c("natural", "log2", "log10", "vst", "rpgm"),
export.values = c("raw", "normalized"),
export.stats = c("mean", "median", "sd", "mad", "cv",
"rcv"),
export.counts.table = FALSE,
restrict.cores = 0.6, report = TRUE, report.top = 0.1,
report.template = "default", save.gene.model = TRUE,
verbose = TRUE, run.log = TRUE, progress.fun = NULL, ...)
|
counts |
a text tab-delimited file containing gene
or exon counts in one of the following formats: i) the
first column contains unique gene or exon identifiers and
the rest of the columns contain the read counts for each
sample. Thus the first cell of each row is a gene or exon
accession and the rest are integers representing the
counts for that accession. In that case, the
|
sample.list |
a list containing condition names and
the samples under each condition. It should have the
format |
exclude.list |
a list of samples to exclude, in the
same (list) format as |
path |
an optional path where all the BED/BAM files are placed, to be prepended to the BAM/BED file names in the targets file. If not given and if the files in the second column of the targets file do not contain a path to a directory, the current directory is assumed to be the BAM/BED file container. |
file.type |
the type of raw input files. It can be
|
contrast |
a character vector of contrasts to be
tested in the statistical testing step(s) of the metaseqr
pipeline. Each element of contrast should STRICTLY have the
format "ConditionA_vs_ConditionB_vs_...". A valid example
based on the |
libsize.list |
an optional named list where names
represent samples (MUST be the same as the samples in
|
id.col |
an integer denoting the column number in
the file (or data frame) provided with the counts
argument, where the unique gene or exon accessions are.
Default to |
gc.col |
an integer denoting the column number in
the file (or data frame) provided with the |
name.col |
an integer denoting the column number in
the file (or data frame) provided with the counts
argument, where the HUGO gene symbols are given. If not
provided, it will not be available when reporting
results. In addition, the |
bt.col |
an integer denoting the column number in
the file (or data frame) provided with the counts
argument, where the gene biotypes are given. If not
provided, the |
annotation |
instructs metaseqr where to find the
annotation for the given counts file. It can be one of i)
|
gene.file |
an external gene annotation file
required when |
org |
the supported organisms by metaseqr. These can
be, for human genomes |
refdb |
the reference annotation repository from
which to retrieve annotation elements to use with
metaseqr. It can be one of |
trans.level |
perform differential expression
analysis at which transcriptional unit, can be one of
|
count.type |
the type of reads inside the counts
file. It can be one of |
utr.flank |
the number of base pairs to flank the 3' UTR of transcripts when analyzing Quant-Seq data. |
exon.filters |
a named list whose names are the names of the supported exon filters and its members the filter parameters. See section "Exon filters" below for details. |
gene.filters |
a named list whose names are the names of the supported gene filters and its members the filter parameters. See section "Gene filters" below for details. |
when.apply.filter |
a character string determining
when to apply the exon and/or gene filters, relative to
normalization. It can be |
normalization |
the normalization algorithm to be
applied on the count data. It can be one of
|
norm.args |
a named list whose names are the names
of the normalization algorithm parameters and its members
parameter values. See section "Normalization parameters"
below for details. Leave |
statistics |
one or more statistical analyses to be
performed by the metaseqr pipeline.It can be one or more
of |
stat.args |
a named list whose names are the names
of the statistical algorithms used in the pipeline. Each
member is another named list whose names are the
algorithm parameters and its members are the parameter
values. See section "Statistics parameters" below for
details. Leave |
adjust.method |
the multiple testing p-value
adjustment method. It can be one of
|
meta.p |
the meta-analysis method to combine
p-values from multiple statistical tests . It can be
one of |
weight |
a vector of weights with the same length as
the |
nperm |
the number of permutations performed to
derive the meta p-value when |
reprod |
create reproducible permutations when
|
pcut |
a p-value cutoff for exporting differentially genes, default is to export all the non-filtered genes. |
log.offset |
an offset to be added to values during
logarithmic transformations in order to avoid Infinity
(default is |
preset |
an analysis strictness preset.
|
qc.plots |
a set of diagnostic plots to show/create.
It can be one or more of |
fig.format |
the format of the output diagnostic
plots. It can be one or more of |
out.list |
a logical controlling whether to export a list with the results in the running environment. |
export.where |
an output directory for the project results (report, lists, diagnostic plots etc.) |
export.what |
the content of the final lists. It can
be one or more of |
export.scale |
export values from one or more
transformations applied to the data. It can be one or
more of |
export.values |
It can be one or more of
|
export.stats |
calculate and export several
statistics on raw and normalized counts, condition-wise.
It can be one or more of |
export.counts.table |
exports also the calculated
read counts table when input is read from bam files
and exports also the normalized count table in all
cases. Defaults to |
restrict.cores |
in case of parallel execution of
several subfunctions, the fraction of the available cores
to use. In some cases if all available cores are used
( |
report |
a logical value controlling whether to
produce a summary report or not. Defaults to
|
report.top |
a fraction of top statistically
significant genes to append to the HTML report. This
helps in keeping the size of the report as small as
possible, as appending the total gene list might
create a huge HTML file. Users can always retrieve
the whole gene lists from the report links. Defaults
to |
report.template |
an HTML template to use for the report. Do not change this unless you know what you are doing. |
save.gene.model |
in case of exon analysis, a list
with exon counts for each gene will be saved to the file
|
verbose |
print informative messages during
execution? Defaults to |
run.log |
write a log file of the |
progress.fun |
a function which updates a
|
... |
further arguments that may be passed to
plotting functions, related to |
If out.list
is TRUE
, a named list whose
length is the same as the number of requested contrasts.
Each list member is named according to the corresponding
contrast and contains a data frame of differentially
expressed genes for that contrast. The contents of the
data frame are defined by the export.what,
export.scale, export.stats, export.values
parameters. If
report
is TRUE
, the output list contains
two main elements. The first is described above (the
analysis results) and the second contains the same
results but in HTML formatted tables.
The exon filters are a set of filters which are applied
after the gene models are assembled from the read counts
of individual exons and before the gene expression is
summarized from the exons belonging to each gene. These
filters can be applied when the input read counts file
contains exon reads. It is not applicable when the input
file already contains gene counts. Such filters can be
for example "accept genes where all the exons contain
more than x reads" or "accept genes where there is read
presence in at least m/n exons, n being the total exons
of the gene". Such filters are NOT meant for detecting
differential splicing as also the whole metaseqr
pipeline, thus they should not be used in that context.
The exon.filters
argument is a named list of
filters, where the names are the filter names and the
members are the filter parameters (named lists with
parameter name, parameter value). See the usage of the
metaseqr
function for an example of how these
lists are structured. The supported exon filters in the
current version are: i) min.active.exons
which
implements a filter for demanding m out of n exons of a
gene to have a certain read presence with parameters
exons.per.gene
, min.exons
and frac
.
The filter is described as follows: if a gene has up to
exons.per.gene
exons, then read presence is
required in at least min.exons
of them, else read
presence is required in a frac
fraction of the
total exons. With the default values, the filter
instructs that if a gene has up to 5 exons, read presence
is required in at least 2, else in at least 20
exons, in order to be accepted. More filters will be
implemented in future versions and users are encouraged
to propose exon filter ideas to the author by mail. See
metaseqr
usage for the defaults. Set
exon.filters=NULL
to not apply any exon filtering.
The gene filters are a set of filters applied to gene
expression as this is manifested through the read
presence on each gene and are preferably applied after
normalization. These filters can be applied both when the
input file or data frame contains exon read counts and
gene read counts. Such filter can be for example "accept
all genes above a certain count threshold" or "accept all
genes with expression above the median of the normalized
counts distribution" or "accept all with length above a
certain threshold in kb" or "exclude the 'pseudogene'
biotype from further analysis". The supported gene
filters in the current version, which have the same
structure as the exon filters (named list of lists with
filter names, parameter names and parameter arguments)
are: i) length
which implements a length filter
where genes are accepted for further analysis if they are
above length
(its parameter) kb. ii)
avg.reads
which implements a filter where a gene
is accepted for further analysis if it has more average
reads than the quantile
of the average count
distribution per average.per.bp
base pairs. In
summary, the reads of each gene are averaged per
average.per.bp
based on each gene's length (in
case of exons, input the "gene's length" is the sum of
the lengths of exons) and the quantile
quantile of
the average counts distribution is calculated for each
sample. Genes passing the filter should have an average
read count larger than the maximum of the vector of the
quantiles calculated above. iii) expression
which
implements a filter based on the overall expression of a
gene. The parameters of this filter are: median
,
where genes below the median of the overall count
distribution are not accepted for further analysis (this
filter has been used to distinguish between "expressed"
and "not expressed" genes in several cases, e.g. (Mokry
et al., NAR, 2011) with a logical as value, mean
which is the same as median
but using the mean,
quantile
which is the same as the previous two but
using a specific quantile of the total counts
distribution, known
, where in this case, a set of
known not-expressed genes in the system under
investigation are used to estimate an expression cutoff.
This can be quite useful, as the genes are filtered based
on a "true biological" cutoff instead of a statistical
cutoff. The value of this filter is a character vector of
HUGO gene symbols (MUST be contained in the annotation,
thus it's better to use annotation="download"
)
whose counts are used to build a "null" expression
distribution. The 90th quantile of this distribution is
then the expression cutoff. This filter can be combined
with any other filter. Be careful with gene names as they
are case sensitive and must match exactly ("Pten" is
different from "PTEN"!). iv) biotype
where in this
case, genes with a certain biotype (MUST be contained in
the annotation, thus it's better to use
annotation="download"
) are excluded from the
analysis. This filter is a named list of logical, where
names are the biotypes in each genome and values are
TRUE
or FALSE
. If the biotype should be
excluded, the value should be TRUE
else
FALSE
. See the result of
get.defaults("biotype.filter","hg19")
for an
example. Finally, in future versions there will be
support for user-defined filters in the form of a
function. v) presence
where in this case, a gene
is further considered for statistical testing if
frac
(x100 for a percentage value) have more
than min.count
reads across all samples
(per.condition=FALSE
) or across the samples
of each condition (per.condition=TRUE
).
The normalization parameters are passed again as a named
list where the names of the members are the normalization
parameter names and the values are the normalization
parameter values. You should check the documentation of
the packages EDASeq, DESeq, edgeR, NOISeq and NBPSeq for
the parameter names and parameter values. There are a few
exceptions in parameter names: in case of
normalization="edaseq"
the only parameter names
are within.which
and between.which
,
controlling the withing lane/sample and between
lanes/samples normalization algorithm. In the case
of normalization="nbpseq"
, there is one
additional parameter called main.method
which can
take the calues "nbpseq"
or "nbsmyth"
.
These values correspond to the two different workflows
available in the NBPSeq package. Please, consult the
NBPSeq package documentation for further details. For the
rest of the algorithms, the parameter names are the same
as the names used in the respective packages. For
examples, please use the get.defaults
function.
The statistics parameters as passed to statistical
algorithms in metaseqr, exactly with the same way as the
normalization parametes above. In this case, there is one
more layer in list nesting. Thus, stat.args
is a
named list whose names are the names the algorithms used
(see the statistics
parameter). Each member is
another named list,with parameters to be used for each
statistical algorithm. Again, the names of the member
lists are parameter names and the values of the member
lists are parameter values. You should check the
documentations of DESeq, edgeR, NOISeq, baySeq, limma and
NBPSeq for these parameters. There are a few exceptions
in parameter names: In case of statistics="edger"
,
apart from the rest of the edgeR statistical testing
arguments, there is the argument main.method
which
can be either "classic"
or "glm"
, again
defining whether the binomial test or GLMs will be used
for statistical testing. For examples, please use the
get.defaults
function. When
statistics="nbpseq"
, apart from the rest arguments
of the NBPSeq functions estimate.disp
and
estimate.dispersion
, there is the argument
main.method
which can be "nbpseq"
or
"nbsmyth"
. This argument determines the parameters
to be used by the estimate.dispersion
function or
by the estimate.disp
function to estimate RNA-Seq
count dispersions. The difference between the two is that
they constitute different starting points for the two
workflows in the package NBPSeq. The first worklfow (with
main.method="nbpseq"
and the
estimate.dispersion
function is NBPSeq package
specific, while the second (with
main.method="nbsmyth"
and the estimate.disp
function is similar to the workflow of the edgeR package.
For additional information regarding the statistical
testing in NBPSeq, please consult the documentation of
the NBPSeq package. Additinally, please note that
there is currently a problem with the NBPSeq package and
the workflow that is specific to the NBPSeq package. The
problem has to do with function exporting as there are
certain functions which are not recognized from the
package internally. For this reason and until it is
fixed, only the Smyth workflow will be available with the
NBPSeq package (thus
stat.args$main.method="nbpseq"
will not
be available)!
The analysis presets are a set of keywords (only one can
be used) that predefine some of the parameters of the
metaseqr pipeline. For the time being they are quite
simple and they control i) the strictness of filtering
and statistical thresholding with three basic levels
("all", "medium", "strict") and ii) the data columns that
are exported, again in three basic ways ("basic",
"normal", "full") controlling the amount of data to be
exported. These keywords can be combined with a dot in
the middle (e.g. "all.basic"
to define an analysis
preset. When using analysis presets, the following
argumentsof metaseqr are overriden: exon.filters
,
gene.filters
, pcut
, export.what
,
export.scale
, export.values
,
exon.stats
. If you want to explicitly control the
above arguments, the preset
argument should be set
to NULL
(default). Following is a synopsis of the
different presets and the values of the arguments they
moderate:
"all.basic"
: use all
genes (do not filter) and export all genes and basic
annotation and statistics elements. In this case, the
above described arguments become:
exon.filters=NULL
gene.filters=NULL
pcut=1
export.what=c("annotation","p.value","adj.p.value","meta.p.value",
"adj.meta.p.value","fold.change")
export.scale=c("natural","log2")
export.values=c("normalized")
export.stats=c("mean")
"all.normal"
: use all genes (do not filter) and
export all genes and normal annotation and statistics
elements. In this case, the above described arguments
become:
exon.filters=NULL
gene.filters=NULL
pcut=1
export.what=c("annotation","p.value","adj.p.value","meta.p.value",
"adj.meta.p.value","fold.change","stats","counts")
export.scale=c("natural","log2")
export.values=c("normalized")
export.stats=c("mean","sd","cv")
In this case, the above described arguments become:
exon.filters=NULL
gene.filters=NULL
pcut=1
export.what=c("annotation","p.value","adj.p.value","meta.p.value",
"adj.meta.p.value","fold.change","stats","counts")
export.scale=c("natural","log2","log10","vst")
export.values=c("raw","normalized")
export.stats=c("mean","median","sd","mad","cv","rcv")
"medium.basic"
: apply a medium set of
filters and and export statistically significant genes
and basic annotation and statistics elements. In this
case, the above described arguments become:
exon.filters=list(min.active.exons=list(exons.per.gene=5,min.exons=2,frac=1/5))
gene.filters=list(length=list(length=500),
avg.reads=list(average.per.bp=100,quantile=0.25),
expression=list(median=TRUE,mean=FALSE,quantile=NA,known=NA,custom=NA),
biotype=get.defaults("biotype.filter",org[1]))
pcut=0.05
export.what=c("annotation","p.value","adj.p.value","meta.p.value",
"adj.meta.p.value","fold.change")
export.scale=c("natural","log2")
export.values=c("normalized")
export.stats=c("mean")
"medium.normal"
: apply a medium set of filters and
and export statistically significant genes and normal
annotation and statistics elements. In this case, the
above described arguments become:
exon.filters=list(min.active.exons=list(exons.per.gene=5,min.exons=2,frac=1/5))
gene.filters=list(length=list(length=500),
avg.reads=list(average.per.bp=100,quantile=0.25),
expression=list(median=TRUE,mean=FALSE,quantile=NA,known=NA,custom=NA),
biotype=get.defaults("biotype.filter",org[1]))
pcut=0.05
export.what=c("annotation","p.value","adj.p.value","meta.p.value",
"adj.meta.p.value","fold.change","stats","counts")
export.scale=c("natural","log2")
export.values=c("normalized")
export.stats=c("mean","sd","cv")
and statistics elements. In this case, the above described arguments become:
exon.filters=list(min.active.exons=list(exons.per.gene=5,min.exons=2,frac=1/5))
gene.filters=list(length=list(length=500),
avg.reads=list(average.per.bp=100,quantile=0.25),
expression=list(median=TRUE,mean=FALSE,quantile=NA,known=NA,custom=NA),
biotype=get.defaults("biotype.filter",org[1]))
pcut=0.05
export.what=c("annotation","p.value","adj.p.value","meta.p.value",
"adj.meta.p.value","fold.change","stats","counts")
export.scale=c("natural","log2","log10","vst")
export.values=c("raw","normalized")
export.stats=c("mean","median","sd","mad","cv","rcv")
"strict.basic"
: apply a strict set of
filters and and export statistically significant genes
and basic annotation and statistics elements. In this
case, the above described arguments become:
exon.filters=list(min.active.exons=list(exons.per.gene=4,min.exons=2,frac=1/4))
gene.filters=list(length=list(length=750),
avg.reads=list(average.per.bp=100,quantile=0.5),
expression=list(median=TRUE,mean=FALSE,quantile=NA,known=NA,custom=NA),
biotype=get.defaults("biotype.filter",org[1]))
pcut=0.01
export.what=c("annotation","p.value","adj.p.value","meta.p.value",
"adj.meta.p.value","fold.change")
export.scale=c("natural","log2")
export.values=c("normalized")
export.stats=c("mean")
"strict.normal"
: apply a strict set of filters and
and export statistically significant genes and normal
annotation and statistics elements. In this case, the
above described arguments become:
exon.filters=list(min.active.exons=list(exons.per.gene=4,min.exons=2,frac=1/4))
gene.filters=list(length=list(length=750),
avg.reads=list(average.per.bp=100,quantile=0.5),
expression=list(median=TRUE,mean=FALSE,quantile=NA,known=NA,custom=NA),
biotype=get.defaults("biotype.filter",org[1]))
pcut=0.01
export.what=c("annotation","p.value","adj.p.value","meta.p.value",
"adj.meta.p.value","fold.change","stats","counts")
export.scale=c("natural","log2")
export.values=c("normalized")
export.stats=c("mean","sd","cv")
and statistics elements. In this case, the above described arguments become:
exon.filters=list(min.active.exons=list(exons.per.gene=4,min.exons=2,frac=1/4))
gene.filters=list(length=list(length=750),
avg.reads=list(average.per.bp=100,quantile=0.5),
expression=list(median=TRUE,mean=FALSE,quantile=NA,known=NA,custom=NA),
biotype=get.defaults("biotype.filter",org[1]))
pcut=0.01
export.what=c("annotation","p.value","adj.p.value","meta.p.value",
"adj.meta.p.value","fold.change","stats","counts")
export.scale=c("natural","log2","log10","vst")
export.values=c("raw","normalized")
export.stats=c("mean","median","sd","mad","cv","rcv")
Please note that currently only gene and exon annotation
from Ensembl (http://www.ensembl.org), UCSC and RefSeq
are supported. Thus, the unique gene or exon ids in the
counts files should correspond to valid Ensembl, UCSC or
RefSeq gene or exon accessions for the organism of interest.
If you are not sure about the source of your counts file or
do not know how to produce it, it's better to start from the
original BAM/BED files (metaseqr will use the
read2count
function to create a counts
file). Keep in mind that in the case of BED files, the
performance will be significantly lower and the overall
running time significanlty higher as the R functions
which are used to read BED files to proper structures
(GenomicRanges) and calculate the counts are quite slow.
An alternative way is maybe the easyRNASeq package
(Delhomme et al, 2012). The read2count
function does not use this package but rather makes use
of standard Bioconductor functions to handle NGS data. If
you wish to work outside R, you can work with other
popular read counters such as the HTSeq read counter
(http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html).
Please also note that in the current version, the members
of the gene.filters
and exon.filters
lists
are not checked for validity so be careful to supply with
correct names otherwise the pipeline will crash or at the
best case scenario, will ignore the filters. Also note
that when you are supplying metaseqr wtih an exon counts
table, gene annotation is always downloaded so please be
sure to have a working internet connection. In addition
to the above, if you have a multiple core system, be very
careful on how you are using the restrict.cores
argument and generally how many cores you are using with
scripts purely written in R. The analysis with exon read
data can very easily cause memory problems, so unless you
have more than 64Gb of RAM available, consider setting
restrict.cores to something like 0.2 when working with
exon data. Finally, if you do not wish to download the
same annotation again and again when performing multiple
analyses, it is best to use the
get.annotation
function to download and
store the resulting data frames in local files and then
use these files with the annotation
option.
Please note that the meta-analysis feature provided by metaseqr does not satisfy the strict definition of "meta-analysis", which is the combination of multiple similar datasets under the same statistical methodology. Instead it is the use of mulitple statistical tests applied to the same data. For the Simes method, please consult also "Simes, R. J. (1986). "An improved Bonferroni procedure for multiple tests of significance". Biometrika 73 (3): 751–754."
Panagiotis Moulos
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 | # An example pipeline with exon counts
data("hg19.exon.data",package="metaseqR")
metaseqr(
counts=hg19.exon.counts,
sample.list=list(normal="normal",paracancerous="paracancerous",cancerous="cancerous"),
contrast=c("normal_vs_paracancerous","normal_vs_cancerous",
"normal_vs_paracancerous_vs_cancerous"),
libsize.list=libsize.list.hg19,
id.col=4,
annotation="download",
org="hg19",
count.type="exon",
normalization="edaseq",
statistics="deseq",
pcut=0.05,
qc.plots=c("mds", "biodetection", "countsbio", "saturation", "rnacomp",
"boxplot", "gcbias", "lengthbias", "meandiff", "readnoise","meanvar",
"readnoise", "deheatmap", "volcano", "biodist", "filtered"),
fig.format=c("png","pdf"),
export.what=c("annotation","p.value","adj.p.value","fold.change","stats",
"counts"),
export.scale=c("natural","log2","log10","vst"),
export.values=c("raw","normalized"),
export.stats=c("mean","median","sd","mad","cv","rcv"),
restrict.cores=0.8,
gene.filters=list(
length=list(
length=500
),
avg.reads=list(
average.per.bp=100,
quantile=0.25
),
expression=list(
median=TRUE,
mean=FALSE
),
biotype=get.defaults("biotype.filter","hg18")
)
)
# An example pipeline with gene counts
data("mm9.gene.data",package="metaseqR")
result <- metaseqr(
counts=mm9.gene.counts,
sample.list=list(e14.5=c("e14.5_1","e14.5_2"), adult_8_weeks=c("a8w_1","a8w_2")),
contrast=c("e14.5_vs_adult_8_weeks"),
libsize.list=libsize.list.mm9,
annotation="download",
org="mm9",
count.type="gene",
normalization="edger",
statistics=c("deseq","edger","noiseq"),
meta.p="fisher",
pcut=0.05,
fig.format=c("png","pdf"),
export.what=c("annotation","p.value","meta.p.value","adj.meta.p.value",
"fold.change"),
export.scale=c("natural","log2"),
export.values="normalized",
export.stats=c("mean","sd","cv"),
export.where=getwd(),
restrict.cores=0.8,
gene.filters=list(
length=list(
length=500
),
avg.reads=list(
average.per.bp=100,
quantile=0.25
),
expression=list(
median=TRUE,
mean=FALSE,
quantile=NA,
known=NA,
custom=NA
),
biotype=get.defaults("biotype.filter","mm9")
),
out.list=TRUE
)
head(result$data[["e14.5_vs_adult_8_weeks"]])
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.