Functions used to create an Anota2seqDataSet S4 object from user
input. This object will be used to collect data and results from all steps of
the anota2seq workflow and is initiated using one of the 2 available
constructors: anota2seqDataSetFromMatrix (use when the input data is
provided as a set of custom vectors and matrices) or
anota2seqDataSetFromSE (use when the input is of the class
SummarizedExperiment).
dataP
This parameter is used if selecting to initiate the
Anota2seqDataSet using custom vectors and matrices, and the
anota2seqDataSetFromMatrix function. A matrix containing data
for translated mRNA (e.g. polysome-associated mRNA or RPFs). Rows must
correspond to identifiers (only non-numerical row names are allowed)
and columns to samples.
dataT
This parameter is used if selecting to initiate the
Anota2seqDataSet using custom vectors and matrices, and the
anota2seqDataSetFromMatrix function. A matrix containing data
for total mRNA. Rows must correspond to identifiers (only non-numerical
row names are allowed) and columns to samples.
phenoVec
This parameter is used if selecting to initiate the
Anota2seqDataSet using custom vectors and matrices, and the
anota2seqDataSetFromMatrix function. A vector describing the
treatments (each treatment should have a unique identifier). Note that
dataT, dataP and phenoVec must have the same sample order such that
e.g. column 1 in dataP is the translated mRNA data for a sample,
column 1 in dataT is the total mRNA data for the same sample and
position 1 in phenoVec describes the treatment of that sample.
batchVec
This parameter is used if selecting to initiate the
Anota2seqDataSet using custom vectors and matrices, and the
anota2seqDataSetFromMatrix function. A vector describing
annotation of the samples according to their batch (each batch should
have a unique identifier). Note that dataT, dataP and batchVec must
have the same sample order such that e.g. column 1 in dataP is the
translated mRNA data for a sample, column 1 in dataT is the total
mRNA data for the same sample and position 1 in batchVec describes the
batch identity of that sample.
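The sample-order requirement shared by dataP, dataT, phenoVec and batchVec can be checked before calling the constructor. The following is a minimal sketch in base R using toy values; check_inputs() is a hypothetical helper and not part of anota2seq, and it additionally assumes that both matrices list the same identifiers in the same row order.

```r
# Toy inputs (hypothetical values) for the matrix-based constructor.
set.seed(1)
samples <- paste0("s", 1:6)
genes <- paste0("gene", 1:5)

# Continuous values; rows are identifiers, columns are samples.
dataP <- matrix(rnorm(30, mean = 8), nrow = 5, dimnames = list(genes, samples))
dataT <- matrix(rnorm(30, mean = 8), nrow = 5, dimnames = list(genes, samples))

# One entry per column, in the same order as the columns of dataP and dataT.
phenoVec <- rep(c("ctrl", "treated"), each = 3)
batchVec <- rep(c("rep1", "rep2", "rep3"), times = 2)

# check_inputs() is a hypothetical helper, not part of anota2seq.
check_inputs <- function(dataP, dataT, phenoVec, batchVec = NULL) {
  stopifnot(
    is.matrix(dataP), is.matrix(dataT),
    identical(dim(dataP), dim(dataT)),
    # Assumption: same identifiers in the same row order in both matrices.
    identical(rownames(dataP), rownames(dataT)),
    # Row names must be non-numerical identifiers.
    all(is.na(suppressWarnings(as.numeric(rownames(dataP))))),
    length(phenoVec) == ncol(dataP),
    is.null(batchVec) || length(batchVec) == ncol(dataP)
  )
  invisible(TRUE)
}

inputs_ok <- check_inputs(dataP, dataT, phenoVec, batchVec)
```

Inputs that pass such a check could then be supplied as the dataP, dataT, phenoVec and batchVec arguments of anota2seqDataSetFromMatrix.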
dataType
This parameter is used when selecting to initiate the
Anota2seqDataSet using the anota2seqDataSetFromMatrix or the
anota2seqDataSetFromSE functions. Specify the platform on
which data were acquired. Can be set to either "microarray" (i.e.
already on a continuous log transformed scale) or "RNAseq" (i.e. count
data). For pre-normalized RNAseq data that is already on a continuous
log scale, either "microarray" or "RNAseq" can be used, provided the
parameter "normalize" is set to FALSE.
normalize
This parameter is used when selecting to initiate the
Anota2seqDataSet using the anota2seqDataSetFromMatrix or the
anota2seqDataSetFromSE functions. Boolean (TRUE/FALSE) that
defaults to TRUE. If TRUE, RNAseq data (or other count data) will be
normalized and transformed according to the specified transformation.
Microarray data should be normalized by the user before it is used as
input to anota2seqDataSetFromMatrix or anota2seqDataSetFromSE.
transformation
This parameter is used when selecting to initiate
the Anota2seqDataSet using the anota2seqDataSetFromMatrix or the
anota2seqDataSetFromSE functions. Selection of method for
normalization. Must be either "rlog" or "TMM-log2" and is considered
only when dataType = "RNAseq" and normalize = TRUE. The default is
"TMM-log2". When using "TMM-log2", RNAseq data will be normalized
using TMM normalization (edgeR package) followed by log2 counts per
million computation using the voom function of the limma package.
filterZeroGenes
This parameter is used when selecting to initiate
the Anota2seqDataSet using the anota2seqDataSetFromMatrix or the
anota2seqDataSetFromSE functions. Boolean (TRUE/FALSE); if set to
TRUE, genes with 0 counts in at least 1 sample will be removed prior to
normalization.
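The effect of this filter on a single count matrix can be illustrated in base R. This is only a sketch on toy values; how the package applies the filter across translated and total mRNA data is not shown here.

```r
# Toy count matrix (hypothetical values): rows are genes, columns are samples.
counts <- matrix(
  c(10, 0, 5,
     3, 4, 8,
     0, 0, 0,
     7, 9, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:3))
)

# Flag genes with a zero count in at least one sample and drop them.
has_zero <- rowSums(counts == 0) > 0
filtered <- counts[!has_zero, , drop = FALSE]
rownames(filtered)
```

In this toy matrix gene1 and gene3 contain at least one zero and are removed, while gene2 and gene4 are retained.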
varCutOff
This parameter is a numeric value (or NULL) used when
selecting to initiate the Anota2seqDataSet using the
anota2seqDataSetFromMatrix or the anota2seqDataSetFromSE
functions. This parameter indicates whether variance filtering should
be applied and, if so, with which threshold. The default is NULL, i.e.
no filtering based on variance. If a cutoff is applied, filtering will
be performed by applying the threshold to the result of the var()
function. Filtering is performed per mRNA source (i.e. translated mRNA
and total mRNA) and treatment. This parameter can be used to avoid a
rare error during anota2seq analysis (see Details).
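Variance filtering per mRNA source and treatment can be sketched in base R as follows. var_filter() is a hypothetical helper and not part of anota2seq; the strict "greater than" comparison against the cutoff is an assumption, since the exact comparison used by the package is not stated here.

```r
# var_filter() is a hypothetical helper, not part of anota2seq.
# It returns TRUE for genes whose variance exceeds `cutoff` in every
# treatment group of one mRNA source.
var_filter <- function(mat, phenoVec, cutoff) {
  keep <- rep(TRUE, nrow(mat))
  for (trt in unique(phenoVec)) {
    group <- mat[, phenoVec == trt, drop = FALSE]
    group_var <- apply(group, 1, var)
    keep <- keep & (group_var > cutoff)  # assumption: strict ">" comparison
  }
  names(keep) <- rownames(mat)
  keep
}

phenoVec <- rep(c("ctrl", "treated"), each = 3)
dataP <- rbind(
  geneA = c(5.1, 5.4, 4.9, 7.0, 7.3, 6.8),
  geneB = c(6.0, 6.0, 6.0, 8.1, 7.9, 8.4)  # no variance among ctrl samples
)
dataT <- rbind(
  geneA = c(9.2, 9.0, 9.5, 9.1, 9.4, 8.9),
  geneB = c(4.0, 4.3, 3.8, 4.1, 4.4, 3.9)
)

# A gene is kept only if it passes in both mRNA sources.
keep <- var_filter(dataP, phenoVec, cutoff = 0) &
  var_filter(dataT, phenoVec, cutoff = 0)
dataP_f <- dataP[keep, , drop = FALSE]
dataT_f <- dataT[keep, , drop = FALSE]
```

Here geneB has zero variance among the control samples of the translated mRNA data and is therefore removed from both matrices.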
se
This parameter is used if selecting to initiate the
Anota2seqDataSet using a SummarizedExperiment object with the
anota2seqDataSetFromSE function. Within the SummarizedExperiment
object the expression data is supplied as one assay containing data for
both translated mRNA (e.g. polysome-associated mRNA or RPFs) and total
mRNA (rows correspond to identifiers and columns to samples). The
annotation needed is supplied within "colData" of the
SummarizedExperiment object (rows correspond to samples, named
identically to the assay columns, while columns correspond to the
annotation variables).
The "colData" must contain the following annotation columns with their
names within quotes:
"RNA": under this column each
sample must be annotated with one out of two RNA source identifiers:
"P" indicates that the sample was obtained from translated mRNA (e.g.
polysome-associated mRNA or RPFs) whereas "T" indicates that the sample
was obtained from total mRNA.
"treatment": under this column the
treatment for each sample is indicated. Samples with the same treatment
must have identical identifiers.
"samplePairs": under this column
the sample pair identity is indicated. This serves to identify pairs of
data for translated mRNA (i.e. "P" under the "RNA" column) and total
mRNA (i.e. "T" under the "RNA" column) that were derived from the same
starting sample. Each pair of "P" and "T" must have a common identifier
that is unique for that pair (i.e. is not used by any other pair
of "P" and "T"). This column will also be used to order columns of translated
and total RNA data.
"batch": under this optional column the batch
identity of each sample is indicated (needed only if the downstream
analysis will include a batch parameter). A common
batch used in downstream analysis is replicate but any other batch that
does not overlap with analyzed treatments can be used. Each batch must
be indicated by a unique identifier (i.e. not used by any other batch).
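The annotation layout described above can be sketched as follows. All values are toy data, and the sample naming scheme (pair identifier and RNA source joined by an underscore) is an assumption made for illustration only. The SummarizedExperiment object is only built if the SummarizedExperiment and S4Vectors packages are installed.

```r
# Toy annotation (hypothetical values): 2 treatments x 2 replicates,
# each starting sample measured as translated ("P") and total ("T") mRNA.
col_annot <- data.frame(
  RNA         = rep(c("P", "T"), each = 4),
  treatment   = rep(rep(c("ctrl", "treated"), each = 2), times = 2),
  samplePairs = rep(paste0("pair", 1:4), times = 2),
  batch       = rep(c("rep1", "rep2"), times = 4),
  stringsAsFactors = FALSE
)
rownames(col_annot) <- paste(col_annot$samplePairs, col_annot$RNA, sep = "_")

# One assay holding both translated and total mRNA samples; column names
# match the row names of the annotation.
set.seed(1)
expr <- matrix(rnorm(5 * 8, mean = 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), rownames(col_annot)))

# Each samplePairs identifier should occur exactly once as "P" and once as "T".
pair_table <- table(col_annot$samplePairs, col_annot$RNA)
stopifnot(all(pair_table == 1))

if (requireNamespace("SummarizedExperiment", quietly = TRUE) &&
    requireNamespace("S4Vectors", quietly = TRUE)) {
  se <- SummarizedExperiment::SummarizedExperiment(
    assays  = list(expression = expr),
    colData = S4Vectors::DataFrame(col_annot, row.names = rownames(col_annot))
  )
}
```

An object built this way could then be supplied as the se argument of anota2seqDataSetFromSE, with assayNum pointing to the assay that holds the expression data.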
assayNum
This parameter is used if selecting to initiate the
Anota2seqDataSet using a SummarizedExperiment object and the
anota2seqDataSetFromSE function and should specify the assay position
(retrieved by "assays(se)") containing the expression data for
analysis. By default, the first assay will be used.
Details
These functions initiate an Anota2seqDataSet and provide
possibilities to filter, transform and normalize the data. The input
can be either a SummarizedExperiment object including the
annotation as outlined above or a set of matrices and vectors that
together contain the same information.
If raw RNAseq data (or other count data) is provided, gene filtering
for genes with 0 counts in at least one sample (optional) can be
performed followed by normalization and transformation. Transformation
algorithms that are available are rlog (DESeq2 package) and TMM-log2
(TMM normalization using the edgeR package followed by log2 counts per
million computation using the voom function of the limma package). The
relative performance of these methods has been described elsewhere.
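The TMM-log2 route described above can be approximated directly with edgeR and limma. This is a sketch on simulated counts and is not necessarily identical to the code path used inside anota2seq; it only runs the normalization if both packages are installed.

```r
# Simulated count data (hypothetical values): 200 genes, 6 samples.
set.seed(1)
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

# Optional step: remove genes with 0 counts in at least one sample.
counts <- counts[rowSums(counts == 0) == 0, , drop = FALSE]

if (requireNamespace("edgeR", quietly = TRUE) &&
    requireNamespace("limma", quietly = TRUE)) {
  dge <- edgeR::DGEList(counts = counts)
  dge <- edgeR::calcNormFactors(dge, method = "TMM")  # TMM scaling factors
  log2cpm <- limma::voom(dge)$E                       # log2 counts per million
}
```

The resulting log2cpm matrix has the same dimensions as the filtered count matrix and is on a continuous log2 scale.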
A rare error can occur when, for some gene, the translated mRNA
(polysome-associated mRNA or RPFs) or total mRNA data within a
treatment has no variance. Users can use the varCutOff parameter to
perform filtering based on variance per mRNA source (i.e. translated
mRNA or total mRNA) and treatment. This eliminates the error, which
arises because statistics cannot be calculated in the absence of
variance.
Value
An Anota2seqDataSet containing data and covariates ready for
analysis using anota2seqAnalyze or anota2seqRun.
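A minimal usage sketch of the matrix-based constructor is given below. It assumes the example data shipped with the package (loaded by data(anota2seq_data) as anota2seq_data_P, anota2seq_data_T and anota2seq_pheno_vec) and restricts the input to the first 1000 rows to keep the run short. The call is skipped if anota2seq is not installed.

```r
if (requireNamespace("anota2seq", quietly = TRUE)) {
  library(anota2seq)

  # Example data shipped with the package (assumed object names, see above).
  data(anota2seq_data)

  ads <- anota2seqDataSetFromMatrix(
    dataP = anota2seq_data_P[1:1000, ],
    dataT = anota2seq_data_T[1:1000, ],
    phenoVec = anota2seq_pheno_vec,
    dataType = "RNAseq",          # raw counts
    normalize = TRUE,             # set explicitly rather than relying on the default
    transformation = "TMM-log2"
  )
}
```

The returned object can then be passed to anota2seqRun or to the individual analysis steps of the workflow.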