anota2seqDataSetFromMatrix: Anota2seqDataSet constructors.

View source: R/anota2seqDataSetFromMatrix.R

Description

Functions used to create an Anota2seqDataSet S4 object from user input. This object collects data and results from all steps of the anota2seq workflow and is initiated using one of the two available constructors: anota2seqDataSetFromMatrix (use when the input data is provided as a set of custom vectors and matrices) or anota2seqDataSetFromSE (use when the input is of the class SummarizedExperiment).

Usage

anota2seqDataSetFromMatrix(dataP, dataT, phenoVec, batchVec = NULL, dataType, 
  normalize = FALSE, transformation = "TMM-log2", 
  filterZeroGenes = ifelse(dataType == "RNAseq" & normalize == TRUE, TRUE, FALSE), 
  varCutOff = NULL)

anota2seqDataSetFromSE(se, assayNum = 1, dataType, normalize = FALSE, 
  transformation = "TMM-log2", 
  filterZeroGenes = ifelse(dataType == "RNAseq" & normalize == TRUE, TRUE, FALSE),
  varCutOff = NULL)

Arguments

dataP

This parameter is used if selecting to initiate the Anota2seqDataSet using custom vectors and matrices, and the anota2seqDataSetFromMatrix function. A matrix containing data for translated mRNA (e.g. polysome-associated mRNA or RPF). Rows must correspond to identifiers (only non-numerical row names are allowed) and columns to samples.

dataT

This parameter is used if selecting to initiate the Anota2seqDataSet using custom vectors and matrices, and the anota2seqDataSetFromMatrix function. A matrix containing data for total mRNA. Rows must correspond to identifiers (only non-numerical row names are allowed) and columns to samples.

phenoVec

This parameter is used if selecting to initiate the Anota2seqDataSet using custom vectors and matrices, and the anota2seqDataSetFromMatrix function. A vector describing the treatments (each treatment should have a unique identifier). Note that dataT, dataP and phenoVec must share the same sample order, so that e.g. column 1 in dataP holds the translated mRNA data for a sample, column 1 in dataT holds the total mRNA data for that sample, and position 1 in phenoVec describes its treatment.

batchVec

This parameter is used if selecting to initiate the Anota2seqDataSet using custom vectors and matrices, and the anota2seqDataSetFromMatrix function. An optional vector (default NULL) describing annotation of the samples according to their batch (each batch should have a unique identifier). Note that dataT, dataP and batchVec must share the same sample order, so that e.g. column 1 in dataP holds the translated mRNA data for a sample, column 1 in dataT holds the total mRNA data for that sample, and position 1 in batchVec describes its batch identity.
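The ordering requirement shared by dataP, dataT, phenoVec and batchVec can be checked before calling the constructor. The following is a minimal sketch in base R on made-up toy data; checkAlignment is a hypothetical helper written for this illustration and is not part of anota2seq.

```r
# Toy inputs; all names and values are made up for illustration.
set.seed(1)
genes   <- paste0("gene", 1:5)
samples <- c("ctrl_1", "ctrl_2", "ctrl_3", "treat_1", "treat_2", "treat_3")

dataP <- matrix(rpois(30, lambda = 50), nrow = 5,
                dimnames = list(genes, paste0(samples, "_P")))
dataT <- matrix(rpois(30, lambda = 80), nrow = 5,
                dimnames = list(genes, paste0(samples, "_T")))

# Position i describes column i of both dataP and dataT.
phenoVec <- c("ctrl", "ctrl", "ctrl", "treat", "treat", "treat")
batchVec <- c(1, 2, 3, 1, 2, 3)  # here: replicate number used as batch

# Hypothetical helper (not an anota2seq function): stops with an error
# if the inputs cannot be aligned sample by sample.
checkAlignment <- function(dataP, dataT, phenoVec, batchVec = NULL) {
  stopifnot(
    identical(dim(dataP), dim(dataT)),
    identical(rownames(dataP), rownames(dataT)),
    length(phenoVec) == ncol(dataP),
    is.null(batchVec) || length(batchVec) == ncol(dataP)
  )
  invisible(TRUE)
}

checkAlignment(dataP, dataT, phenoVec, batchVec)
```

A check like this only confirms that dimensions and identifiers match. Whether column 1 of dataP and column 1 of dataT really come from the same sample still has to be verified against the experimental design.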

dataType

This parameter is used when selecting to initiate the Anota2seqDataSet using the anota2seqDataSetFromMatrix or the anota2seqDataSetFromSE functions. Specify the platform on which data were acquired. Can be set to either "microarray" (i.e. already on a continuous log transformed scale) or "RNAseq" (i.e. count data). For pre-normalized RNAseq data that is already on a continuous log scale, use either "microarray", or "RNAseq" combined with normalize = FALSE.

normalize

This parameter is used when selecting to initiate the Anota2seqDataSet using the anota2seqDataSetFromMatrix or the anota2seqDataSetFromSE functions. Boolean (TRUE/FALSE) that defaults to FALSE. If TRUE, RNAseq data (or other count data) will be normalized and transformed according to the specified transformation. Microarray data should be normalized by the user before using it as input of anota2seqDataSetFromMatrix or anota2seqDataSetFromSE.

transformation

This parameter is used when selecting to initiate the Anota2seqDataSet using the anota2seqDataSetFromMatrix or the anota2seqDataSetFromSE functions. Selection of method for normalization. Must be a vector containing "rlog" or "TMM-log2" that is considered only when dataType = "RNAseq" and normalize = TRUE. The default is "TMM-log2". When using "TMM-log2", RNAseq data will be normalized using the TMM normalization prior to log2 counts per million computation using the voom function of the limma package.

filterZeroGenes

This parameter is used when selecting to initiate the Anota2seqDataSet using the anota2seqDataSetFromMatrix or the anota2seqDataSetFromSE functions. Boolean (TRUE/FALSE); if set to TRUE, genes with 0 counts in at least 1 sample will be removed prior to normalization. By default this is TRUE only when dataType = "RNAseq" and normalize = TRUE, and FALSE otherwise.
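To make the effect of filterZeroGenes concrete, the sketch below applies the same rule (drop any gene with a zero count in at least one sample) to a single made-up count matrix in base R. It illustrates the rule only; how anota2seq applies it internally across translated and total mRNA data is not shown here.

```r
# Toy count matrix (made-up values): genes in rows, samples in columns.
counts <- matrix(c(10, 0, 7,
                    5, 3, 9,
                    0, 0, 0,
                   12, 8, 4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Keep only genes that have no zero count in any sample.
keep     <- rowSums(counts == 0) == 0
filtered <- counts[keep, , drop = FALSE]

rownames(filtered)
```

In this toy matrix gene1 and gene3 each contain at least one zero, so only gene2 and gene4 remain.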

varCutOff

This parameter is a numeric value (or NULL) used when selecting to initiate the Anota2seqDataSet using the anota2seqDataSetFromMatrix or the anota2seqDataSetFromSE functions. This parameter indicates if and by which threshold variance filtering should be applied. The default is NULL, i.e. no filtering based on variance. If a cut off is applied, filtering will be performed by applying the threshold to the result of the var() function. Filtering is performed per mRNA source (i.e. translated mRNA and total mRNA) and treatment. This parameter can be used to avoid a rare error during anota2seq analysis (see details).
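The kind of per-treatment variance check that varCutOff refers to can be sketched in base R as follows. The data are made up, and the comparison used (variance at or below the threshold) is an assumption made for this illustration; the exact rule applied by the package is not stated in this page.

```r
# Toy log-scale data (made-up values): 3 genes, 2 treatments x 3 replicates.
dataP <- rbind(
  gene1 = c(5.1, 5.3, 4.9, 6.0, 6.2, 6.1),
  gene2 = c(3.0, 3.0, 3.0, 4.1, 4.4, 3.9),  # no variance within "ctrl"
  gene3 = c(7.2, 7.0, 7.5, 7.1, 7.3, 7.4)
)
phenoVec <- c("ctrl", "ctrl", "ctrl", "treat", "treat", "treat")

# Variance of each gene within each treatment (genes x treatments matrix).
groupVar <- sapply(unique(phenoVec), function(trt) {
  apply(dataP[, phenoVec == trt, drop = FALSE], 1, var)
})

# Assumed rule for this sketch: flag a gene if its variance is at or
# below the threshold in any treatment group.
varCutOff <- 0
lowVar <- rownames(groupVar)[apply(groupVar <= varCutOff, 1, any)]

lowVar
```

Since filtering is described as being performed per mRNA source, the same computation would be repeated on the total mRNA matrix.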

se

This parameter is used if selecting to initiate the Anota2seqDataSet using a SummarizedExperiment object with the anota2seqDataSetFromSE function. Within the SummarizedExperiment object the expression data is supplied as one assay containing data for both translated mRNA (e.g. polysome-associated mRNA or RPFs) and total mRNA (rows correspond to identifiers and columns to samples). The annotation needed is supplied within "colData" of the SummarizedExperiment object (rows correspond to samples with identical names as in the assay while columns correspond to various annotation). The "colData" must contain the following annotation columns with their names within quotes:

  • "RNA": under this column each sample must be annotated with one out of two RNA source identifiers: "P" indicates that the sample was obtained from translated mRNA (e.g. polysome-associated mRNA or RPFs) whereas "T" indicates that the sample was obtained from total mRNA.

  • "treatment": under this column the treatment for each sample is indicated. Samples with the same treatment must have identical identifiers.

  • "samplePairs": under this column the sample pair identity is indicated. This serves to identify pairs of data for translated mRNA (i.e. "P" under the "RNA" column) and total mRNA (i.e. "T" under the "RNA" column) that were derived from the same starting sample. Each pair of "P" and "T" must have a common identifier that is unique for that pair (i.e. is not used by any other pair of "P" and "T"). This column will also be used to order columns of translated and total RNA data.

  • "batch": under this optional column the batch identity of each sample is indicated (depending on whether the downstream analysis will include a batch parameter or not). A common batch used in downstream analysis is replicate but any other batch that does not overlap with analyzed treatments can be used. Each batch must be indicated by a unique identifier (i.e. not used by any other batch).
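The annotation layout described above can be sketched as follows. All sample names and values are made up. The annotation is built and checked in base R, and a SummarizedExperiment is constructed only if that package happens to be installed.

```r
# Made-up annotation: 2 treatments x 2 replicates, each with a P and a T sample.
colData <- data.frame(
  RNA         = rep(c("P", "T"), times = 4),
  treatment   = rep(c("ctrl", "treat"), each = 4),
  samplePairs = rep(paste0("pair", 1:4), each = 2),
  batch       = rep(rep(c("rep1", "rep2"), each = 2), times = 2),
  row.names   = paste0("sample", 1:8),
  stringsAsFactors = FALSE
)

# Basic consistency checks on the required columns.
stopifnot(
  all(c("RNA", "treatment", "samplePairs") %in% colnames(colData)),
  all(colData$RNA %in% c("P", "T")),
  # every pair identifier occurs exactly once as "P" and once as "T"
  all(table(colData$samplePairs, colData$RNA) == 1)
)

# One assay holding both translated and total mRNA data; column names
# match the row names of the annotation.
set.seed(1)
counts <- matrix(rpois(5 * 8, lambda = 50), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), rownames(colData)))

# Only attempted when SummarizedExperiment is available.
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  se <- SummarizedExperiment::SummarizedExperiment(
    assays  = list(counts = counts),
    colData = colData
  )
}
```

An object built this way could then be passed as the se argument of anota2seqDataSetFromSE, together with dataType and the other arguments listed under Usage.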

assayNum

This parameter is used if selecting to initiate the Anota2seqDataSet using a SummarizedExperiment object and the anota2seqDataSetFromSE function and should specify the assay position (retrieved by "assays(se)") containing the expression data for analysis. By default, the first assay will be used.

Details

These functions initiate an Anota2seqDataSet and provide possibilities to filter, transform and normalize the data. The input can be provided either as a SummarizedExperiment object including the annotation as outlined above, or as a set of matrices and vectors that together contain the same information.

If raw RNAseq data (or other count data) is provided, genes with 0 counts in at least one sample can optionally be filtered out, followed by normalization and transformation. The available transformation algorithms are rlog (DESeq2 package) and TMM-log2 (TMM normalization using the edgeR package followed by log2 counts per million computation using the voom function of the limma package). The relative performance of these methods has been described elsewhere.
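For orientation, the TMM-log2 route described above can be approximated outside anota2seq with the public edgeR and limma functions. This is a sketch on simulated counts under the assumption that both packages are installed; it is not the package's internal code, and the arguments anota2seq passes to these functions are not documented here.

```r
# Simulated count matrix (made-up data): 200 genes x 6 samples.
set.seed(42)
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))

log2cpm <- NULL
if (requireNamespace("edgeR", quietly = TRUE) &&
    requireNamespace("limma", quietly = TRUE)) {
  dge <- edgeR::DGEList(counts = counts)
  # TMM normalization factors are stored in dge$samples$norm.factors.
  dge <- edgeR::calcNormFactors(dge, method = "TMM")
  # voom returns an EList; its E component holds log2 counts per million.
  log2cpm <- limma::voom(dge)$E
}
```

The rlog alternative is provided by the rlog function of DESeq2 and is not shown here.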

A rare error can occur when the translated mRNA (polysome-associated mRNA or RPF) or total mRNA data for any gene has no variance within any treatment. Users can use the varCutOff parameter to filter on variance per mRNA source (i.e. polysome-associated mRNA (RPFs) or total mRNA) and treatment. This avoids the error, which arises because statistics cannot be calculated in the absence of variance.

Value

An Anota2seqDataSet containing data and covariates ready for analysis using anota2seqAnalyze or anota2seqRun.

Examples

data(anota2seq_data)
Anota2seqDataSet <- anota2seqDataSetFromMatrix(dataP = anota2seq_data_P[1:500,],
                                      dataT = anota2seq_data_T[1:500,],
                                      phenoVec = anota2seq_pheno_vec,
                                      dataType = "RNAseq",
                                      normalize = TRUE)

ChrOertlin/anota2seq documentation built on Aug. 4, 2021, 2:17 p.m.