anota2seqPerformQC: Perform quality control to ensure that the supplied data set...


View source: R/anota2seqPerformQC.R


Description

Generates the distribution of interaction p-values, which is compared to the expected NULL distribution. Also assesses the frequency of highly influential data points, using dfbetas for the regression slope, and compares this frequency to that obtained from randomly generated simulation data. Finally, calculates omnibus treatment effects.


Usage

anota2seqPerformQC(Anota2seqDataSet, generateSingleGenePlots = FALSE,
  fileName = "ANOTA2SEQ_translation_vs_mRNA_individual_regressions.pdf",
  nReg = 200, correctionMethod = "BH", useDfb = TRUE, useDfbSim = TRUE,
  nDfbSimData = 2000, useRVM = TRUE, onlyGroup = FALSE,
  useProgBar = TRUE, fileStem = "ANOTA2SEQ")



Arguments

Anota2seqDataSet: An object of class Anota2seqDataSet.


generateSingleGenePlots: anota2seq can plot the regression for each gene. However, as there are many genes, this output is normally not informative. TRUE/FALSE; default is FALSE (no individual plots).


fileName: If generateSingleGenePlots is set to TRUE, use fileName to set the desired file name (the plots are written to the current directory as a pdf). Default is "ANOTA2SEQ_translation_vs_mRNA_individual_regressions.pdf".


nReg: If generateSingleGenePlots is set to TRUE, nReg can be used to limit the number of output plots. Default is 200. NOTE: the first nReg genes are plotted, in the same order as the input data.


correctionMethod: anota2seq adjusts the omnibus interaction and treatment p-values for multiple testing. The correction method can be "Bonferroni", "Holm", "Hochberg", "SidakSS", "SidakSD", "BH", "BY", "ABH" or "TSBH" as implemented in the multtest package, or "qvalue" as implemented in the qvalue package. Default is "BH".


useDfb: Should anota2seq assess the occurrence of highly influential data points? TRUE/FALSE; default is TRUE.


useDfbSim: The random occurrence of dfbetas can be simulated. TRUE/FALSE; default is TRUE. FALSE suppresses the simulation, which reduces computation time but makes the dfbetas difficult to interpret.


nDfbSimData: If useDfbSim is TRUE, the user can select the number of samplings that will be performed per step (10 steps with different correlations between the polysome association and the total mRNA level). Default is 2000.


useRVM: The Random Variance Model (RVM) can be used for the omnibus treatment analysis. In this case, the effect of RVM on the distribution of the interaction significances needs to be tested as well. TRUE/FALSE; the default (TRUE) leads to calculation of RVM p-values for both omnibus interactions and omnibus treatment effects.


onlyGroup: It is possible to suppress the omnibus interaction analysis and only perform the omnibus treatment analysis. TRUE/FALSE; default is FALSE (analyze both interactions and treatment effects).


useProgBar: Should the progress bar be shown? TRUE/FALSE; default is TRUE (show progress bar).


fileStem: This stem is added in front of each output file name. Default is "ANOTA2SEQ".
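To make the influence assessment controlled by useDfb more concrete, the following is a minimal sketch of how standardized dfbeta (dfbetas) for a regression slope can be computed for a single gene with base R. It is not the package's internal code: the data, the variable names (totalRNA, polyRNA, group) and the two cut-offs are illustrative assumptions, and the cut-offs are only common rules of thumb.

```r
set.seed(1)

# Hypothetical single-gene data: 8 samples in two treatment groups.
totalRNA <- rnorm(8, mean = 8, sd = 1)
group <- factor(rep(c("ctrl", "treat"), each = 4))
polyRNA <- 1 + 0.8 * totalRNA + rnorm(8, sd = 0.3)
polyRNA[8] <- polyRNA[8] + 3  # one deliberately aberrant sample

# Regress polysome-associated mRNA on total mRNA and treatment.
fit <- lm(polyRNA ~ totalRNA + group)

# dfbetas() returns one row per sample and one column per coefficient;
# the "totalRNA" column is each sample's standardized influence on the slope.
slopeDfb <- dfbetas(fit)[, "totalRNA"]

# Rule-of-thumb cut-offs (assumptions, not necessarily those used by anota2seq).
cutoffs <- c(absolute = 1, sizeAdjusted = 2 / sqrt(length(polyRNA)))
flagged <- sapply(cutoffs, function(cutoff) abs(slopeDfb) > cutoff)

print(round(slopeDfb, 2))
print(flagged)
```

Each row of flagged corresponds to one sample, so a gene can be counted as containing an influential data point when any row is TRUE for the chosen cut-off.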


Details

anota2seqPerformQC performs the basic quality control of the data set. Two levels of quality control are assessed, both of which need to show good performance for valid application of anota2seq.

First, anota2seq assumes that there are no interactions, i.e. that the regression slopes do not differ between treatments. The output for this analysis is a density plot and a histogram of both the raw p-values and the p-values adjusted by the selected multiple testing correction method (if RVM was used, the second page shows the same presentation using RVM p-values). anota2seq requires a uniform distribution of the raw interaction p-values for valid analysis of changes in translational efficiency affecting protein levels and buffering.

Second, anota2seq assesses whether there are more data points with high influence on the regression analyses than would be expected by chance. Influential data points are identified as data points that influence the slope of the regression, using standardized dfbeta (dfbetas).

The function also performs an omnibus treatment effect test if there are more than 2 treatments. It is possible to use RVM for the omnibus treatment statistics. If RVM is used, it is necessary to verify that the interaction RVM p-values also follow the expected NULL distribution.
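The check on interaction p-values described above can be illustrated with a small base R sketch. It simulates genes that share a common slope in both treatments (so the null hypothesis of no interaction holds), computes a per-gene interaction p-value by comparing nested linear models, and then inspects the raw and adjusted distributions. Everything here is an assumption made for illustration: the simulated data, the variable names, and the use of stats::p.adjust as a stand-in for the multtest-based correction; it does not reproduce anota2seq's own computation.

```r
set.seed(7)

nGenes <- 300
group <- factor(rep(c("ctrl", "treat"), each = 4))

# One interaction p-value per simulated gene.
interactionP <- replicate(nGenes, {
  totalRNA <- rnorm(8, mean = 8)
  # Same slope (0.8) in both groups, so there is no true interaction.
  polyRNA <- 0.8 * totalRNA + 0.5 * (group == "treat") + rnorm(8, sd = 0.3)
  additive <- lm(polyRNA ~ totalRNA + group)   # common slope
  full <- lm(polyRNA ~ totalRNA * group)       # group-specific slopes
  anova(additive, full)[2, "Pr(>F)"]
})

# Benjamini-Hochberg adjustment of the raw p-values.
adjustedP <- p.adjust(interactionP, method = "BH")

# Under the null, raw p-values should be spread roughly evenly over [0, 1].
binCounts <- table(cut(interactionP, breaks = seq(0, 1, by = 0.1),
                       include.lowest = TRUE))
print(binCounts)

# A Kolmogorov-Smirnov test against the uniform distribution is one way
# to summarise the same comparison numerically.
print(ks.test(interactionP, "punif"))
```

A strong excess of small raw p-values in the first bin would be the kind of departure from uniformity that the quality control plots are meant to reveal.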


Value

An Anota2seqDataSet. anota2seqPerformQC saves its output data in the 'qualityControl' slot of the Anota2seqDataSet; see anota2seqGetQualityControl for a detailed description of this output.

anota2seqPerformQC also generates several graphical outputs. One output ("ANOTA2SEQ_interaction_p_distribution.pdf") shows the distribution of p-values and adjusted p-values for the omnibus interaction, as both densities and histograms. The second page of the pdf displays the same plots for the RVM statistics, if RVM is used. Another output ("ANOTA2SEQ_simulated_vs_obtained_dfbs.pdf") shows bar graphs of the frequencies of outlier dfbetas at different dfbetas thresholds. If the simulation was enabled (recommended), these are compared to the frequencies from the random data set. One optional graphical output shows the gene-by-gene regressions with the sample classes indicated. If RVM is used, a Q-Q plot and a comparison of the CDF of the variances to the theoretical CDF of the F-distribution are generated (output as "ANOTA2SEQ_rvm_fit_for_....jpg") for both the omnibus sample class and the omnibus interaction test.
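The comparison between obtained and simulated dfbetas frequencies can be sketched as follows in base R. This is only an illustration under stated assumptions: the thresholds (1, 2, 3), the grid of 10 correlation values, the number of simulated genes per step, and the artificially contaminated "obtained" data are all hypothetical, and the helper functions maxSlopeDfb and simulateGene are not part of anota2seq.

```r
set.seed(42)

nSamples <- 8
thresholds <- c(1, 2, 3)          # hypothetical dfbetas thresholds
rhos <- seq(0, 0.9, by = 0.1)     # 10 hypothetical correlation steps

# Largest absolute slope dfbetas among the samples of one gene.
maxSlopeDfb <- function(x, y) max(abs(dfbetas(lm(y ~ x))[, "x"]))

# Simulate one gene with correlation rho between x and y;
# optionally shift one sample to mimic an influential data point.
simulateGene <- function(rho, contaminate = FALSE) {
  x <- rnorm(nSamples)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(nSamples)
  if (contaminate) y[1] <- y[1] + 4
  maxSlopeDfb(x, y)
}

# Random reference: 100 genes per correlation step (the default is 2000).
simulated <- unlist(lapply(rhos, function(rho) replicate(100, simulateGene(rho))))

# Stand-in for real data: 400 clean genes plus 100 contaminated genes.
obtained <- c(replicate(400, simulateGene(0.5)),
              replicate(100, simulateGene(0.5, contaminate = TRUE)))

# Fraction of genes whose largest slope dfbetas exceeds each threshold.
freq <- rbind(
  obtained = sapply(thresholds, function(cutoff) mean(obtained > cutoff)),
  simulated = sapply(thresholds, function(cutoff) mean(simulated > cutoff))
)
colnames(freq) <- paste0("dfbetas>", thresholds)
print(round(freq, 3))
```

Passing freq to barplot(freq, beside = TRUE) would give a side-by-side bar graph in the spirit of the pdf described above; obtained frequencies clearly above the simulated ones suggest more influential data points than expected by chance.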

See Also



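Examples

## Not run: 
library(anota2seq)

# Load the example data shipped with the package
# (anota2seq_data_P, anota2seq_data_T and anota2seq_pheno_vec).
data(anota2seq_data)

# Build the data set from the first 100 genes.
Anota2seqDataSet <- anota2seqDataSetFromMatrix(dataP = anota2seq_data_P[1:100,],
                                               dataT = anota2seq_data_T[1:100,],
                                               phenoVec = anota2seq_pheno_vec,
                                               dataType = "RNAseq",
                                               normalize = TRUE)

# Run the quality control with default settings.
Anota2seqDataSet <- anota2seqPerformQC(Anota2seqDataSet)

## End(Not run)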

anota2seq documentation built on Nov. 8, 2020, 6 p.m.