anota2seqResidOutlierTest: Test for normality of residuals



Description

One assumption when performing analysis of partial variance (APV) is that the residuals from the regressions are normally distributed. anota2seq assesses this by comparing the Q-Q plots of the residuals to envelopes derived by sampling from the normal distribution.

Usage

anota2seqResidOutlierTest(Anota2seqDataSet, confInt = 0.01,
  iter = 5, generateSingleGenePlots = FALSE, nGraphs = 200,
  generateSummaryPlot = TRUE, residFitPlot = TRUE, useProgBar = TRUE)

Arguments

Anota2seqDataSet

An object of class Anota2seqDataSet that also contains the output of the anota2seqPerformQC function.

confInt

Controls how many samples from the normal distribution will be used to generate the envelope to which the residuals are compared. Default is 0.01 which will generate 99 samples from the normal distribution to compare to the actual residuals.

iter

How many times should the analysis be performed? Default is 5 meaning that 5 sets of samples (each with the size controlled by confInt) will be generated. Notice that the summary plotting is only performed for the last set but the percentage of outliers for each iteration can be found in the output object.

generateSingleGenePlots

The analysis is performed per identifier and plots can be generated for each identifier. However, due to the high number of identifiers, a large number of plots will typically be generated. TRUE/FALSE with default FALSE.

nGraphs

If generateSingleGenePlots is set to TRUE, nGraphs controls the number of identifiers for which single gene graphs are generated. Default is 200. NOTE: the first "n" genes are plotted, in the same order as the input data.

generateSummaryPlot

The function can generate a summary graph that shows the envelopes generated by sampling from the normal distribution compared to the obtained values for all genes. Default is TRUE, thus the graph is generated but only from the last iteration.

residFitPlot

Generates a plot of the fitted values against the residuals. Default is TRUE (the plot is generated).

useProgBar

Should the progress bar be shown? Default is TRUE (the progress bar is shown).

Details

The anota2seqResidOutlierTest function assesses whether the residuals from the per-identifier linear regressions of translated mRNA level ~ total mRNA level + treatment are normally distributed. anota2seq generates normal Q-Q plots of the residuals. If the residuals are normally distributed, the data quantiles form a straight diagonal line from bottom left to top right. Because there are typically relatively few data points, anota2seq calculates "envelopes" based on a set of samplings from the normal distribution, using the same number of data points as in the true data (Venables and Ripley, 1999). To enable a comparison, both the actual and the sampled data are centered (mean = 0) and scaled (sd = 1). The data (both true and sampled) are then sorted, and the true sample is compared to the envelopes of the sampled data at each sort position. The result is presented as a Q-Q plot of the true data in which the envelopes of the sampled data are indicated. With 99 samplings, we expect 1/100 of the values to fall outside the envelopes obtained from the samplings. It is thus possible to assess whether approximately the expected number of outlier residuals is obtained. The result is presented as both a graphical output and an output object.
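The envelope comparison described above can be illustrated with a minimal base R sketch. This is not the anota2seq implementation: the simulated residual matrix, the helper standardizeSort, the variable names, the use of one shared envelope for all identifiers, and the mapping from confInt to the number of samplings (1/confInt - 1, chosen to match the documented default of 99 samplings for confInt = 0.01) are all assumptions made for illustration.

```r
# Illustrative sketch only; all names and sizes below are hypothetical.
set.seed(42)

nGenes  <- 200                      # number of identifiers (assumed)
nObs    <- 12                       # residuals per identifier (assumed)
confInt <- 0.01
nSamp   <- round(1 / confInt) - 1   # assumption: 0.01 corresponds to 99 samplings

# Center (mean = 0), scale (sd = 1) and sort a numeric vector
standardizeSort <- function(x) sort((x - mean(x)) / sd(x))

# Stand-in for per-identifier regression residuals (one row per identifier)
residMat <- matrix(rnorm(nGenes * nObs), nrow = nGenes)

# Standardized, sorted residuals: nGenes x nObs
obs <- t(apply(residMat, 1, standardizeSort))

# Samplings from the normal distribution with the same number of data points,
# treated the same way as the true data: nObs x nSamp
sims <- replicate(nSamp, standardizeSort(rnorm(nObs)))

# Envelope at each sort position: smallest and largest sampled value
lower <- apply(sims, 1, min)
upper <- apply(sims, 1, max)

# Flag residuals falling outside the envelope at their sort position
outside <- sweep(obs, 2, lower, "<") | sweep(obs, 2, upper, ">")

pctPerRank <- 100 * colMeans(outside)  # percentage of outliers per rank position
pctAll     <- 100 * mean(outside)      # percentage of outliers, all positions combined
```

Repeating the sampling step several times, as the iter argument does, would give one percentage of outliers per iteration. Plotting obs against qnorm(ppoints(nObs)) together with lower and upper would give a Q-Q plot with the envelope drawn in.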

Value

An Anota2seqDataSet. anota2seqResidOutlierTest saves its output data in the 'residOutlierTest' slot of the Anota2seqDataSet, see anota2seqGetResidOutlierTest for a detailed description of this output.

anota2seqResidOutlierTest also generates a graphical output ("ANOTA2SEQ_residual_distribution_summary.pdf") showing the Q-Q plots from all genes as well as the envelopes from the sampled data. The obtained percentage of outliers is shown at each rank position and all combined. Optionally, when generateSingleGenePlots is set to TRUE, the function also generates individual plots (stored as "ANOTA2SEQ_residual_distributions_single.pdf") for n genes (set by nGraphs). When residFitPlot is set to TRUE an output comparing the fitted values to the residuals is generated (stored as "ANOTA2SEQ_residuals_vs_fitted.jpeg").

References

Venables, W.N. and Ripley, B.D., Modern Applied Statistics with S-PLUS, Springer (1999).

See Also

anota2seqGetResidOutlierTest

Examples

## Not run: 
data(anota2seq_data)
# Initialize Anota2seqDataSet
Anota2seqDataSet <- anota2seqDataSetFromMatrix(
    dataP = anota2seq_data_P[1:100,],
    dataT = anota2seq_data_T[1:100,],
    phenoVec = anota2seq_pheno_vec,
    dataType = "RNAseq",
    normalize = TRUE)
# Run the anota2seqPerformQC function. This must be done before running
# the anota2seqResidOutlierTest function.
Anota2seqDataSet <- anota2seqPerformQC(Anota2seqDataSet)
# Run the anota2seqResidOutlierTest function
Anota2seqDataSet <- anota2seqResidOutlierTest(Anota2seqDataSet)

## End(Not run)

anota2seq documentation built on Nov. 8, 2020, 6 p.m.