cochranTest | R Documentation
Detects and removes replicate outliers in data series based on the Cochran C test for homogeneity in variance.
cochranTest(X, id, fun = 'sum', alpha = 0.05)
X: a numeric matrix (optionally a data frame that can be coerced to a numeric matrix).
id: factor of the replicate identifiers.
fun: function used to aggregate the data: 'sum' (default), 'mean', 'PC1' or 'PC2'.
alpha: significance level (p-value threshold) of the Cochran C test.
The Cochran C test checks whether a single estimate of variance is significantly larger than the other variances in a group. The statistic can be computed as:
$$C = \frac{S_{max}^2}{\sum_{i=1}^{N} S_i^2}$$
where $S_i^2$ is the variance of the $i$th set of replicates, $S_{max}^2$ is the largest of these variances and $N$ is the number of sets of replicates (data series).
For multivariate data, the variance $S_i^2$ can be computed on aggregated
data, using a summary function (fun argument) such as 'sum', 'mean', or the
first principal components ('PC1' and 'PC2').
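As a minimal sketch of the statistic on aggregated data, using base R only: the helper name cochran_c, the simulated matrix X and the factor id below are illustrative assumptions, not part of the package, and the sketch covers only function-based aggregation such as sum or mean (not 'PC1' or 'PC2').

```r
# Hypothetical helper: Cochran C statistic computed on row-wise aggregated data.
cochran_c <- function(X, id, fun = sum) {
  agg <- apply(as.matrix(X), 1, fun)      # one aggregated value per row
  s2 <- tapply(agg, id, var)              # variance within each set of replicates
  list(C = max(s2) / sum(s2),             # largest variance over the sum of variances
       suspect = names(s2)[which.max(s2)],# set of replicates with the largest variance
       variances = s2)
}

set.seed(1)
X <- matrix(rnorm(5 * 3 * 10), nrow = 15) # 5 sets x 3 replicates, 10 variables
id <- factor(rep(1:5, each = 3))
res <- cochran_c(X, id)
res$C        # value between 1/5 and 1
res$suspect  # identifier of the set with the largest variance
```

The statistic always lies between 1/N and 1: it equals 1/N when all variances are identical and approaches 1 when a single set of replicates dominates the total variance.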
An observation is considered to have an outlying variance if the Cochran C statistic is higher than an upper limit critical value $C_{UL}$, which can be evaluated with ('t Lam, 2010):
$$C_{UL}(\alpha, n, N) = \left[1 + \frac{N-1}{F_c(\alpha/N, (n-1), (N-1)(n-1))}\right]^{-1}$$
where $\alpha$ is the significance level of the test, $n$ is the (average) number of replicates per set, $N$ is the number of sets of replicates and $F_c$ is the critical value of Fisher's $F$ ratio.
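The critical value can be sketched in base R as follows, assuming the form C_UL = [1 + (N - 1) / F_c]^(-1) given by 't Lam (2010) and taking F_c as the upper-tail quantile of the F distribution at level alpha / N. The helper name cochran_cul is hypothetical and not part of the package.

```r
# Hypothetical helper: upper limit critical value C_UL of the Cochran C test.
# alpha: significance level, n: replicates per set, N: number of sets.
cochran_cul <- function(alpha, n, N) {
  # Upper-tail critical value of the F distribution at level alpha / N
  Fc <- qf(1 - alpha / N, df1 = n - 1, df2 = (N - 1) * (n - 1))
  1 / (1 + (N - 1) / Fc)
}

cul <- cochran_cul(alpha = 0.05, n = 3, N = 5)
cul  # a set of replicates is flagged when its C statistic exceeds this value
```

Lowering alpha raises the critical value, so fewer sets of replicates are flagged.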
The replicates with outlying variance are removed and the test can be applied
iteratively until no outlying variance is detected at the given significance
level. This iterative procedure is implemented in cochranTest, which lets the
user decide, by graphical inspection of the outlying replicates, whether a set
of replicates must be removed from the dataset. The user can then (i) remove
all replicates at once, (ii) remove one or more replicates by giving their
indices or (iii) remove nothing.
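The iterative idea can be sketched without the interactive step. The function below is a hypothetical, simplified variant (not the package implementation): it always drops the whole flagged set of replicates, aggregates rows with a plain function, and returns a list shaped like the one this page describes.

```r
# Hypothetical non-interactive variant: always removes the flagged set of replicates.
cochran_iterate <- function(X, id, fun = sum, alpha = 0.05) {
  X <- as.matrix(X)
  id <- droplevels(as.factor(id))
  keep <- rep(TRUE, nrow(X))
  repeat {
    ids <- droplevels(id[keep])
    N <- nlevels(ids)
    if (N < 2) break                                  # nothing left to compare
    agg <- apply(X[keep, , drop = FALSE], 1, fun)     # aggregate each remaining row
    s2 <- tapply(agg, ids, var)                       # variance per set of replicates
    n <- mean(table(ids))                             # (average) replicates per set
    Fc <- qf(1 - alpha / N, n - 1, (N - 1) * (n - 1)) # F critical value at alpha / N
    C <- max(s2) / sum(s2)                            # Cochran C statistic
    CUL <- 1 / (1 + (N - 1) / Fc)                     # upper limit critical value
    if (!is.finite(C) || C <= CUL) break              # no outlying variance left
    worst <- names(s2)[which.max(s2)]
    keep[id == worst] <- FALSE                        # drop all rows of the flagged set
  }
  list(X = X[keep, , drop = FALSE], outliers = which(!keep))
}

# Four sets with variance 1 and one set (rows 13 to 15) with variance 100
X <- matrix(c(rep(c(1, 2, 3), 4), 0, 10, 20), ncol = 1)
id <- factor(rep(letters[1:5], each = 3))
out <- cochran_iterate(X, id)
out$outliers
```

Removing a set changes N, so the critical value is recomputed on every pass.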
a list with components:
'X': input matrix from which outlying observations (rows) have been removed
'outliers': numeric vector giving the row indices of the input data that have been flagged as outliers
The test assumes a balanced design (i.e. data series have the same number of replicates).
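A quick way to check this assumption before running the test is to count the rows per replicate identifier. The helper name is_balanced is hypothetical and only illustrates the check.

```r
# Hypothetical check: TRUE when every set of replicates has the same number of rows.
is_balanced <- function(id) {
  counts <- as.vector(table(droplevels(as.factor(id))))
  length(unique(counts)) == 1
}

is_balanced(rep(1:4, each = 3))  # TRUE: four sets of three replicates
is_balanced(c(1, 1, 1, 2, 2))    # FALSE: sets of three and two replicates
```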
Antoine Stevens
Centner, V., Massart, D.L., and De Noord, O.E., 1996. Detection of inhomogeneities in sets of NIR spectra. Analytica Chimica Acta 330, 1-17.
't Lam, R.U.E., 2010. Scrutiny of variance results for outliers: Cochran's test optimized. Analytica Chimica Acta 659, 68-84.