msDenoiseWavelet: Denoising a Mass Spectrum via Wavelet Shrinkage
In zeehio/msProcess: Protein Mass Spectra Processing

Description Usage Arguments Details Value References See Also Examples

Performs a decimated or undecimated discrete wavelet transform on the input series and "shrinks" (decreases the amplitude towards zero) the wavelet coefficients based on a calculated noise threshold and specified shrinkage function. The resulting shrunken set of wavelet transform coefficients is inverted in a synthesis operation, resulting in a denoised version of the original series.

msDenoiseWavelet(x, wavelet="s8",
    n.level=as.integer(floor(logb(length(x), 2))),
    shrink.fun="hard",
    thresh.fun="universal", thresh.scale=1,
    xform="modwt", noise.variance=NULL,
    reflect=TRUE, process="msDenoiseWavelet",
    assign.attributes=FALSE)

`x`	A vector containing a uniformly-sampled real-valued time series.
`assign.attributes`	A logical value. If `TRUE`, the argument values to the function call will be attached as attributes to the output vector. Default: `FALSE`.
`n.level`	The number of decomposition levels, limited to `floor(logb(length(x),2))`. Default:`as.integer(floor(logb(length(x), 2)))`.
`noise.variance`	A numeric scalar representing (an estimate of) the additive Gaussian white noise variance. If unknown, setting this value to 0.0 (or less) will prompt the function to automatically estimate the noise variance based on the median absolute deviation (MAD) of the scale one wavelet coefficients. Default: NA (MAD estimate will be used).
`process`	A character string denoting the name of the process to register with the (embedded) event history object of the input after processing the input data. This process is not updated if it already exists in the event history. Default: `"msDenoiseWavelet"`.
`reflect`	A logical value. If `TRUE`, the last Lj = (2^n.level - 1)(L - 1) + 1 coefficients of the series are reflected (reversed and appended to the end of the series) in order to attenuate the adverse effect of circular filter operations on wavelet transform coefficients for series whose endpoint levels are (highly) mismatched. The variable Lj represents the effective filter length at decomposition level `n.level`, where L is the length of the wavelet (or scaling) filter. After waveshrinking and reconstructing, the first N points of the result are returned, where N is the length of the original time series. Default: `TRUE`.
`shrink.fun`	A character string denoting the shrinkage function. Choices are `"hard"`, `"soft"`, and `"mid"`. Default: `"hard"`.
`thresh.fun`	A character string denoting the threshold function to use in calculating the waveshrink thresholds. character string Choices are `"universal"`, `"minimax"`, and `"adaptive"`. numeric values Either a single threshold value or a vector of values containing `n.levels` thresholds (one threshold per decomposition level). Note: if `xform == "modwt"`, then only the `"universal"` threshold function is (currently) supported. Default: `"universal"`.
`thresh.scale`	A positive valued numeric scalar which is used to amplify or attenuate the threshold values at each decomposition level. The use of this argument signifies a departure from a model driven estimate of the thresholds and can be used to tweak the levels to obtain a smoother or rougher result. Default: 1.
`wavelet`	A character string denoting the filter type. See `wavDaubechies` for details. Default: "s8".
`xform`	A character string denoting the wavelet transform type. Choices are `"dwt"` and `"modwt"` for the discrete wavelet transform (DWT) and maximal overlap DWT (MODWT), respectively. The DWT is a decimated transform where (at each level) the number of transform coefficients is halved. Given N is the length of the original time series, the total number of DWT transform coefficients is N. The MODWT is a non-decimated transform where the number of coefficients at each level is N and the total number of transform coefficients is N*`n.level`. Unlike the DWT, the MODWT is shift-invariant and is seen as a weighted average of all possible non-redundant shifts of the DWT. See the references for details. Default: `"modwt"`.

Assume that an appropriate model for our time series is X=D + e where D represents an unknown deterministic signal of interest and e is some undesired stochastic noise that is independent and identically distributed and has a process mean of zero. Waveshrink seeks to eliminate the noise component e of X in hopes of obtaining (a close approximation to) D. The basic algorithm works as follows:

1: Calculate the DWT of X.
2: Shrink (reduce towards zero) the wavelet coefficients based on a selected thresholding scheme.
3: Invert the DWT.

This function support different shrinkage methods and threshold estimation schemes. Let W represent an arbitrary DWT coefficient and W' the correpsonding thresholded coefficient using a threshold of delta. The supported shrinkage methods are

hard thresholding: W'=0 if |W| <= delta; W otherwise.
soft thresholding: W'=sign{W}*f(|W| - delta) where f(x)=x if x >= 0; 0 otherwise and sign(x)=+1 if x > 0; 0 if x=0; -1 if x < 0.
mid thresholding: W'=sign{W}*g(|W| - delta) where g(W)=2*f(|W|-delta) if |W| < 2*delta; |W| otherwise.

Hard thresholding reduces to zero all coefficients that do not exceed the threshold. Soft thresholding pushes toward zero any coefficient whose magnitude exceeds the threshold, and zeros the coefficient otherwise. Mid thresholding represents a compromise between hard and soft thresholding such that coefficients whose magnitude exceeds twice the threshold are not adjusted, those between the threshold and twice the trhreshold are shrunk, and those below the threshold are zeroed.

The supported threshold functions are

universal

The universal threshold function is dependent on the type of wavelet transform used to decompose the time series.

DWT transform: delta=sqrt(2*var{noise}*log(N)) where var{noise} is the noise variance and N is the number of samples in the time series. If the optional input argument noise.variance is non-positive, it signifies that the additive noise variance is unknown and (in this case) the standard deviation of the noise is estimated by

sigma(noise) = median(|W(1,t)|)/ 0.6745

where the W(1,t) are the set of level 1 DWT wavelet coefficients. The MAD estimate is normalized by 0.6745 to ensure to return the proper result if the input series were solely comprised of Gaussian white noise.

MODWT transform: delta(j)=sqrt(2*var{noise}*log(N)/2^j) where var{noise} is the noise variance, N is the number of samples in the time series, and j is the decomposition level. Note that, unlike the DWT case, the threshold levels for the MODWT are a function of decomposition level. If the optional input argument noise.variance is non-positive, it signifies that the additive noise variance is unknown and (in this case) the standard deviation of the noise is estimated by

sigma(noise) = median(|W(1,t)|)/ 0.6745

where the W(1,t) are the set of level 1 MODWT wavelet coefficients. The MAD estimate is normalized by 0.6745 to ensure to return the proper result if the input series were solely comprised of Gaussian white noise.

In either case, the universal threshold is defined so that if the original time series was solely comprised of Gaussian noise, then all the wavelet coefficients would be (correctly) set to zero using a hard thresholding scheme. Inasmuch, the universal threshold results in highly smoothed output.

minimax

These thresholds are used with soft and hard thresholding, and are precomputed based on a minimization of a theoretical upperbound on the asymptotic risk. The minimax thresholds are always smaller than the universal threshold for a given sample size, thus resulting in relatively less smoothing.

adaptive

These are scale-adaptive thresholds, based on the minimization of Stein's Unbiased Risk Estimator for each level of the DWT. This method is only available with soft shrinkage. As a caveat, this threshold can produce poor results if the data is too sparse (see the references for details).

Finally, the user has the choice of using either a decimated (standard) form of the discrete wavelet transform (DWT) or an undecimated version of the DWT (known as the Maximal Overlap DWT (MODWT)). Unlike the DWT, the MODWT is a (circular) shift-invariant transform so that a circular shift in the original time series produces an equivalent shift of the MODWT coefficients. In addition, the MODWT can be interpreted as a cycle-spun version of the DWT, which is achieved by averaging over all non-redundant DWTs of shifted versions of the original series. The result is a smoother version of the DWT at the cost of an increase in computational complexity (for an N-point series, the DWT requires O(N) multiplications while the MODWT requires O(N log2(N)) multiplications.

A vector containing the denoised series and optionally with the argument values of the function call detached.

Donoho, D. and Johnstone, I. Ideal Spatial Adaptation by Wavelet Shrinkage. Technical report, Department of Statistics, Stanford University, 1992.

Donoho, D. and Johnstone, I. Adapting to Unknown Smoothness via Wavelet Shrinkage. Technical report, Department of Statistics, Stanford University, 1992.

D. B. Percival and A. T. Walden, Wavelet Methods for Time Series Analysis, Cambridge University Press, 2000.

msDenoise, msDenoiseWaveletThreshold, msNoise, wavDaubechies, wavDWT, wavMODWT, msSmoothLoess, msSmoothSpline, msSmoothKsmooth, msSmoothSupsmu, msSmoothApprox, msDenoiseSmooth, eventHistory.

if (!exists("qcset")) data("qcset", package="msProcess")

## create plot layout
old.par <- par()
par(mfrow=c(3,2))

## grab portion of a mass spectrum and plot
x <- qcset$intensity[5000:7000,1]
plot(x,type="l");title("original")

## create noise and add it to the spectrum portion
sd.noise <- 2
xnoise <- x + rnorm(length(x), sd=sd.noise)
plot(xnoise,type="l")
title(paste("original + noise (sd=", round(sd.noise,3),")",sep=""))

## plot MODWT waveshrink results after scaling the
## estimated threshold values
for ( k in seq(0.5,2,length=4)){

    y <- msDenoiseWavelet(xnoise, wavelet="s8",
       shrink.fun="hard", thresh.fun="universal",
       thresh.scale=k, xform="modwt")
    plot(y,type="l")
    title(paste("WS: thresh.scale =", round(k,2)))
}

## DWT waveshrink using different threshold
## functions
plot(x,type="l")
title("original")
plot(xnoise,type="l")
title(paste("original + noise (sd=", round(sd.noise,3),")",sep=""))

thresh.funs <- c("universal", "minimax", "adaptive")
for (k in thresh.funs){
    plot(msDenoiseWavelet(xnoise, thresh.fun=k, xform="dwt"), type="l", ylab="y")
    title(paste("WS: thresh.fun =", k))
}

## restore original plot layout
par(old.par)