fitModelParameters: Peak parameter estimation

Description Usage Arguments Details Value Warning Warning See Also Examples

Description

In the template-based approach of this package, each template/peak pattern is composed of several single basic peaks. The shape of such a basic peak may be modeled either as Gaussian or as Exponentially Modified Gaussian (EMG). The second model assumes that the shape of each peak equals the shape of the function one obtains when convolving the probability density function of a Gaussian distribution with the probability density function of the exponential distribution. This is a more complex as well as more flexible model, since it allows one to account for skewness. Both peak models depend on a set of parameters which are usually unknown a priori. Moreover, these parameters tend to vary over the spectrum. The documented method provides the following functionality: given a raw spectrum and a linear model that describes how the set of parameters varies as a function of m/z, we detect well-resolved peaks within the spectrum. For each detected peak, we determine the parameters of our model in such a way that the resulting peak shape matches that of the detected peak as good as possible in a least-squares sense. In this manner, we obtain a set of estimated parameters for different m/z positions. These estimates are used as response, the m/z positions as explanatory variable in a linear regression model (not necessarily linear in m/z). To be resistant to outliers (e.g. as occurring due to overlapping peaks), we use least absolute deviation regression to infer the model parameters of these linear models. The result can directly be used for the argument model.parameters in the main method getPeakList.

Usage

1
2
3
4
5
6
7
fitModelParameters(mz, intensities, model = c("Gaussian", "EMG"),
                                           fitting = c("most_intense", "model"), 
                                           formula.alpha =  formula(~1),
                                           formula.sigma = formula(~1),
                                           formula.mu = formula(~1),
                                           control = list(window = 6,
                                           threshold = NULL, rlm.maxit = 20))

Arguments

mz

A numeric vector of m/z (mass/charge) values (in Thomson).

intensities

A numeric vector of intensities corresponding to mz.

model

Basic model for the shape of a single peak. Must be "Gaussian" or "EMG" (exponentially modified Gaussian). See details below.

fitting

A character specifying the mode of peak extraction and -fitting. If fitting = "most_intense", then the most intense peak is detected and the parameter(s) is (are) fitted using only this peak. Then, the resulting functions of the form parameter(mz) are all constant functions. If fitting = "model", as many peaks as possible satisfying the criteria of control are used to estimate linear models of the form parameter(mz) = beta_0 + beta_1 g_1(mz) + ... + beta_p g_p(mz), where the g's are fixed functions. Their specification is performed by specifying one-sided formulae according to the Wilkinson-Roger notation for linear models as in the function lm. The model formulae have to satisfy the following criteria:

  • The formula is one sided, i.e. no term appears on the left hand side of ~.

  • The right hand side consists only of functions in mz, and mz is the only variable that may be used. Product terms involving * are not admitted.

  • Important: Note that, for example ~ 1 + mz + sqrt(mz) is a valid formula in the sense that no error will occur, but it does not correspond to the linear model parameter(mz) = beta_0 + beta_1 mz + beta_2 sqrt(mz). The correct model formula instead reads ~ 1 + mz + I(sqrt(mz)), i.e. each function has to be bracketed by I().

formula.alpha

A one-sided formula describing the dependence of the EMG-parameter alpha as a function of m/z, s. fitting. The default assumes that the parameter is independent of m/z, in which case one obtains a function returning the median of the estimates obtained for different detected peaks. formula.alpha, formula.sigma and formula.mu are needed if and only if fitting = "model".

formula.sigma

Parameter used for both peak models. For further information, see formula.alpha

formula.mu

See formula.alpha. Used only if model = "EMG".

control

A list controlling peak detection. The parameter window refers to the minimal resolution of a peak. According to window, a sequence of intensities at adjacent m/z positions is considered as peak if and only if there are at least window m/z positions with increasing intensity followed by a second sequence of window m/z positions with decreasing intensity. If in addition threshold is specified, only peaks whose maximum intensity is equal to or greater than threshold is considered. Note: Usually, threshold is specified, since otherwise the maximum intensity among the complete spectrum minus some epsilon is taken as threshold. rlm.maxit allows to control the maximum number of iterations used to fit peaks to the data.

Details

Let the variable x represent m/z. Then model = "Gaussian" assumes that a single peak can be described as

gaussfun(x;sigma,mu) = exp(-(x - mu)^2/sigma)

The parameter mu is not considered as model parameter: in the computation of the resulting basis function matrix, mu is always set to a known m/z position where the leading peak of a peak pattern might be present.
Model = "EMG" assumes that a single peak can be described as

EMG(x;alpha,sigma,mu) = exp(sigma^2/(2 * alpha^2) + (mu - x)/alpha) (1 - Phi(sigma/alpha + (mu - x)/(sigma)))/alpha,

where Phi represents the cumulative density function of the standard Gaussian distribution. Alternatively, EMG(.;alpha,sigma,mu) can be expressed as
EMG(x;alpha,sigma,mu) = (phi ** gamma)(x),
where ** denotes convolution, phi is the density function of the Gaussian distribution with mean mu and standard deviation sigma and gamma is the density function of an exponential distribution with expectation alpha.
The parameters of EMG can be interpreted as follows.

alpha

The lower alpha, the more the shape of the peak resembles that of a Gaussian. Conversely, large values of alpha lead to long right tails.

sigma

Controls the width of the peak (together with alpha).

mu

A location parameter. Note that in general mu does not coincide with the mode of EMG. Therefore, if model = "EMG", all three parameters are estimated from detected peaks.

Moreover, the skewness of EMG is characterized by the ratio alpha/sigma.

Value

An object of class modelfit.

Warning

Parameter estimation by fitting detected peaks is possible only if single peaks are sufficiently well-resolved. A peak composed of, say, five (m/z, intensity) pairs, is inappropriate to infer three parameters.

Warning

The choice model = "EMG" in conjunction with fitting = "model" can be extremely slow (taking up to several minutes of computing time) if many peaks are detected and fitted. This is caused by a grid search over a grid of 10^6 different combinations of alpha, sigma and mu performed prior to nonlinear least squares estimation in order to find suitable starting values.

See Also

getPeaklist, modelfit

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
### load data
data(toyspectrum)
### estimate parameter sigma of a Gaussian model,
### assumed to be independent of m/z

simplegauss <- fitModelParameters(toyspectrum[,1],
             toyspectrum[,2],
             model = "Gaussian",
             fitting = c("model"),
             formula.sigma = formula(~1),
             control = list(window = 6, threshold = 1))

show(simplegauss)
visualize(simplegauss, type = "peak", xlab = "m/z", ylab = "intensity",
          main = "Gaussian fit")

### fit the model sigma(m/z) = beta_0 + beta_1 m/z + beta_2 m/z^2

gaussquadratic <- fitModelParameters(toyspectrum[,1],
             toyspectrum[,2],
             model = "Gaussian",
             fitting = "model",
             formula.sigma = formula(~mz + I(mz^2) ),
             control = list(window = 6, threshold = 1))

show(gaussquadratic)
visualize(gaussquadratic, type = "model", modelfit = TRUE)

### estimate parameters for EMG-shaped peaks

EMGlinear <- fitModelParameters(toyspectrum[,1],
             toyspectrum[,2],
             model = "EMG",
             fitting = "model",
             formula.alpha = formula(~mz),
             formula.sigma = formula(~mz),
             formula.mu = formula(~1),
             control = list(window = 6, threshold = 1))

show(EMGlinear)

visualize(EMGlinear, type = "peak", xlab = "m/z", ylab = "intensities",
          main = "EMG fit")

visualize(EMGlinear, type = "model", parameters = c("alpha", "sigma"), modelfit = TRUE)

IPPD documentation built on April 28, 2020, 6:58 p.m.