lmvar: Linear regression with non-constant variances

Description Usage Arguments Details Value See Also Examples

Description

Performs a Gaussian regression with non-constant variances and outputs an 'lmvar' object.

Usage

1
2
3
lmvar(y, X_mu = NULL, X_sigma = NULL, intercept_mu = TRUE,
  intercept_sigma = TRUE, sigma_min = 0, slvr_options = list(method =
  "NR"), control = list(), ...)

Arguments

y

Vector of observations

X_mu

Model matrix for the expected values μ

X_sigma

Model matrix for the logarithms of the standard deviations σ

intercept_mu

Boolean, if TRUE a column with the name '(Intercept)' is added to the matrix X_mu. This column represents the intercept term in the model for μ.

intercept_sigma

Boolean, if TRUE a column with the name '(Intercept_s)' is added to the matrix X_sigma. This column represents the intercept term in the model for \log σ.

sigma_min

Numeric, the minimum value for the standard deviations sigma. Can be a single number which applies to all observations or a vector containing a separate minimum for each observation.

slvr_options

A list with options to be passed on to the function maxLik which maximizes the log-liklihood

control

A list with further options. The following options can be set.

  • check_hessian Boolean, if TRUE it is checked whether the Hessian is negative-definite, i.e., whether the log-likelihood is at a maximum. The default value is TRUE.

  • slvr_log Boolean, if TRUE the output of maxLik is added to the 'lmvar' object. The default value is FALSE.

  • monitor Boolean, if TRUE diagnostic messages about errors that occur will be printed during the run. The default value is FALSE.

  • remove_df_sigma_pre Warning: this is an experimental option which could cause unexpected issues! Boolean, if TRUE the algorithm removes degrees of freedom of the model for σ to avoid convergence problems. They are removed before carrying out the fit. See 'Details'. The default value is FALSE.

  • remove_df_sigma_post Boolean, if TRUE the algorithm will remove degrees of freedom of the model for σ if this is necessary to achieve a satisfactory fit. They are removed after a fit has been attempted and turned out to fail. This option has no effect if sigma_min (or one of its elements) is larger than zero. See 'Details'. The default value is FALSE.

  • mu_full_rank Boolean, if TRUE it is assumed that X_mu (with the intercept term added in case intercept_mu = TRUE) is full-rank. The default value is FALSE.

  • sigma_full_rank Boolean, if TRUE it is assumed that X_sigma (with the intercept term added in case intercept_sigma = TRUE) is full-rank. The default value is FALSE.

...

Additional arguments, not used in the current implementation

Details

Response vector

The vector y must be a numeric vector. It can not contain special values like NULL, NaN, etc.

Model matrices

If the matrix X_mu is not specified, the model for the expected values μ will consist of an intercept term only. The argument intercept_mu is ignored in this case. Likewise, the model for \log σ will consist of an intercept term only if X_sigma is not specified. In the latter case, the model reduces to a standard linear model.

Both model matrices must be numeric matrices. They can not contain special values like NULL, NaN, etc.

The model matrices can be of class matrix or of a class from the package Matrix. They can also be of class numeric in case they are matrices with one column only.

All columns in the matrix X_mu must either have a name, or no column has a name at all. It is not allowed that some colums have a name while others don't. The same is true for X_sigma.

If supplied, the column names must be unique within X_mu. The same is true for X_sigma. A column name can be used in both X_mu and X_sigma though.

In case the matrix X_mu has no columns with a column name, lmvar gives the names v1, v2 etc. to the columns. Likewise, if the matrix X_sigma has no columns with a column name, lmvar gives the names v1_s, v2_s etc. to the columns.

Matrix X_mu can not have a column with the name '(Intercept)'. Matrix X_sigma can not have a column with the name '(Intercept_s)'. Both names are reserved names.

Minimum sigma

The argument sigma_min allows one to run a regression under the constraint of a minimum standard deviation for each observation. The argument can be a single number, which applies to all observations, or a vector which contains a separate minimum for each observation. In the latter case, all vector elements must be zero (in which case no constraint is applied) or all vector elements must be larger than zero.

The option of a minimum sigma is intended for situations in which an unconstrained regression attempts to make sigma equal to zero for one or more observations. This will cause extreme values for the β_σ and numerical instabilities.Such a situation can be remedied by bounding sigma from below.

In case sigma_min has a value (or a vector of values) larger than zero and the solve-method is "NR", the solve method is set to "BFGS". If the argument constraints is passed on to maxlik (as a list member of slvr_options), it is ignored.

Error estimates and confidence intervals (e.g. such as given by summary.lmvar and predict.lmvar) can be unreliable if minimum sigmas are specified.

Likelihood maximization

The function maxLik from the maxLik package, is used to maximize the (profile) log-likelihood. maxLik returns a termination code which reports whether a maximum was found, see maxLik. For the method "NR", a potential problem is reported by a maxLik return code different from 1, 2 or 8. For other methods, a code different from 0 flags a potential problem. In case the return code flags a potential problem, the message from maxLik is output as a warning.

All list elements in slvr_options are passed on as arguments to maxLik. The name of the list element is the argument name, the value of the list element is the argument value. It is not allowed to pass on the arguments fn, grad or hess. In case the list does not contain an element method, it is set to "NR". In case the list does not contain an element start, an initial estimate for β_σ is set by lmvar.

In case one wants to supply an initial estimate for the coefficients, one has to supply an initial estimate for β_σ. If beta_sigma_initial is the initial estimate, one passes on the argument slvr_options = list(start = beta_sigma_initial). The inital estimate beta_sigma_initial must be a numeric vector. Its length must be as follows.

There is no need to supply an inital estimate for β_μ, see the vignette 'Math' for details.

Reducing the degrees of freedom to improve fit

When maxLik exits with return code 3 (and corresponding warning 'Last step could not find a value above the current. Boundary of parameter space?'), it somehow did not succeed to fit an 'lmvar' model properly. The same is true if the the Hessian if the log-likelihood is not negative-definite.

In this situation, a proper fit can sometimes be achieved if one drops a few extra columns (sometimes just one column) from X_sigma. See the vignette 'Math' for details. The options control = list(remove_df_sigma_pre = TRUE, remove_df_sigma_post = TRUE)) do just that. They attempt to achieve a proper fit by dropping columns (i.e., reducing the degrees of freedom of the model for σ) if necessary.

The option remove_df_sigma_pre inspects the model matrices and the response vector before carrying out the fit, and drops columns from X_sigma if necessary. Warning: this is an experimental option which could cause unexpected issues!

The option remove_df_sigma_post = TRUE attempts to achieve a proper fit in the following two cases.

I.e., this option drops columns from X_sigma based on the results of a failed fit.

Other

When check_hessian = TRUE, it is checked whether the fitted log-likelihood is at a maximum. A warning will be issued if that is not the case.

The control options mu_full_rank and sigma_full_rank are for efficiency purposes. If set to TRUE, the corresponding model matrices will not be made full-rank because it is assumed they are full-rank already. However, setting one of these to TRUE while the corresponding model matrix is not full-rank will cause unpredictable and possibly unnoticed errors. These options should therefore only be changed from their default value with the utmost care.

See the vignettes that come with the lmvar package for more info. Run vignette(package="lmvar") to list the available vignettes.

Value

An object of class 'lmvar', which is a list. Users are discouraged to access list-members directly. Instead, list-members are to be accessed with the various accessor and utility functions in the package. Exceptions are the following list members for which no accessor functions exist:

See Also

lmvar_no_fit to create an object which is like an 'lmvar' object without carrying out a model fit.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# As example we use the dataset 'attenu' from the library 'datasets'. The dataset contains
# the response variable 'accel' and two explanatory variables 'mag'  and 'dist'.
library(datasets)

# For more info on the data, study the dataset
help("attenu")

# Create the model matrix for the expected values
X = cbind(attenu$mag, attenu$dist)
colnames(X) = c("mag", "dist")

# Create the model matrix for the standard deviations. The standard deviation
# is large for small distances and small for large distances. The use of 'dist'
# as explanatory variable makes the beta for the intercept term blow up.
# Therefore we use '1 / dist' as explanatory variable
X_s = cbind(attenu$mag, 1 / attenu$dist)
colnames(X_s) = c("mag", "dist_inv")

# Carry out the fit
fit = lmvar(attenu$accel, X, X_s)

# Inspect the results
summary(fit)

# Carry out the fit with an inital estimate for beta and ask for
# a report of the solver-routine
beta_sigma_start = c(-4, 0, 0)
fit = lmvar(attenu$accel, X, X_s,
            slvr_options = list(start = beta_sigma_start),
            control = list(slvr_log = TRUE))
fit$slvr_log

Example output

attenu                package:datasets                 R Documentation

_T_h_e _J_o_y_n_e_r-_B_o_o_r_e _A_t_t_e_n_u_a_t_i_o_n _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     This data gives peak accelerations measured at various observation
     stations for 23 earthquakes in California.  The data have been
     used by various workers to estimate the attenuating affect of
     distance on ground acceleration.

_U_s_a_g_e:

     attenu
     
_F_o_r_m_a_t:

     A data frame with 182 observations on 5 variables.

       [,1]  event    numeric  Event Number                     
       [,2]  mag      numeric  Moment Magnitude                 
       [,3]  station  factor   Station Number                   
       [,4]  dist     numeric  Station-hypocenter distance (km) 
       [,5]  accel    numeric  Peak acceleration (g)            
      
_S_o_u_r_c_e:

     Joyner, W.B., D.M. Boore and R.D. Porcella (1981).  Peak
     horizontal acceleration and velocity from strong-motion records
     including records from the 1979 Imperial Valley, California
     earthquake.  USGS Open File report 81-365. Menlo Park, Ca.

_R_e_f_e_r_e_n_c_e_s:

     Boore, D. M. and Joyner, W.B.(1982) The empirical prediction of
     ground motion, _Bull. Seism. Soc. Am._, *72*, S269-S268.

     Bolt, B. A. and Abrahamson, N. A. (1982) New attenuation relations
     for peak and expected accelerations of strong ground motion,
     _Bull. Seism. Soc. Am._, *72*, 2307-2321.

     Bolt B. A. and Abrahamson, N. A. (1983) Reply to W. B. Joyner & D.
     M. Boore's "Comments on: New attenuation relations for peak and
     expected accelerations for peak and expected accelerations of
     strong ground motion", _Bull. Seism. Soc. Am._, *73*, 1481-1483.

     Brillinger, D. R. and Preisler, H. K. (1984) An exploratory
     analysis of the Joyner-Boore attenuation data, _Bull. Seism. Soc.
     Am._, *74*, 1441-1449.

     Brillinger, D. R. and Preisler, H. K. (1984) _Further analysis of
     the Joyner-Boore attenuation data_.  Manuscript.

_E_x_a_m_p_l_e_s:

     require(graphics)
     ## check the data class of the variables
     sapply(attenu, data.class)
     summary(attenu)
     pairs(attenu, main = "attenu data")
     coplot(accel ~ dist | as.factor(event), data = attenu, show.given = FALSE)
     coplot(log(accel) ~ log(dist) | as.factor(event),
            data = attenu, panel = panel.smooth, show.given = FALSE)
     

Call: 
 lmvar(y = attenu$accel, X_mu = X, X_sigma = X_s)

Number of observations:  182 
Degrees of freedom    :  6 

Z-scores: 
    Min      1Q  Median      3Q     Max 
-1.4679 -0.6615 -0.1132  0.5736  2.9969 

Coefficients:
                 Estimate  Std. Error z value  Pr(>|z|)    
(Intercept)   -0.14518878  0.06367414 -2.2802    0.0226 *  
mag            0.05436925  0.01137133  4.7813 1.742e-06 ***
dist          -0.00129047  0.00014701 -8.7779 < 2.2e-16 ***
(Intercept_s) -4.33049617  0.44747494 -9.6776 < 2.2e-16 ***
mag_s          0.28918112  0.07286834  3.9685 7.231e-05 ***
dist_inv_s     3.14861255  0.23985317 13.1273 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Standard deviations: 
    Min      1Q  Median      3Q     Max 
 0.0634  0.0775  0.0920  0.1095 46.8245 

Comparison to model with constant variance (i.e. classical linear model)
Log likelihood-ratio: 31.38197 
Additional degrees of freedom: 2 
p-value for difference in deviance: 2.35e-14 
Maximum Likelihood estimation
Newton-Raphson maximisation, 10 iterations
Return code 1: gradient close to zero
Log-Likelihood: 154.7313 (3 free parameter(s))
Estimate(s): -4.330496 0.2891811 3.148613 

lmvar documentation built on May 16, 2019, 5:06 p.m.