rsm: Fit a Regression-Scale Model

Description Usage Arguments Details Value Note References See Also Examples

View source: R/marg.R

Description

Produces an object of class rsm which is a regression-scale model fit of the data.

Usage

1
2
3
4
5
6
rsm(formula = formula(data), family = gaussian, 
    data = sys.frame(sys.parent()), dispersion = NULL, 
    weights = NULL, subset = NULL, na.action = na.fail, 
    offset = NULL, method = "rsm.surv", 
    control = glm.control(maxit=100, trace=FALSE), 
    model = FALSE, x = FALSE, y = TRUE, contrasts = NULL, ...)

Arguments

formula

a formula expression as for other linear regression models, of the form response ~ predictors where the predictors are separated by suitable operators. See the documentation of lm and formula for details.

family

a family.rsm object, i.e. a list of functions and expressions characterizing the error distribution. Families supported are gaussian, student (Student's t), extreme (Gumbel or extreme value), logistic, logWeibull, logExponential, logRayleigh and Huber (Huber's least favourable). These represent calls to the corresponding generator functions. The calls to gaussian, extreme, logistic, logWeibull, logExponential and logRayleigh can be given without parentheses. The functions student and Huber may take as argument respectively the degrees of freedom (df) and the tuning constant (k). Users can construct their own families, as long as they have components compatible with those given in rsm.distributions. The demonstration file ‘margdemo.R’ that ships with the package shows how to create a new generator function. The default is gaussian.

data

an optional data frame in which to interpret the variables occurring in the model formula, or in the subset and the weights arguments. If this is missing, then the variables in the formula should be on the search list.

dispersion

if NULL, the scale parameter is taken to be unknown. If known, the numerical value can be passed. The default is NULL. Huber's least favourable distribution represents a special case. If dispersion is NULL, the maximum likelihood estimate is computed, while if TRUE the MAD estimate is calculated and the scale parameter fixed to this value in subsequent computations.

weights

the optional weights for the fitting criterion. If supplied, the response variable and the covariates are multiplied by the weights in the IRLS algorithm. The length of the weights argument must be the same as the number of observations. The weights must be nonnegative and it is strongly recommended that they be strictly positive, since zero weights are ambiguous, compared to use of the subset argument.

subset

expression saying which subset of the rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), or a numeric vector indicating which observation numbers are to be included, or a character vector of the row names to be included. All observations are included by default.

na.action

a function to filter missing data. This is applied to the model frame after any subset argument has been used. The default (with na.fail) is to create an error if any missing value is found. A possible alternative is na.omit, which deletes observations that contain one or more missing values.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. An offset term can be included in the formula instead or as well, and if both are specified their sum is used. Defaults to NULL

method

the fitting method to be used; the default is rsm.fit. The method model.frame simply returns the model frame.

control

a list of iteration and algorithmic constants. See glm.control for their names and default values.

model

if TRUE, the model frame is returned; default is FALSE.

x

if TRUE, the model matrix is returned; default is FALSE.

y

if TRUE, the response variable is returned; default is TRUE.

contrasts

a list of contrasts to be used for some or all of the factors appearing as variables in the model formula. The names of the list should be the names of the corresponding variables, and the elements should either be contrast-type matrices (matrices with as many rows as levels of the factor and with columns linearly independent of each other and of a column of one's), or else they should be functions that compute such contrast matrices.

...

absorbs any additional argument.

Details

The model is fitted using Iteratively Reweighted Least Squares, IRLS for short (Green, 1984, Jorgensen, 1984). The working response and iterative weights are computed using the functions contained in the family.rsm object.

The two workhorses of rsm are rsm.fit and rsm.surv, which expect an X and Y argument rather then a formula. The first function is used for the families student with df less than 3 and Huber; the second one, based on the survreg.fit routine for fitting parametric survival models, is used in case of extreme, logistic, logWeibull, logExponential, logRayleigh and student (with df > 2) error distributions. In the presence of a user-defined error distribution the rsm.fit routine is used. The rsm.null function is invoked to fit an empty (null) model.

The details are given in Brazzale (2000, Section 6.3.1).

Value

an object of class rsm is returned which inherits from glm and lm. See rsm.object for details.

The output can be examined by print, summary, rsm.diag.plots and anova. Components can be extracted using fitted, residuals, formula and family. It can be modified using update. It has most of the components of a glm object, with a few more. Use rsm.object for further details.

Note

In case of extreme, logistic, logWeibull, logExponential, logRayleigh and student (with df > 2) error distributions, both methods, rsm.fit (default choice) and rsm.surv, can be used to fit the model. There are, however, examples where one of the two algorithms (most likely the one invoked by rsm.surv) breaks down. If this is the case, try and refit the model with the alternative choice.

The message "negative iterative weights returned!" is returned if some of the iterative weights (q2 component of the fitted rsm object) are negative. These would be used by default by the rsm.diag routine for the definition of residuals and regression diagnostics. In order to avoid missing values (NAs), the default weighting scheme "observed" automatically switches to "score" unless otherwise specified.

References

Brazzale, A. R. (2000) Practical Small-Sample Parametric Inference. Ph.D. Thesis N. 2230, Department of Mathematics, Swiss Federal Institute of Technology Lausanne.

Green, P. J. (1984) Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives (with Discussion). J. R. Statist. Soc. B, 46, 149–192.

Jorgensen, B. (1984) The delta algorithm and GLIM. Int. Stat. Rev., 52, 283–300.

See Also

rsm.object, rsm.fit, rsm.surv, rsm.null, rsm.families

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
## House Price Data
data(houses)
houses.rsm <- rsm(price ~ ., family = student(5), data = houses)
## model fit including all covariates
houses.rsm <- rsm(price ~ ., family = student(5), data = houses, 
                  method = "rsm.fit", control = glm.control(trace = TRUE))
## prints information about the iterative procedure at each iteration
update(houses.rsm, ~ . - bdroom + offset(7 * bdroom))
## "bdroom" is included as offset variable with fixed (= 7) coefficient

## Sea Level Data
data(venice)
attach(venice)
Year <- 1:51/51
venice.2.rsm <- rsm(sea ~ Year + I(Year^2), family = extreme)
## quadratic model fitted to sea level data
venice.1.rsm <- update(venice.2.rsm, ~. - I(Year^2))
## linear model fit
##
c11 <- cos(2*pi*1:51/11) ; s11 <- sin(2*pi*1:51/11)
c19 <- cos(2*pi*1:51/18.62) ; s19 <- sin(2*pi*1:51/18.62)
venice.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11 + c19 + s19, 
                  family = extreme)
## includes 18.62-year astronomical tidal cycle and 11-year sunspot cycle
venice.11.rsm <- rsm(sea ~ Year + I(Year^2) + c11 + s11, family = extreme)
venice.19.rsm <- rsm(sea ~ Year + I(Year^2) + c19 + s19, family = extreme)
## includes either astronomical cycle
##
## comparison of linear, quadratic and periodic (11-year, 19-year) models 
plot(year, sea, ylab = "sea level") 
lines(year, fitted(venice.1.rsm))
lines(year, fitted(venice.2.rsm), col="red")
lines(year, fitted(venice.11.rsm), col="blue")
lines(year, fitted(venice.19.rsm), col="green")
##
detach()

## Darwin's Data on Growth Rates of Plants
data(darwin)
darwin.rsm <- rsm(cross - self ~ pot - 1, family  =  student(3), 
                  data = darwin)
## Maximum likelihood estimates
darwin.rsm <- rsm(cross - self ~ pot - 1, family = Huber, data = darwin)
## M-estimates

marg documentation built on April 16, 2018, 5:06 p.m.