grouped: Regression for Grouped Data - Coarse Data

Description

grouped is used to fit regression models for grouped or coarse data under the assumption that the data are Coarsened At Random.

Usage

grouped(formula, link = c("identity", "log", "logit"), 
            distribution = c("normal", "t", "logistic"), data,
            subset, na.action, str.values, df = NULL, iter = 3, ...)

Arguments

formula

a two-sided formula describing the model structure. On the left-hand side, a two-column response matrix must be supplied, specifying the lower and upper limits (1st and 2nd column, respectively) of the interval in which the true response lies. The limits can be defined arbitrarily, or you can use the functions equispaced and rounding.

link

the link function under which the underlying response variable follows the distribution given by the distribution argument. Available choices are "identity", "log" and "logit". See Details for more info.

distribution

the assumed distribution for the true latent response variable. Available choices are "normal", "t" and "logistic". See Details for more info.

data

an optional data.frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which grouped is called.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs.

str.values

a numeric vector of starting values.

df

a scalar numeric value denoting the degrees of freedom when the underlying distribution for the response variable is assumed to be Student's-t.

iter

the number of extra times to call optim in case the first optimization has not converged.

...

additional arguments; currently none is used.
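
As an illustration of the two-column response that the formula argument expects, here is a minimal base R sketch. The data are hypothetical (values assumed to have been rounded to the nearest integer), and the names obs, lo, up and resp are chosen only for this example.

```r
# Hypothetical recorded values, each rounded to the nearest integer
# (illustrative data, not taken from the package).
obs <- c(3, 5, 4, 7, 5)

# A value recorded as k is only known to lie within half a unit of k.
lo <- obs - 0.5   # lower limits (1st column)
up <- obs + 0.5   # upper limits (2nd column)

# Two-column response matrix: one row per observation.
resp <- cbind(lo, up)
resp

# Every lower limit must lie below its upper limit.
stopifnot(all(resp[, "lo"] < resp[, "up"]))
```

A matrix built this way goes on the left-hand side of the formula, in the same way as cbind(lo, up) and cbind(a, b) in the Examples.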

Details

Let Z_i, i = 1, ..., n be a random sample from a response variable of interest. In many problems one can think of the sample space S_i of Z_i as being partitioned into a number of groups; one then observes not the exact value of Z_i but the group into which it falls. Data generated in this way are called grouped (Heitjan, 1989). The function grouped and this package are devoted to the analysis of such data when the data are Coarsened At Random (Heitjan and Rubin, 1991).

The framework assumes a latent variable Z_i that is coarsely measured, so that we know only Y_{li} and Y_{ui}, the limits of the interval in which Z_i lies. Given covariates X_i, Z_i|X_i may follow a Normal, a Logistic or a (generalized) Student's-t distribution. In addition, three link functions are available for greater flexibility. The likelihood contribution of observation i has the form

L_i(β, σ) = F[(y_{ui}^* - x_i β)/σ] - F[(y_{li}^* - x_i β)/σ],

where F(.) denotes the cdf of the assumed distribution given by the argument distribution, y_{li}^* = φ(y_{li}) with φ(.) the link function given by the argument link, and y_{ui}^* = φ(y_{ui}) is defined analogously.
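
The likelihood above can be written out directly in base R. The following is a minimal sketch, not the package's implementation: it assumes the normal distribution and the identity link, simulates illustrative data, and optimises log(σ) rather than σ to keep the scale positive (a choice made only for this sketch). All object names are hypothetical.

```r
set.seed(1)

# Simulate a latent response and coarsen it to unit-wide intervals
# (all data and names here are illustrative).
n  <- 200
x  <- rnorm(n)
z  <- 1 + 0.5 * x + rnorm(n, sd = 0.8)  # latent Z_i, never used in the fit
yl <- floor(z)                           # lower limit of the observed interval
yu <- yl + 1                             # upper limit of the observed interval
X  <- cbind(1, x)                        # design matrix with an intercept

# Negative log-likelihood: minus the sum over i of
# log{ F[(yu - x b)/s] - F[(yl - x b)/s] }, with F the normal cdf
# and the identity link. par[3] holds log(sigma) (sketch-only choice).
negloglik <- function(par) {
  beta  <- par[1:2]
  sigma <- exp(par[3])
  eta   <- drop(X %*% beta)
  p     <- pnorm((yu - eta) / sigma) - pnorm((yl - eta) / sigma)
  -sum(log(pmax(p, .Machine$double.xmin)))  # guard against log(0)
}

fit <- optim(c(0, 0, 0), negloglik, method = "BFGS")
est <- c(intercept = fit$par[1], slope = fit$par[2], sigma = exp(fit$par[3]))
est
```

With intervals this narrow, the estimates should land near the simulating values (1, 0.5 and 0.8). Replacing pnorm with plogis would give the logistic analogue of the same sketch.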

An interesting example of coarse data is provided by quality of life indexes. The observed value of such an index can be thought of as a rounded version of the true latent quality of life that the index attempts to capture. Applications of this approach can be found in Lesaffre et al. (2007) and Tsonaka et al. (2006). Various other examples of grouped and coarse data can be found in Heitjan (1989; 1993).
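
To make the link and distribution choices concrete for a bounded index of this kind, the sketch below computes interval probabilities for a few hypothetical integer scores on a 0-100 scale. It is written in base R only and makes several assumptions: the scores, the rescaling to (0, 1), the values of mu and sigma, and df = 4 for the t distribution are all illustrative, and whether the package rescales the logistic or t distributions internally is not stated on this page.

```r
# Hypothetical integer scores on a 0-100 scale, rescaled to (0, 1).
score <- c(35, 60, 82)
lo <- (score - 0.5) / 100
up <- (score + 0.5) / 100

# The three links, written out in base R.
links <- list(identity = identity, log = log, logit = qlogis)

# The three cdfs; df = 4 for the t is an arbitrary illustrative value.
cdfs <- list(normal   = pnorm,
             logistic = plogis,
             t        = function(q) pt(q, df = 4))

# Interval probability cdf[(up* - mu)/sigma] - cdf[(lo* - mu)/sigma]
# for one link/distribution pair, at illustrative mu and sigma.
interval_prob <- function(link, distribution, mu = 0, sigma = 1) {
  phi <- links[[link]]
  cdf <- cdfs[[distribution]]
  cdf((phi(up) - mu) / sigma) - cdf((phi(lo) - mu) / sigma)
}

# One column per distribution, one row per score, under the logit link.
probs <- sapply(names(cdfs), function(d) interval_prob("logit", d))
probs
```

Each entry is the probability mass that the chosen distribution assigns to one observed interval after the limits have been transformed by the link.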

Value

an object of class grouped is a list with the following components:

coefficients

the estimated coefficients, including the standard deviation σ.

hessian

the approximate Hessian matrix at convergence returned by optim.

fitted

the fitted values.

details

a list with components: (i) X the design matrix, (ii) y the response data matrix, (iii) convergence the convergence identifier returned by optim, (iv) logLik the value of the log-likelihood at convergence, (v) k the number of outer iterations used, (vi) n the sample size, (vii) df the degrees of freedom; NULL except for the t distribution, (viii) link the link function used, (ix) distribution the distribution assumed for the true latent response variable and (x) max.sc the maximum absolute value of the score vector at convergence.

call

the matched call.
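
The convergence and k components above, together with the iter argument, describe a restart scheme around optim. A minimal sketch of such a scheme follows; the function name optim_with_restarts, the BFGS method, and restarting from the latest estimate are all assumptions made for illustration, not details confirmed by this page.

```r
# Call optim once, then up to `iter` more times while it reports
# non-convergence, restarting from the latest estimate (an assumption).
optim_with_restarts <- function(fn, start, iter = 3, ...) {
  fit <- optim(start, fn, method = "BFGS", ...)
  k <- 0  # number of extra optim calls made so far
  while (fit$convergence != 0 && k < iter) {
    fit <- optim(fit$par, fn, method = "BFGS", ...)
    k <- k + 1
  }
  list(par = fit$par, convergence = fit$convergence, k = k)
}

# Toy objective with its minimum at (1, 2).
toy <- function(p) sum((p - c(1, 2))^2)

# Default settings: one call is normally enough for this objective.
res <- optim_with_restarts(toy, c(10, -10))

# A very small iteration budget per call makes restarts more likely.
res_short <- optim_with_restarts(toy, c(10, -10), iter = 3,
                                 control = list(maxit = 2))
res
res_short
```

A convergence value of 0 is optim's code for successful completion, so the loop stops as soon as a call reports it or the restart budget is exhausted.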

Author(s)

Dimitris Rizopoulos d.rizopoulos@erasmusmc.nl

References

Heitjan, D. (1989) Inference from grouped continuous data: A review (with discussion). Statistical Science, 4, 164–183.

Heitjan, D. (1993) Ignorability and coarse data: some biomedical examples. Biometrics, 49, 1099–1109.

Heitjan, D. and Rubin, D. (1991) Ignorability and coarse data. Annals of Statistics, 19, 2244–2253.

Lesaffre, E., Rizopoulos, D. and Tsonaka, S. (2007) The logistic-transform for bounded outcome scores. Biostatistics, 8, 72–85.

Tsonaka, S., Rizopoulos, D. and Lesaffre, E. (2006) Power and sample size calculations for discrete bounded outcomes. Statistics in Medicine, 25, 4241–4252.

See Also

anova.grouped, plot.grouped, residuals.grouped, summary.grouped, power.grouped

Examples

library(grouped)

grouped(cbind(lo, up) ~ treat * x, link = "logit", data = Sdata)
    
grouped(equispaced(r, n) ~ x1 * x2, link = "logit", data = Seeds)

# See Figure 1 and Table 1 in Heitjan (1989)
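y <- iris[iris$Species == "setosa", "Petal.Width"]
# each row of index holds the lower and upper limit of one interval
index <- cbind(seq(0.05, 0.55, 0.1), seq(0.15, 0.65, 0.1))
n <- length(y)
a <- b <- numeric(n)
for(i in seq_len(n)){
    # first interval whose upper limit exceeds y[i]
    ind <- which(index[, 2] - y[i] > 0)[1]
    a[i] <- index[ind, 1]
    b[i] <- index[ind, 2]
}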
summary(grouped(cbind(a, b) ~ 1))

# See Figure 1 and Table 1 in Heitjan (1989)
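y <- iris[iris$Species == "setosa", "Petal.Length"]
# each row of index holds the lower and upper limit of one interval
index <- cbind(seq(0.95, 1.75, 0.2), seq(1.15, 1.95, 0.2))
n <- length(y)
a <- b <- numeric(n)
for(i in seq_len(n)){
    # first interval whose upper limit exceeds y[i]
    ind <- which(index[, 2] - y[i] > 0)[1]
    a[i] <- index[ind, 1]
    b[i] <- index[ind, 2]
}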
summary(grouped(cbind(a, b) ~ 1))

grouped documentation built on May 2, 2019, 2:42 a.m.
