asterdata: Object Describing Saturated Aster Model
In aster2: Aster Models

asterdata

R Documentation

Description

Functions to construct objects of class "asterdata" and to test whether an object conforms to the contract for that class. All other functions in this package take model descriptions of this form.

Usage

asterdata(data, vars, pred, group, code, families, delta,
  response.name = "resp", varb.name = "varb",
  tolerance = 8 * .Machine$double.eps)
validasterdata(object, tolerance = 8 * .Machine$double.eps)
is.validasterdata(object, tolerance = 8 * .Machine$double.eps)

Arguments

`data`	a data frame containing response and predictor variables for the aster model.
`vars`	a character vector containing names of variables in the data frame `data` that are components of the response vector of the aster model.
`pred`	an integer vector satisfying `length(pred) == length(vars)` specifying the arrows of the subgraph of the aster model corresponding to a single individual. Must be nonnegative and satisfy `all(pred < seq(along = pred))`. A zero value of `pred[j]` indicates the predecessor of node `j` is an initial node (formerly called root node) of the subgraph. A nonzero value of `pred[j]` indicates the predecessor of node `j` is node `pred[j]`. In either case there is an arrow in the subgraph from predecessor node to successor node.
`group`	an integer vector satisfying `length(group) == length(vars)` specifying the lines of the subgraph of the aster model corresponding to a single individual, which in turn specify the dependence groups. Must be nonnegative and satisfy `all(group < seq(along = group))`. Nonzero elements of `group` indicate nodes of the subgraph that are connected by a line and hence are in the same dependence group: nodes `j` and `group[j]` are connected by a line. Since nodes in the same dependence group must have the same predecessor, this requires `pred[group[j]] == pred[j]`. Since nodes in the same dependence group must be in the same family, this requires `code[group[j]] == code[j]`. It also requires that the dimension of the family specified by `code[j]` be the same as the number of nodes in the dependence group. Zero elements of `group` indicate nothing about dependence groups. The lines indicate a transitive relation. If there is a line from node `j1` to node `j2` and a line from node `j2` to node `j3` then there is also a line from node `j1` to node `j3`, but this line need not be specified by the `group` vector, and indeed cannot. If there is a dependence group with `d` nodes, then there are `choose(d, 2)` lines connecting these nodes, but the `group` vector can only specify `d - 1` lines which imply the rest. For example, if nodes `j1`, `j2`, `j3`, and `j4` are to make up a four-dimensional dependence group and `j1 < j2`, `j2 < j3`, and `j3 < j4`, we must have `group[j1] == 0`, `group[j2] == j1`, `group[j3] == j2`, and `group[j4] == j3`. This is forced by the requirement `all(group < seq(along = group))`.
`code`	an integer vector satisfying `length(code) == length(vars)` specifying the families corresponding to the dependence groups. This requires `all(code %in% seq(along = families))`. Node `j` is in a dependence group with family described by `families[code[j]]`. Note that `group[j] == k` requires `code[j] == code[k]` when `k != 0`.
`families`	a list of family specifications (see `families`). Specifications of families not having hyperparameters may be abbreviated as character strings, for example, `"binomial"` rather than `fam.binomial()`.
`delta`	a numeric vector satisfying `length(delta) == length(vars)` specifying the degeneracies of the aster model for a single individual. The model specified is the limit as `s \to \infty` of nondegenerate models having conditional canonical parameter vector `\theta + s \delta` (note that the conditional canonical parameter vector is always used here, regardless of whether conditional or unconditional canonical affine submodels are to be used). May be missing (and usually is) in which case `\delta = 0` is implied, meaning the limit is trivial (same as not taking a limit).
`response.name`	a character string giving the name of the response vector.
`varb.name`	a character string giving the name of the factor covariate that says which of the variables in the data frame `data` correspond to which components of the response vector.
`tolerance`	numeric >= 0. Relative errors smaller than `tolerance` are not considered in checking validity of normal location-scale data.
`object`	an object of class `"asterdata"`. The function `validasterdata` always returns `TRUE` or throws an error with an informative message. The function `is.validasterdata` never throws an error unless `object` has the wrong class, returning `TRUE` or `FALSE` according to whether `object` does or does not conform to the contract for class `"asterdata"`.
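
The index conditions on `pred`, `group`, and `code` stated above can be sketched as plain R checks. The helper `check_subgraph` below is hypothetical and not part of aster2. It covers only the length, ordering, and consistency conditions listed in the argument descriptions (not the family-dimension requirement), so it is an illustration rather than a substitute for `validasterdata`.

```r
# Hypothetical helper (not an aster2 function): checks the conditions on
# pred, group, and code that the argument descriptions state explicitly.
check_subgraph <- function(vars, pred, group, code, families) {
    nnode <- length(vars)
    idx <- seq_along(vars)
    stopifnot(
        length(pred) == nnode,
        length(group) == nnode,
        length(code) == nnode,
        all(pred >= 0), all(pred < idx),      # predecessors come earlier
        all(group >= 0), all(group < idx),    # lines point to earlier nodes
        all(code %in% seq_along(families))    # codes index the families list
    )
    # nodes joined by a line must share predecessor and family code
    linked <- which(group != 0)
    stopifnot(
        all(pred[group[linked]] == pred[linked]),
        all(code[group[linked]] == code[linked])
    )
    TRUE
}

vars <- c("m1", "n1", "n2")
families <- list("bernoulli", "normal.location.scale")

# the specification used in the Examples section passes
ok_good <- tryCatch(
    check_subgraph(vars, pred = c(0, 1, 1), group = c(0, 0, 2),
        code = c(1, 2, 2), families = families),
    error = function(e) FALSE)

# nodes 2 and 3 are linked but have different predecessors, so this fails
ok_bad <- tryCatch(
    check_subgraph(vars, pred = c(0, 1, 2), group = c(0, 0, 2),
        code = c(1, 2, 2), families = families),
    error = function(e) FALSE)
```

Wrapping the check in `tryCatch` mirrors the described split between a function that throws an error and one that returns `TRUE` or `FALSE`.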

Details

Response variables in dependence groups are taken to be in the order they appear in the response vector. The first to appear in the response vector is the first canonical statistic for the dependence group distribution, the second to appear the second canonical statistic, and so forth. The number of response variables in the dependence group must match the dimension of the dependence group distribution.
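
As an illustration of how the `group` vector determines dependence groups and their internal order, the following minimal sketch follows each line back to the first node of its group. The function name `dependence_groups` is hypothetical (it is not provided by aster2), and the sketch assumes `group` already satisfies `all(group < seq(along = group))`.

```r
# Hypothetical helper (not an aster2 function): recover dependence groups
# from a group vector. Each node is labelled by the first node of its group.
dependence_groups <- function(group) {
    root <- seq_along(group)
    for (j in seq_along(group)) {
        # group[j] < j, so root[group[j]] has already been resolved
        if (group[j] != 0) root[j] <- root[group[j]]
    }
    # members of each group are listed in the order they appear
    unname(split(seq_along(group), root))
}

# specification from the Examples section: node 1 alone, nodes 2 and 3 together
g_example <- dependence_groups(c(0, 0, 2))

# four-dimensional group written as a chain of d - 1 = 3 lines
chain <- c(0, 1, 2, 3)
g_chain <- dependence_groups(chain)
lines_specified <- sum(chain != 0)
lines_implied <- choose(length(g_chain[[1]]), 2)
```

Within each recovered group, the index order is the order in which the response variables appear, which is the order of the canonical statistics described above.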

This function only handles the usual case where the subgraph for every individual is isomorphic to the subgraph for every other individual and all initial nodes (formerly called root nodes) correspond to the constant one. Each row of data is the data for one individual. The vectors vars, pred, group, code, and delta (if not missing) describe the subgraph for one individual (which is the same for all individuals).

In other cases for which this function does not have the flexibility to construct the appropriate object of class "asterdata", such an object will have to be constructed “by hand” using R statements not involving this function or modifying an object produced by this function. See the following section for description of such objects. The functions validasterdata and is.validasterdata can be used to check whether objects constructed “by hand” have been constructed correctly.

Value

an object of class "asterdata" is a list containing the following components

`redata`	a data frame having `nrow(data) * length(vars)` rows and containing variables having names in `setdiff(names(data), vars)` and also the names `"id"`, `response.name`, and `varb.name`. Produced from `data` using the `reshape` function. Each variable in `setdiff(names(data), vars)` is repeated `length(vars)` times. The variable named `response.name` is the concatenation of the variables in `data` with names in `vars`. The variable named `varb.name` is a factor having levels `vars` that says which of the variables in the data frame `data` correspond to which components of the response vector. The variable named `"id"` is an integer vector that says which of the individuals (which rows of `data`) correspond to which rows of `redata`. Not all objects of class `"asterdata"` need have an `id` variable, although all those constructed by this function do.
`repred`	an integer vector satisfying `length(repred) == nrow(redata)` specifying the arrows of the graph of the aster model for all individuals. Must be nonnegative and satisfy `all(repred < seq(along = repred))`. A zero value of `repred[j]` indicates the predecessor of node `j` is an initial node (formerly called root node) of the graph. A nonzero value of `repred[j]` indicates the predecessor of node `j` is node `repred[j]`. In either case there is an arrow in the graph from predecessor node to successor node. Note that `repred` is determined by `pred` but is quite different from it. Firstly, the lengths differ. Secondly, `repred` is not just a repetition of `pred`. The numbers in `pred`, if nonzero, are indices for the vector `vars` whereas the numbers in `repred`, if nonzero, are row indices for the data frame `redata`.
`initial`	a numeric vector specifying constants associated with initial nodes (formerly called root nodes) of the graphical model for all individuals. If `repred[j] == 0` then the predecessor of node `j` is an initial node associated with the constant `initial[j]`, which must be a positive integer unless the family associated with the arrow from this initial node to node `j` is infinitely divisible (the only such family currently implemented being Poisson), in which case `initial[j]` must be a strictly positive and finite real number. If `repred[j] != 0`, then `initial[j]` is ignored and may be any numeric value, including `NA` or `NaN`. This function always makes `initial` equal to `rep(1, nrow(redata))` but the more general description above is valid for objects of class `"asterdata"` constructed “by hand”.
`regroup`	an integer vector satisfying `length(regroup) == nrow(redata)` specifying the lines of the graph of the aster model for all individuals, which in turn specify the dependence groups. Must be nonnegative and satisfy `all(regroup < seq(along = regroup))`. Nonzero elements of `regroup` indicate nodes of the graph that are connected by a line and hence are in the same dependence group: nodes `j` and `regroup[j]` are connected by a line. Since nodes in the same dependence group must have the same predecessor, this requires `repred[regroup[j]] == repred[j]`. Since nodes in the same dependence group must be in the same family, this requires `recode[regroup[j]] == recode[j]`. It also requires that the dimension of the family specified by `recode[j]` be the same as the number of nodes in the dependence group. Zero elements of `regroup` indicate nothing about dependence groups. The lines indicate a transitive relation. If there is a line from node `j1` to node `j2` and a line from node `j2` to node `j3` then there is also a line from node `j1` to node `j3`, but this line need not be specified by the `regroup` vector, and indeed cannot. If there is a dependence group with `d` nodes, then there are `choose(d, 2)` lines connecting these nodes, but the `regroup` vector can only specify `d - 1` lines which imply the rest. For example, if nodes `j1`, `j2`, `j3`, and `j4` are to make up a four-dimensional dependence group and `j1 < j2`, `j2 < j3`, and `j3 < j4`, we must have `regroup[j1] == 0`, `regroup[j2] == j1`, `regroup[j3] == j2`, and `regroup[j4] == j3`. This is forced by the requirement `all(regroup < seq(along = regroup))`. Note that `regroup` is determined by `group` but is quite different from it. Firstly, the lengths differ. Secondly, `regroup` is not just a repetition of `group`. The numbers in `group`, if nonzero, are indices for the vector `vars` whereas the numbers in `regroup`, if nonzero, are row indices for the data frame `redata`.
`recode`	an integer vector satisfying `length(recode) == nrow(redata)` specifying the families corresponding to the dependence groups. This requires `all(recode %in% seq(along = families))`. Node `j` is in a dependence group with family described by `families[recode[j]]`. Note that `regroup[j] == k` requires `recode[j] == recode[k]` when `regroup[j] != 0`. Also note that `recode` is determined by `code` but is different from it. Firstly, the lengths differ. Secondly, `recode` need not be just a repetition of `code`. This function always makes `recode` equal to `rep(code, each = nrow(data))` but the more general description above is valid for objects of class `"asterdata"` constructed “by hand”.
`families`	a copy of the argument of the same name of this function except that any character string abbreviations are converted to objects of class `"astfam"`.
`redelta`	a numeric vector satisfying `length(redelta) == nrow(redata)` specifying the degeneracies of the aster model for all individuals. If not the zero vector, the degenerate model specified is the limit as `s \to \infty` of nondegenerate models having conditional canonical parameter vector `\theta + s \delta` (note that the conditional canonical parameter vector is always used here, regardless of whether conditional or unconditional canonical affine submodels are to be used). Note that `redelta` is determined by `delta` but is different from it. Firstly, the lengths differ. Secondly, `redelta` need not be just a repetition of `delta`. This function always makes `redelta` equal to `rep(delta, each = nrow(data))` but the more general description above is valid for objects of class `"asterdata"` constructed “by hand”.
`response.name`	a character string giving the name of the response variable in `redata`. For this function, a copy of the argument `response.name`.
`varb.name`	a character string giving the name of the “varb” variable in `redata`. For this function, a copy of the argument `varb.name`.

In addition an object of class "asterdata" may contain (and those constructed by this function do contain) components pred, group, and code, which are copies of the arguments of the same names of this function. Objects of class "asterdata" not constructed by this function need not contain these additional components, since they may make no sense if the graph for all individuals is not the repetition of isomorphic subgraphs, one for each individual.
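
The relation between the per-individual vectors (`pred`, `group`, `code`) and the whole-graph vectors (`repred`, `regroup`, `recode`) can be sketched in base R. This is an illustration, not the package source. It assumes that `redata` stacks the rows in blocks of `nrow(data)`, one block per element of `vars`, which is the layout base `reshape` produces with `direction = "long"`. The toy data frame and the helper `expand_index` are hypothetical.

```r
# Toy wide data (hypothetical values): one row per individual.
data <- data.frame(x = c(10, 20), m1 = c(1, 0), n1 = c(2.5, 0), n2 = c(6.25, 0))
vars <- c("m1", "n1", "n2")
pred <- c(0, 1, 1)
group <- c(0, 0, 2)
code <- c(1, 2, 2)

# Long format: the response columns are concatenated into "resp", and "varb"
# records which original variable each row came from.
redata <- reshape(data, varying = list(vars), direction = "long",
    timevar = "varb", times = vars, v.names = "resp")
redata$varb <- factor(redata$varb, levels = vars)

nind <- nrow(data)
nnode <- length(vars)
individual <- rep(seq_len(nind), times = nnode)  # individual for each row

# Hypothetical helper: turn indices into vars (per individual) into row
# indices of redata (all individuals). Zero stays zero.
expand_index <- function(idx) {
    idx_long <- rep(idx, each = nind)
    ifelse(idx_long == 0, 0, (idx_long - 1) * nind + individual)
}

repred <- expand_index(pred)
regroup <- expand_index(group)
recode <- rep(code, each = nind)
initial <- rep(1, nrow(redata))
```

Under this layout, row `(j - 1) * nrow(data) + i` of `redata` holds variable `vars[j]` for individual `i`, which is why `repred` and `regroup` are not simple repetitions of `pred` and `group`.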

Examples

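library(aster2)
data(test1)
# Subgraph for one individual: m1 follows an initial node; n1 and n2 follow m1.
# n1 and n2 are joined by a line (group[3] == 2) and share family 2.
fred <- asterdata(test1, vars = c("m1", "n1", "n2"), pred = c(0, 1, 1),
    group = c(0, 0, 2), code = c(1, 2, 2),
    families = list("bernoulli", "normal.location.scale"))
# check that fred conforms to the contract for class "asterdata"
is.validasterdata(fred)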

aster2 documentation built on Sept. 18, 2024, 1:06 a.m.