View source: R/extend_family.R
extend_family | R Documentation |
This function adds some internally required elements to an object of class family (see, e.g., family()). It is called internally by init_refmodel(), so you will rarely need to call it yourself.
extend_family(
family,
latent = FALSE,
latent_y_unqs = NULL,
latent_ilink = NULL,
latent_ll_oscale = NULL,
latent_ppd_oscale = NULL,
augdat_y_unqs = NULL,
augdat_link = NULL,
augdat_ilink = NULL,
augdat_args_link = list(),
augdat_args_ilink = list(),
...
)
family |
An object of class |
latent |
A single logical value indicating whether to use the latent
projection ( |
latent_y_unqs |
Only relevant for a latent projection where the original
response space has finite support (i.e., the original response values may
be regarded as categories), in which case this needs to be the character
vector of unique response values (which will be assigned to |
latent_ilink |
Only relevant for the latent projection, in which case
this needs to be the inverse-link function. If the original response family
was the |
latent_ll_oscale |
Only relevant for the latent projection, in which
case this needs to be the function computing response-scale (not
latent-scale) log-likelihood values. If |
latent_ppd_oscale |
Only relevant for the latent projection, in which
case this needs to be the function sampling response values given latent
predictors that have been transformed to response scale using
|
augdat_y_unqs |
Only relevant for augmented-data projection, in which
case this needs to be the character vector of unique response values (which
will be assigned to |
augdat_link |
Only relevant for augmented-data projection, in which case
this needs to be the link function. Use |
augdat_ilink |
Only relevant for augmented-data projection, in which
case this needs to be the inverse-link function. Use |
augdat_args_link |
Only relevant for augmented-data projection, in which
case this may be a named |
augdat_args_ilink |
Only relevant for augmented-data projection, in
which case this may be a named |
... |
Ignored (exists only to swallow up further arguments which might be passed to this function). |
In the following, N, C_{\mathrm{cat}}, C_{\mathrm{lat}}, S_{\mathrm{ref}}, and S_{\mathrm{prj}} from help topic refmodel-init-get are used. Note that N does not necessarily denote the number of original observations; it can also refer to new observations. Furthermore, let S denote either S_{\mathrm{ref}} or S_{\mathrm{prj}}, whichever is appropriate in the context where it is used.
The family object extended in the way needed by projpred.
As their first input, the functions supplied to arguments augdat_link and augdat_ilink have to accept:
For augdat_link: an S \times N \times C_{\mathrm{cat}} array containing the probabilities for the response categories. The order of the response categories is the same as in family$cats (see argument augdat_y_unqs).
For augdat_ilink: an S \times N \times C_{\mathrm{lat}} array containing the linear predictors.
The return value of these functions needs to be:
For augdat_link: an S \times N \times C_{\mathrm{lat}} array containing the linear predictors.
For augdat_ilink: an S \times N \times C_{\mathrm{cat}} array containing the probabilities for the response categories. The order of the response categories has to be the same as in family$cats (see argument augdat_y_unqs).
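The array shapes described above can be illustrated with a minimal base-R sketch. It assumes a softmax-type (multinomial logit) link in which the first response category is the reference category, so that C_{\mathrm{lat}} = C_{\mathrm{cat}} - 1. The function names softmax_ilink and softmax_link are hypothetical and not part of projpred; they only demonstrate the required input and output dimensions.

```r
# Hypothetical inverse link (not part of projpred):
# S x N x C_lat array of linear predictors -> S x N x C_cat array of
# probabilities, assuming the first category is the reference category.
softmax_ilink <- function(eta_arr) {
  d <- dim(eta_arr)
  # Prepend a slice of zeros: the reference category's linear predictor is 0.
  eta_full <- array(c(rep(0, d[1] * d[2]), eta_arr),
                    dim = c(d[1], d[2], d[3] + 1))
  exp_eta <- exp(eta_full)
  # Normalizing constant for each (draw, observation) pair: S x N matrix.
  denom <- apply(exp_eta, c(1, 2), sum)
  sweep(exp_eta, MARGIN = c(1, 2), STATS = denom, FUN = "/")
}

# Hypothetical link (not part of projpred):
# S x N x C_cat array of probabilities -> S x N x C_lat array of
# log-odds relative to the first (reference) category.
softmax_link <- function(prb_arr) {
  # The S x N reference slice is recycled across the remaining categories.
  log(prb_arr[, , -1, drop = FALSE] / as.vector(prb_arr[, , 1]))
}

set.seed(1)
eta <- array(rnorm(4 * 5 * 2), dim = c(4, 5, 2)) # S = 4, N = 5, C_lat = 2
prb <- softmax_ilink(eta)                        # S = 4, N = 5, C_cat = 3
eta_back <- softmax_link(prb)                    # recovers eta
```

In this sketch, the probabilities along the third dimension of prb sum to 1 for every draw and observation, and applying softmax_link() to prb recovers eta up to floating-point error.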
For the augmented-data projection, the response vector resulting from extract_model_data (see init_refmodel()) is coerced to a factor (using as.factor()) at multiple places throughout this package. Inside of init_refmodel(), the levels of this factor have to be identical to family$cats (after applying extend_family() inside of init_refmodel()). Everywhere else, these levels have to be a subset of <refmodel>$family$cats (where <refmodel> is an object resulting from init_refmodel()). See argument augdat_y_unqs for how to control family$cats.
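The two level requirements can be checked by hand with base R. The following sketch uses a hypothetical character vector cats as a stand-in for family$cats and hypothetical response vectors y_orig and y_new; it does not call projpred.

```r
# Hypothetical stand-in for family$cats (order matters).
cats <- c("low", "mid", "high")

# Response as it might come out of extract_model_data for the original data.
# as.factor() leaves an existing factor (and its level order) unchanged.
y_orig <- factor(c("mid", "low", "high", "mid"), levels = cats)
levels_identical <- identical(levels(as.factor(y_orig)), cats)

# A character response would get alphabetically sorted levels instead, which
# differ from `cats` in their order.
y_chr <- c("mid", "low", "high", "mid")
levels_identical_chr <- identical(levels(as.factor(y_chr)), cats)

# For new data, the levels only need to form a subset of `cats`.
y_new <- c("low", "mid")
levels_subset <- all(levels(as.factor(y_new)) %in% cats)
```

Here, levels_identical and levels_subset are TRUE, whereas levels_identical_chr is FALSE because the alphabetical level order does not match cats.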
For ordinal brms families, be aware that the submodels (onto which the reference model is projected) currently have the following restrictions:
The discrimination parameter disc is not supported (i.e., it is a constant with value 1).
The thresholds are "flexible" (see brms::brmsfamily()).
The thresholds do not vary across the levels of a factor-like variable (see argument gr of brms::resp_thres()).
The "probit_approx" link is replaced by "probit".
For the brms::categorical() family, be aware that:
For multilevel submodels, the group-level effects are allowed to be correlated between different response categories.
For multilevel submodels, mclogit versions < 0.9.4 may throw the error 'a' (<number> x 1) must be square. Updating mclogit to a version >= 0.9.4 should fix this.
The function supplied to argument latent_ilink needs to have the prototype
latent_ilink(lpreds, cl_ref, wdraws_ref = rep(1, length(cl_ref)))
where:
lpreds accepts an S \times N matrix containing the linear predictors.
cl_ref accepts a numeric vector of length S_{\mathrm{ref}}, containing projpred's internal cluster indices for these draws.
wdraws_ref accepts a numeric vector of length S_{\mathrm{ref}}, containing weights for these draws. These weights should be treated as not being normalized (i.e., they don't necessarily sum to 1).
The return value of latent_ilink needs to contain the linear predictors transformed to the original response space, with the following structure:
If is.null(family$cats) (after taking latent_y_unqs into account): an S \times N matrix.
If !is.null(family$cats) (after taking latent_y_unqs into account): an S \times N \times C_{\mathrm{cat}} array. In that case, latent_ilink needs to return probabilities (for the response categories given in family$cats, after taking latent_y_unqs into account).
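As a minimal sketch of this prototype, consider the case where is.null(family$cats) holds and the original response family is assumed to use a log link (e.g., a Poisson family). The function name latent_ilink_poiss is hypothetical. Because no unprojected reference model draws are involved in this simple case, cl_ref and wdraws_ref are accepted but not used.

```r
# Hypothetical latent_ilink for an original response family with a log link.
# lpreds: S x N matrix of latent (linear) predictors.
# Returns an S x N matrix of response-scale means.
latent_ilink_poiss <- function(lpreds, cl_ref,
                               wdraws_ref = rep(1, length(cl_ref))) {
  # cl_ref and wdraws_ref are unused here because the inverse link does not
  # depend on any further reference model parameters.
  exp(lpreds)
}

lpreds_ex <- matrix(c(0, log(2), log(3), 0), nrow = 2) # S = 2, N = 2
ilpreds_ex <- latent_ilink_poiss(lpreds_ex, cl_ref = 1:2)
```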
The function supplied to argument latent_ll_oscale needs to have the prototype
latent_ll_oscale(ilpreds, y_oscale, wobs = rep(1, length(y_oscale)), cl_ref, wdraws_ref = rep(1, length(cl_ref)))
where:
ilpreds accepts the return value from latent_ilink.
y_oscale accepts a vector of length N containing response values on the original response scale.
wobs accepts a numeric vector of length N containing observation weights.
cl_ref accepts the same input as argument cl_ref of latent_ilink.
wdraws_ref accepts the same input as argument wdraws_ref of latent_ilink.
The return value of latent_ll_oscale needs to be an S \times N matrix containing the response-scale (not latent-scale) log-likelihood values for the N observations from its inputs.
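A minimal sketch of such a function, again assuming a Poisson response on the original scale, could look as follows. The name latent_ll_oscale_poiss is hypothetical, and multiplying the log-likelihood values by the observation weights is an assumption about how the weights should enter.

```r
# Hypothetical latent_ll_oscale for a Poisson response.
# ilpreds: S x N matrix of response-scale means (from latent_ilink).
# Returns an S x N matrix of (weighted) log-likelihood values.
latent_ll_oscale_poiss <- function(ilpreds, y_oscale,
                                   wobs = rep(1, length(y_oscale)),
                                   cl_ref,
                                   wdraws_ref = rep(1, length(cl_ref))) {
  # Repeat the N response values and weights across the S rows (draws).
  y_mat <- matrix(y_oscale, nrow = nrow(ilpreds), ncol = ncol(ilpreds),
                  byrow = TRUE)
  wobs_mat <- matrix(wobs, nrow = nrow(ilpreds), ncol = ncol(ilpreds),
                     byrow = TRUE)
  ll_unw <- matrix(dpois(y_mat, lambda = ilpreds, log = TRUE),
                   nrow = nrow(ilpreds), ncol = ncol(ilpreds))
  # Assumption: observation weights scale the log-likelihood contributions.
  wobs_mat * ll_unw
}

ilpreds_ll <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2) # S = 2, N = 3
y_ll <- c(0, 1, 2)
ll_ex <- latent_ll_oscale_poiss(ilpreds_ll, y_oscale = y_ll, cl_ref = 1:2)
```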
The function supplied to argument latent_ppd_oscale needs to have the prototype
latent_ppd_oscale(ilpreds_resamp, wobs, cl_ref, wdraws_ref = rep(1, length(cl_ref)), idxs_prjdraws)
where:
ilpreds_resamp accepts the return value from latent_ilink, but possibly with resampled (clustered) draws (see argument nresample_clusters of proj_predict()).
wobs accepts a numeric vector of length N containing observation weights.
cl_ref accepts the same input as argument cl_ref of latent_ilink.
wdraws_ref accepts the same input as argument wdraws_ref of latent_ilink.
idxs_prjdraws accepts a numeric vector of length dim(ilpreds_resamp)[1] containing the resampled indices of the projected draws (i.e., these indices are values from the set \{1, ..., \texttt{dim(ilpreds)[1]}\} where ilpreds denotes the return value of latent_ilink).
The return value of latent_ppd_oscale needs to be a \texttt{dim(ilpreds\_resamp)[1]} \times N matrix containing the response-scale (not latent-scale) draws from the posterior(-projection) predictive distributions for the N observations from its inputs.
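For the same assumed Poisson setting, a sampling function with this prototype can be sketched as shown below. The name latent_ppd_oscale_poiss is hypothetical. Since the Poisson distribution has no additional parameters, cl_ref, wdraws_ref, and idxs_prjdraws are accepted but not used, and the observation weights are ignored as well (an assumption of this sketch).

```r
# Hypothetical latent_ppd_oscale for a Poisson response.
# ilpreds_resamp: matrix of response-scale means with one row per
# (possibly resampled) projected draw and one column per observation.
# Returns a matrix of the same dimensions containing sampled responses.
latent_ppd_oscale_poiss <- function(ilpreds_resamp, wobs, cl_ref,
                                    wdraws_ref = rep(1, length(cl_ref)),
                                    idxs_prjdraws) {
  # rpois() works elementwise on the (column-major) flattened means.
  ppd <- rpois(length(ilpreds_resamp), lambda = ilpreds_resamp)
  matrix(ppd, nrow = nrow(ilpreds_resamp), ncol = ncol(ilpreds_resamp))
}

set.seed(1)
ilpreds_ppd <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2) # 2 draws, N = 3
ppd_ex <- latent_ppd_oscale_poiss(ilpreds_ppd, wobs = rep(1, 3),
                                  cl_ref = 1:2, idxs_prjdraws = 1:2)
```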
If the bodies of these three functions involve parameter draws from the reference model which have not been projected (e.g., for latent_ilink, the thresholds in an ordinal model), cl_agg() is provided as a helper function for aggregating these reference model draws in the same way as the draws have been aggregated for the first argument of these functions (e.g., lpreds in case of latent_ilink).
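To make the idea of such an aggregation concrete, the following base-R sketch computes weighted means of reference model draws within clusters. The function agg_by_cluster is hypothetical and is not the implementation of cl_agg(); it assumes that aggregation means a weighted average within each cluster and that draws with an NA cluster index are dropped. In actual use, cl_agg() should be preferred.

```r
# Hypothetical illustration of cluster-wise aggregation (not cl_agg() itself).
# draws: matrix with one row per reference model draw, one column per parameter.
# cl: cluster index for each draw (NA meaning that the draw is dropped).
# wdraws: unnormalized weights for the draws.
agg_by_cluster <- function(draws, cl, wdraws = rep(1, nrow(draws))) {
  cl_unq <- sort(unique(cl[!is.na(cl)]))
  do.call(rbind, lapply(cl_unq, function(k) {
    idx <- which(cl == k)
    # Normalize the weights within the cluster before averaging.
    w <- wdraws[idx] / sum(wdraws[idx])
    colSums(draws[idx, , drop = FALSE] * w)
  }))
}

draws_ex <- matrix(c(1, 3, 5, 7,
                     2, 4, 6, 8), ncol = 2) # 4 draws, 2 parameters
agg_ex <- agg_by_cluster(draws_ex, cl = c(1, 1, 2, 2),
                         wdraws = c(1, 3, 1, 1))
```

The result has one row per cluster and one column per parameter, matching the row structure of a clustered first argument such as lpreds.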
In fact, the weights passed to argument wdraws_ref are nonconstant only in case of cv_varsel() with cv_method = "LOO" and validate_search = TRUE. In that case, the weights passed to this argument are the PSIS-LOO CV weights for one observation. Note that although argument wdraws_ref has the suffix _ref, wdraws_ref does not necessarily obtain weights for the initial reference model's posterior draws: In case of cv_varsel() with cv_method = "kfold", these weights may refer to one of the K reference model refits (but in that case, they are constant anyway).
If family$cats is not NULL (after taking latent_y_unqs into account), then the response vector resulting from extract_model_data (see init_refmodel()) is coerced to a factor (using as.factor()) at multiple places throughout this package. Inside of init_refmodel(), the levels of this factor have to be identical to family$cats (after applying extend_family() inside of init_refmodel()). Everywhere else, these levels have to be a subset of <refmodel>$family$cats (where <refmodel> is an object resulting from init_refmodel()).