PIC.mlm (R Documentation)
Computes predictive information criteria (PIC) for multivariable linear models. Currently, only bivariable linear models (two response variables) are supported.
## S3 method for class 'mlm'
PIC(object, newdata, group_sizes = NULL, bootstraps = NULL, ...)
| Argument | Description |
|---|---|
| object | A fitted model object of class "mlm". |
| newdata | An optional data frame to be used as validation data in computing PIC. If omitted, the training data contained within object are used. |
| group_sizes | An optional scalar or numeric vector indicating the sizes of the groups into which newdata is partitioned. |
| bootstraps | An optional numeric value indicating the number of bootstrap samples to use for a bootstrapped PIC. See 'Details'. |
| ... | Further arguments passed to or from other methods. |
PIC.mlm computes PIC values based on the supplied multivariable regression model. Among candidate models, those with smaller criterion values are preferred.
Depending on the value(s) supplied to group_sizes, one of three variants of PIC is computed:
iPIC: The individualized predictive information criterion (iPIC) is computed when group_sizes = 1 or when group_sizes is a vector of ones (e.g. rep(1, 10) for 10 validation rows). A value of iPIC is determined for each individual observation in newdata. Using iPIC, one may thus select the optimal predictive model for each individual validation datapoint.
gPIC: The group predictive information criterion (gPIC) is computed when group_sizes is a scalar greater than 1 or a numeric vector of group sizes (e.g. c(5, 3, 2) for 10 validation rows). A value of gPIC is determined for each cohort or group of observations defined by the partitions of newdata. Using gPIC, one may thus select optimal predictive models specific to each group of validation datapoints. For the class of regression models, the gPIC value of a group of validation observations is equal to the sum of their individual iPIC values.
tPIC: The total predictive information criterion (tPIC) is computed when group_sizes = NULL, which is the default. The tPIC may be used to select the optimal predictive model for the entire set of validation datapoints. The tPIC and gPIC are equivalent when group_sizes = m, where m is the number of observations in newdata. When newdata is not supplied, tPIC is exactly equivalent to the Akaike Information Criterion (AIC).
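The relationships among iPIC, gPIC and tPIC described above can be illustrated with a short base R sketch. It assumes a vector of iPIC values is already available; the numbers below are arbitrary placeholders, not output of PIC(). It also assumes that a vector group_sizes partitions the rows of newdata consecutively, in row order, which is not stated explicitly in this documentation.

```r
# Placeholder iPIC values for a 10-row validation set (arbitrary numbers,
# NOT produced by PIC()); one value per validation observation.
ipic <- c(2.1, 3.4, 1.8, 2.9, 3.3, 2.2, 4.0, 1.5, 2.7, 3.1)

# group_sizes = c(5, 3, 2): assumed to split the rows consecutively,
# i.e. rows 1-5 form group 1, rows 6-8 group 2, rows 9-10 group 3.
group_sizes <- c(5, 3, 2)
group_id <- rep(seq_along(group_sizes), times = group_sizes)

# For regression models, the gPIC of a group is the sum of its members' iPIC values.
gpic <- tapply(ipic, group_id, sum)

# tPIC equals gPIC with a single group of all m rows (group_sizes = m),
# so it is the sum of the iPIC values over every validation row.
tpic <- sum(ipic)

gpic
tpic
```

Under these assumptions the three group totals add up to the single tPIC value, which is why tPIC and gPIC coincide when group_sizes = m.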
Unlike the computation for "lm" models (PIC.lm), the PIC computation for multivariable regression models depends on whether the validation responses are partially or completely unobserved. If they are partially unobserved, i.e. only some components of a multivariable response vector are unknown, the remaining observed components are used in the PIC computation.
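Which case applies is determined row by row from the missing-value pattern of the response columns. The sketch below only shows how the cases can be told apart in base R, using the same masking rule as the "mix of completely and partially unobserved cases" example; how PIC.mlm conditions on the observed components internally is described in Flores (2021) and is not reproduced here. The variable names are illustrative.

```r
# Response columns of the bivariable model used in the examples.
responses <- c("Sepal.Length", "Sepal.Width")

# Hypothetical validation data: 10 iris rows with some responses masked,
# using the same thresholds as the mixed-case example.
set.seed(1)
vdat <- iris[sample(nrow(iris), 10), ]
vdat$Sepal.Length[vdat$Sepal.Length < 6] <- NA
vdat$Sepal.Width[vdat$Sepal.Width < 3.3] <- NA

# Number of missing response components in each validation row.
n_missing <- rowSums(is.na(vdat[, responses]))

# Classify each row by how much of its response vector is unobserved.
status <- ifelse(n_missing == length(responses), "completely unobserved",
          ifelse(n_missing > 0, "partially unobserved", "fully observed"))
table(status)
```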
If a numeric value is supplied to bootstraps, the total predictive information criterion (tPIC) is computed bootstraps times, with each generated bootstrap sample used as the validation data for one tPIC computation. The multivariable response vectors of these samples are assumed to be completely unobserved. The resulting tPIC values are then averaged to produce a single bootstrapped tPIC value. Model selection based on this bootstrapped tPIC value may lead to a more generally applicable predictive model whose predictive accuracy is not strictly optimized for one particular set of validation data.
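The resample, mask and average procedure described above can be sketched in base R. This is a hypothetical helper, not the package's implementation: the criterion argument stands in for a function returning the tPIC of one validation set, and drawing nrow(train) rows with replacement from the training data is an assumption about how the bootstrap samples are generated.

```r
# Hypothetical helper: average a criterion over B bootstrap validation sets.
# `criterion` stands in for a function returning the tPIC of one validation
# set; it is NOT part of the PIC package API.
bootstrap_average <- function(train, responses, B, criterion) {
  values <- vapply(seq_len(B), function(b) {
    # Assumption: draw nrow(train) rows with replacement from the training data.
    boot <- train[sample(nrow(train), replace = TRUE), ]
    # Treat every response vector as completely unobserved.
    boot[responses] <- NA
    criterion(boot)
  }, numeric(1))
  # Collapse the B values into a single bootstrapped value.
  mean(values)
}

# Usage with a stand-in criterion (mean petal length, NOT tPIC),
# purely to show the control flow.
set.seed(1)
stand_in <- function(d) mean(d$Petal.Length)
bootstrap_average(iris, c("Sepal.Length", "Sepal.Width"), B = 10,
                  criterion = stand_in)
```

Averaging over many resampled validation sets is what keeps the selected model from being tuned to any single validation set.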
For further details, see Flores (2021), A new class of information criteria for improved prediction in the presence of training/validation data heterogeneity.
If group_sizes = NULL or bootstraps > 0, a scalar is returned. Otherwise, newdata is returned with an appended column labeled 'PIC' containing either iPIC or gPIC values, depending on the value provided to group_sizes.
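Because iPIC output is newdata with an appended 'PIC' column, choosing a model per validation row reduces to a row-wise comparison. In the sketch below, res_a and res_b are hypothetical results for two candidate models on the same rows; their PIC values are arbitrary placeholders, not output of PIC().

```r
# Hypothetical iPIC results for two candidate models on the same 4 rows,
# shaped like PIC.mlm's output (newdata plus a 'PIC' column).
# The PIC values are arbitrary placeholders.
newdata <- iris[1:4, c("Petal.Length", "Petal.Width", "Species")]
res_a <- transform(newdata, PIC = c(3.2, 4.1, 2.8, 5.0))
res_b <- transform(newdata, PIC = c(3.5, 3.9, 2.6, 5.4))

# Smaller criterion values are preferred, so pick the model per row
# (ties go to model A).
best <- ifelse(res_a$PIC <= res_b$PIC, "model A", "model B")
best
```

With scalar output (tPIC or bootstrapped tPIC), the same rule applies once per candidate model rather than once per row.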
Flores, J.E. (2021), A new class of information criteria for improved prediction in the presence of training/validation data heterogeneity [Unpublished PhD dissertation]. University of Iowa.
See also: PIC, PIC.lm, lm
require(dplyr, quietly = TRUE)
data(iris)

# Fit a bivariable regression model
mod <- lm(cbind(Sepal.Length, Sepal.Width) ~ ., data = iris)
class(mod)

# Hypothetical validation data
set.seed(1)
vdat <- iris[sample(1:nrow(iris), 10),]

# tPIC, completely unobserved response data
PIC(object = mod, newdata = vdat %>% dplyr::mutate(Sepal.Length = NA, Sepal.Width = NA))

# tPIC, partially unobserved response data
PIC(object = mod, newdata = vdat %>% dplyr::mutate(Sepal.Length = NA))

# tPIC, mix of completely and partially unobserved cases.
PIC(object = mod,
    newdata = vdat %>% dplyr::mutate(Sepal.Length = ifelse(Sepal.Length < 6, NA, Sepal.Length),
                                     Sepal.Width = ifelse(Sepal.Width < 3.3, NA, Sepal.Width)))

# tPIC, newdata not supplied
PIC(object = mod)

# gPIC
PIC(object = mod, newdata = vdat, group_sizes = c(5,3,2))
PIC(object = mod, newdata = vdat, group_sizes = 5)

# iPIC
PIC(object = mod, newdata = vdat, group_sizes = rep(1, 10))
PIC(object = mod, newdata = vdat, group_sizes = 1)

# bootstrapped tPIC (based on 10 bootstrap samples)
set.seed(1)
PIC(object = mod, bootstraps = 10)