predict.mfp2: Predict Method for 'mfp2'

View source: R/predict.mfp2.R

predict.mfp2    R Documentation


Description

Obtains predictions from an mfp2 object.

Usage

## S3 method for class 'mfp2'
predict(
  object,
  newdata = NULL,
  type = NULL,
  terms = NULL,
  terms_seq = c("equidistant", "data"),
  alpha = 0.05,
  ref = NULL,
  strata = NULL,
  newoffset = NULL,
  nseq = 100,
  ...
)

Arguments

object

a fitted object of class mfp2.

newdata

optionally, a matrix with column names in which to look for variables with which to predict. If provided, the variables are internally shifted using the shifting values stored in object. See mfp2() for further details.

type

the type of prediction required. The default is on the scale of the linear predictors. See predict.glm() or predict.coxph() for details. If type = "terms", see the section "Terms prediction"; if type = "contrasts", see the section "Contrasts".

terms

a character vector of variable names specifying the variables for which term or contrast predictions are desired. Used only when type = "terms" or type = "contrasts". If NULL (the default), all variables selected in the final model are used. Names of variables that are not in the final model are ignored.

terms_seq

a character string specifying how the values of each variable are chosen for term predictions. The default, "equidistant", generates an equidistant sequence of nseq points (100 by default) from the minimum to the maximum of the shifted values, so that the functional form estimated in the final model is displayed properly. The option "data" uses the observed data values directly; these may not adequately reflect the estimated functional form, especially when extreme values or influential points are present.

alpha

significance level used for computing confidence intervals for terms and contrasts predictions; the intervals have nominal coverage 1 - alpha (95% for the default 0.05).

ref

a named list of reference values used when type = "contrasts". For any variable requested in terms that has no entry in this list (or whose entry is NULL), the mean of the shifted data (or the minimum, for binary variables) is used as reference; this is also the default when ref = NULL. Values should be specified on the original scale of the variable, since the function internally scales them using the scaling factors obtained from find_scale_factor().

strata

stratum levels used for predictions.

newoffset

a vector of offsets used for predictions. It is relevant when newdata is supplied for a model that includes an offset. The offsets are added directly to the linear predictor without any transformation.

nseq

Integer specifying how many values to generate when terms_seq = "equidistant". Default is 100.

...

further arguments passed to predict.glm() or predict.coxph().

Details

To prepare newdata for prediction, this function first applies the shifting factors estimated from the training data. If these shifting factors are not large enough, some variables in newdata may become non-positive after shifting, which can cause prediction errors for FP transformations that require positive values, such as logarithms; in such cases, the function issues a warning. Next, the data are transformed using the selected fractional polynomial (FP) powers and, if center = TRUE was used in mfp2(), centered. Finally, unless type is "terms" or "contrasts" (see the sections "Terms prediction" and "Contrasts"), the transformed data are passed to predict.glm() or predict.coxph(), depending on the model family.
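The shift, transform and center steps can be sketched in base R as follows. This is a hypothetical illustration, not the internal code of mfp2: fp_basis() and prepare_newdata_sketch() are made-up names, scaling is omitted for brevity, and only FP1/FP2 powers are handled using the usual FP convention (power 0 means log(x); a repeated power p contributes x^p and x^p log(x)).

```r
# Hypothetical sketch of the preprocessing described above; NOT mfp2's code.
# Handles FP1/FP2 only; scaling is omitted for brevity.
fp_basis <- function(x, powers) {
  stopifnot(length(powers) %in% 1:2)
  if (any(x <= 0)) warning("non-positive values after shifting; FP transform may fail")
  # FP convention: power 0 means log(x)
  tf <- function(p) if (p == 0) log(x) else x^p
  b1 <- tf(powers[1])
  if (length(powers) == 1) return(cbind(b1))
  # A repeated power p contributes x^p and x^p * log(x)
  b2 <- if (powers[2] == powers[1]) b1 * log(x) else tf(powers[2])
  cbind(b1, b2)
}

prepare_newdata_sketch <- function(x_new, shift, powers, x_train, center = TRUE) {
  # Shift with the factor estimated on the training data, then FP-transform
  basis_new <- fp_basis(x_new + shift, powers)
  # Center using the column means of the transformed (shifted) training data
  if (center) {
    basis_train <- fp_basis(x_train + shift, powers)
    basis_new <- sweep(basis_new, 2, colMeans(basis_train))
  }
  basis_new
}
```

Because the centering constants come from the training data, new values that are shifted with the same factor land on the same scale as the fitted coefficients.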

Value

For any type other than "terms" or "contrasts", the output conforms to the output of predict.glm() or predict.coxph().

If type = "terms" or type = "contrasts", the output is a named list with one entry for each variable requested in terms (excluding those not present in the final model). Each entry is a data.frame with the following columns:

  • variable: variable values on original scale (without shifting).

  • variable_pre: variable with pre-transformation applied, i.e. shifted, and centered as required.

  • value: partial linear predictor or contrast (depending on type).

  • se: standard error of partial linear predictor or contrast.

  • lower: lower limit of confidence interval.

  • upper: upper limit of confidence interval.

Terms prediction

If type = "terms", this function computes the partial linear predictors for each variable included in the final model. Unlike predict.glm() and predict.coxph(), this function accounts for the fact that a single variable may be represented by multiple transformed terms.

For a variable modeled using a first-degree fractional polynomial (FP1), the partial predictor is given by \hat{\eta}_j = \hat{\beta}_0 + x_j^* \hat{\beta}_j, where x_j^* is the transformed variable (centered if center = TRUE).

For a second-degree fractional polynomial (FP2), the partial predictor takes the form \hat{\eta}_j = \hat{\beta}_0 + x_{j1}^* \hat{\beta}_{j1} + x_{j2}^* \hat{\beta}_{j2}, where x_{j1}^* and x_{j2}^* are the two transformed components of the original variable (again, centered if center = TRUE).

This functionality is particularly useful for visualizing the functional relationship of a continuous variable, or for assessing model fit when residuals are included. See also fracplot().
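The FP2 partial predictor above, evaluated over an equidistant grid in the spirit of terms_seq = "equidistant", can be sketched in base R. This is a hypothetical illustration rather than the mfp2 implementation: the powers (0.5, 1), the intercept, the coefficients and the training values are all made up, and fp2_partial_predictor() is an illustrative name.

```r
# Hypothetical illustration of an FP2 partial predictor over an equidistant grid.
# All numbers (powers, b0, beta, training values) are made up; not mfp2 internals.
fp2_partial_predictor <- function(x_shifted, b0, beta, train_shifted, center = TRUE) {
  basis <- function(v) cbind(sqrt(v), v)   # FP2 with assumed powers (0.5, 1)
  xb <- basis(x_shifted)
  # Center each transformed component by its training-data mean
  if (center) xb <- sweep(xb, 2, colMeans(basis(train_shifted)))
  as.vector(b0 + xb %*% beta)              # eta_j = b0 + x_j1* b_j1 + x_j2* b_j2
}

train <- c(1.5, 2, 3, 5, 8, 13)            # illustrative shifted training values
nseq <- 100
# Analogue of terms_seq = "equidistant": nseq points from min to max
grid <- seq(min(train), max(train), length.out = nseq)
eta <- fp2_partial_predictor(grid, b0 = 2, beta = c(0.8, -0.1), train_shifted = train)
```

Plotting grid against eta (for example with plot(grid, eta, type = "l")) shows the estimated curve; using the raw data points instead of a grid can leave gaps where observations are sparse.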

Contrasts

If type = "contrasts", this function computes contrasts relative to a specified reference value for the jth variable (e.g., age = 50). Let x_j denote the values of the jth variable in newdata, and x_j^{(\text{ref})} the reference value. The contrast is defined as the difference between the partial linear predictor evaluated at the transformed (and centered, if center = TRUE) value x_j^*, and that evaluated at the transformed reference value x_j^{*(\text{ref})}, i.e., \hat{f}(x_j^*) - \hat{f}(x_j^{*(\text{ref})}).

For a first-degree fractional polynomial (FP1), the partial predictor is:

\hat{f}(x_j^*) = \hat{\beta}_0 + x_j^* \hat{\beta}_j

and the contrast is:

\hat{f}(x_j^*) - \hat{f}(x_j^{*(\text{ref})}) = x_j^* \hat{\beta}_j - x_j^{*(\text{ref})} \hat{\beta}_j

For a second-degree fractional polynomial (FP2), the partial predictor is:

\hat{f}(x_j^*) = \hat{\beta}_0 + x_{j1}^* \hat{\beta}_{j1} + x_{j2}^* \hat{\beta}_{j2}

and the contrast is:

\hat{f}(x_j^*) - \hat{f}(x_j^{*(\text{ref})}) = x_{j1}^* \hat{\beta}_{j1} + x_{j2}^* \hat{\beta}_{j2} - x_{j1}^{*(\text{ref})} \hat{\beta}_{j1} - x_{j2}^{*(\text{ref})} \hat{\beta}_{j2}

where x_j^*, x_{j1}^*, and x_{j2}^* are the transformed (and centered, if applicable) components of the jth variable, and the \hat{\beta} terms are the corresponding model estimates.

The reference value x_j^{(\text{ref})} is first shifted using the same shifting factor estimated from the training data, then transformed using the estimated fractional polynomial (FP) powers, and finally centered (if center = TRUE) using the mean of the transformed (and shifted) values of x_j in the training data. This ensures full consistency with the fitted model.

If ref = NULL, the function uses the mean of the shifted x_j as the reference value when x_j is continuous, or the minimum of x_j (typically 0) when x_j is binary. This provides a natural and interpretable baseline in the absence of a user-specified reference.

The fitted partial predictors are centered at the reference point, so the contrast at that point is zero. Correspondingly, the standard error there is zero and the confidence interval at the reference value has zero width.
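A minimal base R sketch of an FP2 contrast with its standard error and confidence interval follows. It is an illustration under stated assumptions, not the mfp2 code: fp2_contrast() is a hypothetical name, the powers (0, 1) and the values of beta and V (the coefficient covariance for the two components) are made up, the standard error is the usual variance of a linear combination of coefficients, and the interval uses a normal quantile, which may differ from what mfp2 does for a given family. Note that both the intercept and the centering constants cancel in the difference, which is why the reference row has value 0 and se 0.

```r
# Hypothetical sketch of an FP2 contrast; inputs are illustrative, not from an mfp2 fit.
fp2_contrast <- function(x_shifted, ref_shifted, beta, V, alpha = 0.05) {
  basis <- function(v) cbind(log(v), v)    # FP2 with assumed powers (0, 1)
  # Intercept and centering constants cancel in the difference, so both are skipped
  D <- sweep(basis(x_shifted), 2, as.vector(basis(ref_shifted)))
  value <- as.vector(D %*% beta)
  # Variance of each linear combination d' beta is d' V d
  se <- sqrt(rowSums((D %*% V) * D))
  z <- qnorm(1 - alpha / 2)                # normal-approximation interval (assumption)
  data.frame(value = value, se = se, lower = value - z * se, upper = value + z * se)
}

train <- c(1.5, 2, 3, 5, 8, 13)            # illustrative shifted training values
beta <- c(0.9, -0.05)
V <- matrix(c(0.04, -0.002, -0.002, 0.0004), nrow = 2)
# Reference = mean of the shifted values, mirroring the default for continuous variables
res <- fp2_contrast(c(2, mean(train), 10), ref_shifted = mean(train), beta = beta, V = V)
```

The second row of res corresponds to the reference value itself, where the contrast, its standard error and the interval width are all zero.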

This functionality is especially useful for comparing the effect of a variable relative to a meaningful baseline, such as a clinically relevant value.

See Also

mfp2(), stats::predict.glm(), survival::predict.coxph()

Examples


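library(mfp2)
# Gaussian model
data("prostate")
x <- as.matrix(prostate[, 2:8])
y <- as.numeric(prostate$lpsa)
# default interface
fit1 <- mfp2(x, y, verbose = FALSE)
predict(fit1) # predictions on the scale of the linear predictor
# partial linear predictors for each variable in the final model
terms_pred <- predict(fit1, type = "terms")
# contrasts relative to the default reference values
contr_pred <- predict(fit1, type = "contrasts")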


mfp2 documentation built on June 8, 2025, 11:04 a.m.