find_best_fp_step: Function to estimate the best FP functions for a single...

View source: R/mfp_step.R

find_best_fp_step    R Documentation

Function to estimate the best FP functions for a single variable

Description

See mfp2() for a brief summary on the notation used here and fit_mfp() for an overview of the fitting procedure.

Usage

find_best_fp_step(
  x,
  y,
  xi,
  weights,
  offset,
  df,
  powers_current,
  family,
  criterion,
  select,
  alpha,
  keep,
  powers,
  method,
  strata,
  nocenter,
  acdx,
  ftest,
  control,
  rownames,
  verbose
)

Arguments

x

an input matrix of dimensions nobs x nvars. Does not contain intercept, but columns are already expanded into dummy variables as necessary. Data are assumed to be shifted and scaled.

y

a vector for the response variable or a Surv object.

xi

a character string indicating the name of the current variable of interest, for which the best fractional polynomial transformation is to be estimated in the current step.

weights

a vector of observation weights of length nobs.

offset

a vector of length nobs of offsets.

df

a numeric vector indicating the maximum degrees of freedom for the variable of interest xi.

powers_current

a list of length equal to the number of variables, indicating the fp powers to be used in the current step for all variables (except xi).

family

a character string representing a family object.

criterion

a character string defining the criterion used to select variables and FP models of different degrees.

select

a numeric value indicating the significance level for backward elimination of xi.

alpha

a numeric value indicating the significance level for tests between FP models of different degrees for xi.

keep

a character vector with names of variables to be kept in the model.

powers

a named list of numeric values that sets the permitted FP powers for each covariate.

method

a character string specifying the method for tie handling in Cox regression.

strata

a factor of all possible combinations of stratification variables. Returned from survival::strata().

nocenter

a numeric vector with a list of values for fitting Cox models. See survival::coxph() for details.

acdx

a logical vector of length nvars indicating continuous variables to undergo the approximate cumulative distribution (ACD) transformation.

ftest

a logical indicating the use of the F-test for Gaussian models.

control

a list with parameters for model fit.

rownames

a parameter for Cox models.

verbose

a logical; run in verbose mode.

Details

The function selection procedure (FSP) is used if the p-value criterion is chosen, whereas the criteria AIC and BIC select the model with the smallest AIC and BIC, respectively.

The function uses the current transformations of all other variables to assess the FP form of the current variable of interest. It covers three main use cases:

  • the linear case (df = 1), which tests between the null and the linear model (see select_linear()). This step differs from the mfp case because a linear model uses only 1 df, whereas each estimated fp power adds another df. This case also applies to categorical variables, for which df is set to 1.

  • the case in which an ACD transformation is requested for the variable of interest, i.e. acdx is TRUE for xi (see find_best_fpm_step()).

  • the (usual) case of the normal mfp algorithm to assess non-linear functional forms (see find_best_fpm_step()).
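The three cases above can be pictured as a simple dispatch on df and acdx. The following is a minimal sketch only: which_case() is a hypothetical helper that is not part of mfp2, and the order in which the conditions are checked is an assumption rather than something stated in this documentation.

```r
# Hypothetical helper (not an mfp2 function): classify the use case for xi.
# Assumption: the linear case (df = 1) is checked before the ACD case.
which_case <- function(df_xi, acdx_xi) {
  if (df_xi == 1) {
    "linear"   # null vs. linear only; also used for categorical variables
  } else if (isTRUE(acdx_xi)) {
    "acd"      # ACD transformation requested for xi
  } else {
    "fp"       # usual mfp case: assess non-linear functional forms
  }
}

which_case(df_xi = 1, acdx_xi = FALSE)
which_case(df_xi = 4, acdx_xi = TRUE)
which_case(df_xi = 4, acdx_xi = FALSE)
```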

Note that these cases do not include a separate setting for a variable that is not selected, because every variable is evaluated in every cycle. A variable that was de-selected in an earlier cycle may therefore be added to the working model again. Also see find_best_fp_cycle().

The adjustment in each step uses the current fp powers given in powers_current for all other variables to determine the adjustment set and transformations in the working model.
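As an illustration of how powers_current could determine the adjustment set, consider the sketch below. The variable names and power values are invented for the example, and it assumes that a power of NA marks a variable that is currently not in the working model; the actual internal handling in mfp2 may differ.

```r
# Illustrative values only; variable names and powers are made up.
powers_current <- list(
  age    = c(-2, 0.5),  # FP2 transformation
  bmi    = 1,           # linear
  sex    = 1,           # categorical, kept linear
  smoker = NA           # assumed to mean: currently not in the working model
)
xi <- "bmi"             # variable of interest in this step

# All variables except xi
others <- powers_current[setdiff(names(powers_current), xi)]

# Keep only variables with at least one non-missing power
in_model <- !vapply(others, function(p) all(is.na(p)), logical(1))
adjustment_set <- names(others)[in_model]
adjustment_set
```

Under these assumptions, xi would be assessed while adjusting for the variables in adjustment_set, each transformed with its current powers.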

Note that the algorithm starts by setting all df = 1, and higher fps are evaluated in turn starting from the first step in the first cycle.

Value

A numeric vector indicating the best powers for xi. Entries can be NA if the variable is to be removed from the working model. Note that this vector may include up to two NA entries when an ACD transformation is requested; otherwise it is either a vector with all numeric entries or a single NA.

Functional form selection

Three criteria are available for deciding on the current best functional form of a continuous variable.

The first option, criterion = "pvalue", is the function selection procedure as outlined in, e.g., Chapters 4 and 6 of Royston and Sauerbrei (2008), also abbreviated as "RA2". It is a closed testing procedure, implemented in select_ra2() and extended for the ACD transformation in select_ra2_acd() according to Royston and Sauerbrei (2016).
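The closed testing idea can be sketched for a maximum degree of FP2 as follows. This is not the code of select_ra2(): fsp_sketch() is a hypothetical function that takes model deviances as input, uses chi-square tests throughout (no F-test), and assumes the conventional degrees of freedom of 4, 3 and 2 for the comparisons of the best FP2 model against the null, linear and best FP1 models. It also assumes that select is used for the first test and alpha for the remaining ones.

```r
# Hypothetical sketch of the closed test sequence for a maximum degree of FP2.
# Inputs are deviances (-2 * log-likelihood) of the null, linear, best FP1 and
# best FP2 models for xi.
fsp_sketch <- function(dev_null, dev_lin, dev_fp1, dev_fp2,
                       select = 0.05, alpha = 0.05) {
  # Step 1: overall association, best FP2 vs. null (4 df)
  p_null <- pchisq(dev_null - dev_fp2, df = 4, lower.tail = FALSE)
  if (p_null >= select) return("null")

  # Step 2: non-linearity, best FP2 vs. linear (3 df)
  p_lin <- pchisq(dev_lin - dev_fp2, df = 3, lower.tail = FALSE)
  if (p_lin >= alpha) return("linear")

  # Step 3: complexity, best FP2 vs. best FP1 (2 df)
  p_fp1 <- pchisq(dev_fp1 - dev_fp2, df = 2, lower.tail = FALSE)
  if (p_fp1 >= alpha) return("FP1")

  "FP2"
}

fsp_sketch(dev_null = 100, dev_lin = 80, dev_fp1 = 60, dev_fp2 = 59)
```

In this call the first two tests are significant but the last one is not, so the simpler FP1 model is retained.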

For the other criteria, "aic" and "bic", all FP models up to the desired degree are fitted, and the model with the lowest value of the information criterion is chosen as the final one. This is implemented in select_ic().
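A minimal sketch of selection by information criterion, restricted to FP1 models for a single simulated covariate, might look as follows. It uses only base R and is not the code of select_ic(). The data, the helper fp1_term() and the power set (the conventional FP set of Royston and Sauerbrei) are assumptions made for illustration, and stats::AIC() here does not count the estimated power as an additional parameter, which the package may handle differently.

```r
set.seed(1)

# Simulated data (illustrative only): y depends on log(x)
n <- 200
x <- runif(n, 0.5, 5)
y <- 2 * log(x) + rnorm(n, sd = 0.3)

# Conventional FP power set; power 0 denotes log(x)
fp_powers <- c(-2, -1, -0.5, 0, 0.5, 1, 2, 3)

# Hypothetical helper: FP1 transformation of a positive variable
fp1_term <- function(x, p) if (p == 0) log(x) else x^p

# Fit one Gaussian model per candidate power and record its AIC
aic_values <- vapply(
  fp_powers,
  function(p) AIC(lm(y ~ fp1_term(x, p))),
  numeric(1)
)
names(aic_values) <- fp_powers

# The candidate with the smallest AIC is selected
best_power <- fp_powers[which.min(aic_values)]
best_power
```

Replacing AIC() with BIC() gives the corresponding BIC-based choice.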

References

Royston, P. and Sauerbrei, W., 2008. Multivariable Model-Building: A Pragmatic Approach to Regression Analysis based on Fractional Polynomials for Modelling Continuous Variables. John Wiley & Sons.

Royston, P. and Sauerbrei, W., 2016. mfpa: Extension of mfp using the ACD covariate transformation for enhanced parametric multivariable modeling. The Stata Journal, 16(1), pp.72-87.


mfp2 documentation built on Nov. 15, 2023, 1:06 a.m.