simfast: Fitting isotonic generalized single-index regression models...

simfast R Documentation

Fitting isotonic generalized single-index regression models via maximum likelihood with formula support

Description

Fitting isotonic generalized single-index regression models via maximum likelihood with support for estimating response values with predict and plotting values with plot. Also includes support for formula objects, data frames, and built-in regression families (see Arguments).

Usage

simfast(
  formula,
  data,
  intercept = FALSE,
  weights = NULL,
  offset = NULL,
  family = "gaussian",
  returnmodel = TRUE,
  returndata = TRUE,
  method = "stochastic",
  multiout = FALSE,
  B = 10000,
  k = 100,
  kappa0 = 100,
  tol = 1e-10,
  max.iter = 20
)

Arguments

formula

an object of class formula, which is a symbolic description of the model to be fitted. By default, intercepts are NOT included; set intercept = TRUE to include one. When including categorical predictors, be sure to set options('contrasts') in your global options to the desired setting. For a 'binomial' response, the vector can contain binary values or proportions, but a proper weight vector (a vector of the denominators of the proportions) should be provided. Categorical response vectors (character strings or factors) are automatically converted to a logical vector, with the baseline factor level treated as a 'success' (takes value 1). Poisson responses can be integer counts or rates, but a proper weight vector (a vector of the denominators of the rates) should be provided. Offsets can also be specified in the formula. Note that multiple offsets are combined, and that duplicate offsets are only counted once.

data

optional data frame (or object coercible to a data frame by as.data.frame) containing the variables in the model. Variables are taken from environment(formula) if not found in data.

intercept

logical value. If FALSE (the default), the model given by the formula does not include an intercept, even when the formula contains a 1. For example, z ~ 1 + x + y will only include columns for x and y.

weights

optional vector of positive integer weights, with length n. Takes default value NULL which uses equal weights.

offset

numeric vector of model offsets, with length n. Takes default value NULL, which uses no offset. If an offset is provided both here and in the formula, the two are combined.

family

a choice of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. Currently supporting any of gaussian, binomial, poisson, and Gamma. The canonical link function is used by default, but all link functions available for these families are supported.

returnmodel

logical value that when TRUE (the default value) attaches the model.frame object to the simfast object. Leave as TRUE to properly use the predict function.

returndata

logical value that when TRUE (the default value) returns the predictor matrix and response vector in the simfast object.

method

when x has d = 2 columns, method can be set to 'exact', which uses an exact optimization method instead of a stochastic search. If d does not equal 2, simfast gives a warning and automatically continues with a stochastic search (the default, method = 'stochastic').

multiout

logical value, if TRUE, will return more than one alpha vector and yhat vector if available, separately from the main estimate (see Value section and Details).

B

positive integer, sets number of index vectors to try when maximizing the likelihood

k

positive integer, algorithmic parameter, more info coming, should be less than B

kappa0

positive integer, initial value of kappa, more info coming

tol

numeric, sets tolerance for convergence for method = 'stochastic'. Will give value of 0 if 'exact' is used.

max.iter

positive integer limiting number of iterations for method = 'stochastic'
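
As an illustration of the formula, intercept, offset, and family arguments described above, the following sketch uses only base R (stats) to mimic the documented behaviour. It does not call simfast and is not its internal code; the data frame d, its columns, and the helper resolve_family are hypothetical names introduced for this example.

```r
# Base R only: a sketch of the documented argument handling,
# NOT the simfast implementation. All object names are hypothetical.
d <- data.frame(
  z  = c(1.2, 0.7, 3.1, 2.4),
  x  = c(0.1, 0.5, 0.9, 1.3),
  y  = c(2.0, 1.0, 4.0, 3.0),
  o1 = c(0.0, 0.1, 0.2, 0.3),
  o2 = c(1.0, 1.0, 1.0, 1.0)
)

# A formula with an explicit 1 and two offset terms
f  <- z ~ 1 + x + y + offset(o1) + offset(o2)
mf <- model.frame(f, data = d)

# model.matrix() keeps the "(Intercept)" column; dropping it by name
# leaves only x and y, as described for intercept = FALSE
mm       <- model.matrix(attr(mf, "terms"), mf)
mm_noint <- mm[, colnames(mm) != "(Intercept)", drop = FALSE]
colnames(mm_noint)

# model.offset() sums every offset() term found in the formula
off <- model.offset(mf)
off

# A family may be given as a string, a function, or a family object;
# this hypothetical helper resolves all three in the same way glm() does
resolve_family <- function(family) {
  if (is.character(family)) family <- get(family, mode = "function")
  if (is.function(family)) family <- family()
  stopifnot(inherits(family, "family"))
  family
}

resolve_family("gaussian")$link
resolve_family(poisson)$link
resolve_family(Gamma(link = "log"))$link
```

The same three forms of family are accepted by glm(), which is why a string such as "gaussian" and a call such as poisson(link = 'log') can be used interchangeably in the Usage above.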

Details

For i=1,...,n, let X_i be the d-dimensional covariates and Y_i be the corresponding one-dimensional response. The isotonic single index model is written as

g(mu) = f(a^T x),

where x=(x_1,...,x_d)^T, g is a known link function, a is a d x 1 index vector, and f is a nondecreasing function. The algorithm finds the maximum likelihood estimate of both f and a, assuming that f is a nondecreasing function. Implementation details can be found in ADD REFs, where theoretical justification of our estimator (i.e. uniform consistency) is also given. For the identifiability of isotonic single index models, we refer to REFs.
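
To make the model above concrete, here is a minimal sketch of a random search over index vectors for the Gaussian family with identity link, where maximizing the likelihood is equivalent to minimizing the residual sum of squares. For a fixed index vector a, the best nondecreasing f is the isotonic regression of y on a^T x, computed here with stats::isoreg. This is a simplified one-pass illustration under those assumptions, not the algorithm implemented in simfast (it ignores k, kappa0, tol, and max.iter); the function names profile_rss and random_search and the simulated data are hypothetical.

```r
# Simplified illustration only: one pass of uniformly random directions,
# Gaussian family, identity link. NOT the simfast algorithm.
set.seed(42)
n <- 200
d <- 3
x <- matrix(rnorm(n * d), n, d)
a_true <- c(2, 1, -1) / sqrt(6)                  # unit-length index vector
y <- 3 * pnorm(drop(x %*% a_true)) + rnorm(n, sd = 0.1)  # f is nondecreasing

# For a fixed index vector a, fit the best nondecreasing f by isotonic
# regression of y on the index a^T x and return the residual sum of squares
profile_rss <- function(a, x, y) {
  fit <- isoreg(drop(x %*% a), y)
  sum(residuals(fit)^2)
}

# Try B random unit vectors and keep the one with the smallest RSS
random_search <- function(x, y, B = 2000) {
  p <- ncol(x)
  cand <- matrix(rnorm(B * p), B, p)
  cand <- cand / sqrt(rowSums(cand^2))           # normalize rows to unit length
  rss <- vapply(seq_len(B),
                function(b) profile_rss(cand[b, ], x, y),
                numeric(1))
  best <- which.min(rss)
  list(alphahat = cand[best, ], rss = rss[best])
}

fit <- random_search(x, y)
fit$alphahat
sum(fit$alphahat * a_true)   # cosine similarity with the true direction
```

Restricting candidates to unit length reflects a common identifiability convention for single-index models; whether simfast uses exactly this normalization is an assumption here.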

Value

an object of class simfast, with the following structure:

x

if returndata = TRUE, this is the model matrix used to fit the model, otherwise it is NULL.

y

if returndata = TRUE, this is the response vector used to fit the model, otherwise it is NULL.

alphahat

alpha value estimated by the model fit

yhat

vector of estimated response values

indexvals

vector of estimated single index values, the matrix product of x and alphahat

weights

vector of the integer weights used in the model fit

family

the family function provided to simfast_m

loglik

a numeric value of the log-likelihood at the estimate.

offset

a numeric vector specifying the offset provided in the model formula.

tol

numeric convergence tolerance achieved during fitting with method = 'stochastic'. For method = 'exact', this is 0.

iter

number of iterations used to achieve convergence. For method = 'exact', this is 1.

method

method used for fitting the model

model

the model.frame generated by the formula object which is used to generate the model.matrix and model.response to pass to simfast_m

intercept

the intercept rule selected in the argument

multialphahat

returns all estimated alphahat vectors if multiout = TRUE as a matrix if there is more than one, and as a vector if there is only one.

multiyhat

returns all estimated yhat vectors if multiout = TRUE as a matrix if there is more than one, and as a vector if there is only one.

Author(s)

Hanna Jankowski: hkj@yorku.ca
Konstantinos Ntentes: kntentes@yorku.ca (maintainer)

See Also

simfast_m for providing model matrices instead of a formula, as well as more examples.

Examples


## Load esophageal cancer dataset
esoph <- datasets::esoph
str(esoph) # note that three variables are ordered factors
esoph$ntotal <- esoph$ncases + esoph$ncontrols # use as offset

## subset the data frame for training
set.seed(1) # keep from getting data OOB warning in predict()
nobs <- NROW(esoph)
ind <- sample(1:nobs, size = round(nobs * 0.8))
esophtrain <- esoph[ind, ]
esophtest  <- esoph[-ind, ]

## fit a model with formulas, including ordered/regular factors
## and support for offsets. similar syntax to glm()
sfobj <- simfast(ncases ~ offset(log(ntotal)) + tobgp + alcgp + agegp,
                 data = esophtrain, family = poisson(link = 'log'))

glmobj <- glm(ncases ~ offset(log(ntotal)) + tobgp + alcgp + agegp,
              data = esophtrain, family = poisson(link = 'log'))

## Plot the relationship of estimated responses vs. index values
# Not isotonic because of offset
plot(sfobj)
# Y-hats adjusted to same scale
plot(sfobj, offset = FALSE)

## Predictions from simfast and glm rounded to nearest integer
sfpred <- round(predict(sfobj, newdata = esophtest))
# Note that simfast only predicts 'response' values
sfpred
glmpred <- round(predict(glmobj, newdata = esophtest, type = 'response'))
glmpred

## Compare squared residuals
sum((sfpred - esophtest$ncases)^2)   #simfast prediction
sum((glmpred - esophtest$ncases)^2)  #glm prediction



ntentes/simfast documentation built on April 24, 2023, 10:10 p.m.