simfast: Fitting isotonic generalized single-index regression models...
In ntentes/simfast: Isotonic Single-Index Regression Models with Generalized Link Functions

View source: R/simfast.R

simfast

R Documentation

Fitting isotonic generalized single-index regression models via maximum likelihood with formula support

Description

Fitting isotonic generalized single-index regression models via maximum likelihood with support for estimating response values with predict and plotting values with plot. Also includes support for formula objects, data frames, and built-in regression families (see Arguments).

Usage

simfast(
  formula,
  data,
  intercept = FALSE,
  weights = NULL,
  offset = NULL,
  family = "gaussian",
  returnmodel = TRUE,
  returndata = TRUE,
  method = "stochastic",
  multiout = FALSE,
  B = 10000,
  k = 100,
  kappa0 = 100,
  tol = 1e-10,
  max.iter = 20
)

Arguments

`formula`	an object of class `formula`: a symbolic description of the model to be fitted. By default, intercepts are NOT included; set `intercept = TRUE` to include one. When including categorical predictors, be sure to set `options('contrasts')` in your global options to the desired setting. For a `'binomial'` response, the vector can contain binary values or proportions; for proportions, a proper weight vector (the denominators of the proportions) should be provided. Categorical responses (character strings or factors) are automatically translated into a logical vector with the baseline factor level as a 'success' (takes value `1`). Poisson responses can be integer counts or rates; for rates, a proper weight vector (the denominators of the rates) should be provided. Offsets can also be specified in the formula. Note that multiple offsets are combined, and that duplicate offsets are only counted once.
`data`	optional data frame (or object coercible to a data frame by `as.data.frame`) containing the variables in the model. Variables are taken from `environment(formula)` if not found in `data`.
`intercept`	logical value, if `FALSE` (the default value), then the model given by the formula does not include an intercept value (even when including a 1, for example: `z ~ 1 + x + y` will only include columns for `x` and `y`).
`weights`	optional vector of positive integer weights, with length `n`. Takes default value `NULL` which uses equal weights.
`offset`	numeric vector of model offsets, with length `n`. Takes default value `NULL`, which uses no offset. If an offset is provided both here and in the formula, the two are combined.
`family`	a choice of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. Currently supported families are `gaussian`, `binomial`, `poisson`, and `Gamma`. The canonical link function is used by default, but all link functions available for these families are supported.
`returnmodel`	logical value that when `TRUE` (the default value) attaches the `model.frame` object to the simfast object. Leave as `TRUE` to properly use the `predict` function.
`returndata`	logical value that when `TRUE` (the default value) returns the predictor matrix and response vector in the simfast object.
`method`	when `x` has `d=2` columns, `method` can be set to `'exact'`, which uses an exact optimization method instead of a stochastic search. If `d` does not equal `2`, `simfast` will give a warning and automatically continue with a stochastic search (the default method, `method = 'stochastic'`).
`multiout`	logical value, if `TRUE`, will return more than one `alpha` vector and `yhat` vector if available, separately from the main estimate (see Value section and Details).
`B`	positive integer, sets number of index vectors to try when maximizing the likelihood
`k`	positive integer, algorithmic parameter, more info coming, should be less than `B`
`kappa0`	positive integer, initial value of kappa, more info coming
`tol`	numeric, sets the convergence tolerance for `method = 'stochastic'`. Reported as `0` if `'exact'` is used.
`max.iter`	positive integer limiting number of iterations for `method = 'stochastic'`
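
The formula-related arguments above follow conventions that base R's model-frame tools also use. The sketch below uses only `model.frame`, `model.matrix`, and `model.offset` from the stats package on made-up data to show what a predictor matrix without an intercept column looks like, how the `contrasts` option controls factor coding, and how several `offset()` terms add up. It is an illustration under those assumptions and does not claim to reproduce how `simfast` builds its inputs internally; all variable names are hypothetical.

```r
# Toy data; every name here is illustrative only.
dat <- data.frame(
  z  = c(2, 5, 3, 8, 6, 9),
  x  = c(0.1, 0.4, 0.2, 0.9, 0.5, 1.0),
  g  = factor(c("a", "b", "a", "c", "b", "c")),
  n1 = c(10, 20, 10, 40, 20, 40),
  n2 = c(1, 2, 1, 2, 1, 2)
)

# Contrast coding for unordered and ordered factors, respectively.
old <- options(contrasts = c("contr.treatment", "contr.poly"))

mf <- model.frame(z ~ 1 + x + g + offset(log(n1)) + offset(log(n2)),
                  data = dat)
mm <- model.matrix(attr(mf, "terms"), mf)

# Drop the intercept column after the fact, so the factor keeps its
# treatment-contrast coding (columns for levels "b" and "c" only).
mm_noint <- mm[, colnames(mm) != "(Intercept)", drop = FALSE]
colnames(mm_noint)

# model.offset() sums all offset() terms found in the model frame.
off <- as.numeric(model.offset(mf))
all.equal(off, log(dat$n1) + log(dat$n2))

options(old)  # restore the previous contrast settings
```

Changing the `contrasts` option before building the matrix changes the columns generated for `g`, which is why the `formula` argument asks for it to be set deliberately.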

Details

For i=1,...,n, let X_i be the d-dimensional covariates and Y_i be the corresponding one-dimensional response. The isotonic single index model is written as

g(mu) = f(a^T x),

where x=(x_1,...,x_d)^T, g is a known link function, a is a d x 1 index vector, and f is a nondecreasing function. The algorithm finds the maximum likelihood estimate of both f and a, assuming that f is an increasing function. Implementation details can be found in ADD REFs, where theoretical justification of our estimator (i.e. uniform consistency) is also given. For the identifiability of isotonic single index models, we refer to REFs.
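
To make the model concrete, here is a minimal sketch of the idea in the simplest setting: Gaussian response, identity link, d = 2. For a fixed index vector a, the best nondecreasing f is obtained by isotonic regression of the response on the index values, and a itself is chosen by a crude random search over unit vectors. This is not the package's algorithm (it ignores `k`, `kappa0`, weights, offsets, and non-Gaussian families); it only uses `stats::isoreg` from base R, and the simulated data and all names are assumptions made for illustration.

```r
set.seed(42)

# Simulated data (illustrative): d = 2 covariates, identity link,
# Gaussian noise, and a strictly increasing ridge function f(t) = t^3.
n <- 200
d <- 2
x <- matrix(rnorm(n * d), nrow = n, ncol = d)
a_true <- c(0.6, 0.8)                      # unit-length index vector
y <- drop(x %*% a_true)^3 + rnorm(n, sd = 0.1)

# For a fixed index vector a, the least-squares nondecreasing fit is the
# isotonic regression of y on the index values x %*% a.
iso_rss <- function(a) {
  idx <- drop(x %*% a)
  ord <- order(idx)
  fit <- isoreg(idx[ord], y[ord])          # fit$yf follows the sorted index
  sum((y[ord] - fit$yf)^2)
}

# Crude random search: draw B candidate directions on the unit circle and
# keep the one with the smallest isotonic residual sum of squares.
B <- 2000
cand <- matrix(rnorm(B * d), nrow = B, ncol = d)
cand <- cand / sqrt(rowSums(cand^2))
rss <- apply(cand, 1, iso_rss)
a_hat <- cand[which.min(rss), ]

a_hat                  # estimated index vector (unit length)
sum(a_hat * a_true)    # cosine of the angle between estimate and truth
```

Under Gaussian errors, minimizing the residual sum of squares is equivalent to maximizing the likelihood, which is why this toy version can stand in for the likelihood criterion described above.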

Value

an object of class simfast, with the following structure:

x: if returndata = TRUE, this is the model matrix used to fit the model, otherwise it is NULL.
y: if returndata = TRUE, this is the response vector used to fit the model, otherwise it is NULL.
alphahat: alpha value estimated by the model fit
yhat: vector of estimated response values
indexvals: vector of estimated single index values, the matrix product of x and alphahat
weights: vector of the integer weights used in the model fit
family: the family function provided to simfast_m
loglik: a numeric value of the log-likelihood at the estimate.
offset: a numeric vector specifying the offset provided in the model formula.
tol: numeric convergence tolerance achieved during fitting with method = 'stochastic'. For method = 'exact', this is 0.
iter: number of iterations used to achieve convergence. For method = 'exact', this is 1.
method: method used for fitting the model
model: the model.frame generated by the formula object which is used to generate the model.matrix and model.response to pass to simfast_m
intercept: the intercept rule selected in the argument
multialphahat: returns all estimated alphahat vectors if multiout = TRUE as a matrix if there is more than one, and as a vector if there is only one.
multiyhat: returns all estimated yhat vectors if multiout = TRUE as a matrix if there is more than one, and as a vector if there is only one.
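
As a small worked illustration of how `indexvals` relates to `x` and `alphahat`, the sketch below forms the matrix product by hand. The matrix and the index vector are made-up values, not output from a fitted model.

```r
# Hypothetical predictor matrix and estimated index vector.
x <- cbind(x1 = c(1, 2, 3), x2 = c(0, 1, 0))
alphahat <- c(0.6, 0.8)

# One index value per observation: the matrix product of x and alphahat.
indexvals <- drop(x %*% alphahat)
indexvals
```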

Author(s)

Hanna Jankowski: hkj@yorku.ca
Konstantinos Ntentes: kntentes@yorku.ca (maintainer)

Examples


library(simfast)

## Load the esophageal cancer dataset
esoph <- datasets::esoph
str(esoph) # note that three variables are ordered factors
esoph$ntotal <- esoph$ncases + esoph$ncontrols # use as offset

## subset the data frame for training
set.seed(1) # keep from getting data OOB warning in predict()
nobs <- NROW(esoph)
ind <- sample(1:nobs, size = round(nobs * 0.8))
esophtrain <- esoph[ind, ]
esophtest  <- esoph[-ind, ]

## fit a model with formulas, including ordered/regular factors
## and support for offsets. similar syntax to glm()
sfobj <- simfast(ncases ~ offset(log(ntotal)) + tobgp + alcgp + agegp,
                 data = esophtrain, family = poisson(link = 'log'))

glmobj <- glm(ncases ~ offset(log(ntotal)) + tobgp + alcgp + agegp,
              data = esophtrain, family = poisson(link = 'log'))

## Plot the relationship of estimated responses vs. index values
# Not isotonic because of offset
plot(sfobj)
# Y-hats adjusted to same scale
plot(sfobj, offset = FALSE)

## Predictions from simfast and glm rounded to nearest integer
sfpred <- round(predict(sfobj, newdata = esophtest))
# Note that simfast only predicts 'response' values
sfpred
glmpred <- round(predict(glmobj, newdata = esophtest, type = 'response'))
glmpred

## Compare squared residuals
sum((sfpred - esophtest$ncases)^2)   #simfast prediction
sum((glmpred - esophtest$ncases)^2)  #glm prediction

ntentes/simfast documentation built on April 24, 2023, 10:10 p.m.
