logis_fe | R Documentation |
Fit a fixed effect logistic model via the serial blockwise inversion Newton (SerBIN) or block ascent Newton (BAN) algorithm.
logis_fe(
formula = NULL,
data = NULL,
Y.char = NULL,
Z.char = NULL,
ID.char = NULL,
Y = NULL,
Z = NULL,
ID = NULL,
method = "SerBIN",
max.iter = 1000,
tol = 1e-05,
bound = 10,
cutoff = 10,
backtrack = TRUE,
stop = "or",
threads = 1,
message = TRUE
)
formula |
a two-sided formula object describing the model to be fitted,
with the response variable on the left of a ~ operator and covariates on the right,
separated by + operators. The fixed effect of the provider identifier is specified using |
data |
a data frame containing the variables named in the |
Y.char |
a character string specifying the column name of the response variable in the |
Z.char |
a character vector specifying the column names of the covariates in the |
ID.char |
a character string specifying the column name of the provider identifier in the |
Y |
a numeric vector representing the response variable. |
Z |
a matrix or data frame representing the covariates, which can include both numeric and categorical variables. |
ID |
a numeric vector representing the provider identifier. |
method |
a string specifying the algorithm to be used. The default value is "SerBIN".
|
max.iter |
maximum iteration number if the stopping criterion specified by |
tol |
tolerance used for stopping the algorithm. See details in |
bound |
a positive number to avoid inflation of provider effects. The default value is 10. |
cutoff |
an integer specifying the minimum number of observations required for providers.
Providers with fewer observations than the cutoff will be labeled as |
backtrack |
a Boolean indicating whether backtracking line search is implemented. The default is TRUE. |
stop |
a character string specifying the stopping rule to determine convergence.
The default value is |
threads |
a positive integer specifying the number of threads to be used. The default value is 1. |
message |
a Boolean indicating whether to print the progress of the fitting process. The default is TRUE. |
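As a rough illustration of the cutoff argument described above, the sketch below counts observations per provider and lists the providers that fall below the threshold. The identifier vector is made up for the example, and the sketch only shows the counting step; how logis_fe treats such providers internally is not shown here.

```r
# Hypothetical provider identifiers (not the package's example data).
ID <- c(rep(1, 12), rep(2, 3), rep(3, 10), rep(4, 9))
cutoff <- 10  # same value as the documented default

# Number of observations per provider.
n_per_provider <- table(ID)

# Providers with fewer observations than the cutoff.
small_providers <- names(n_per_provider)[n_per_provider < cutoff]
small_providers
```

A provider with exactly cutoff observations is not flagged in this sketch, since the documentation says "fewer observations than the cutoff".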
The function accepts three different input formats:
a formula and dataset, where the formula is of the form response ~ covariates + id(provider), with provider representing the provider identifier;
a dataset along with the column names of the response, covariates, and provider identifier;
or the binary outcome vector Y, the covariate matrix or data frame Z, and the provider identifier vector.
The default algorithm is the serial blockwise inversion Newton (SerBIN) algorithm proposed by Wu et al. (2022), but users can also choose the block ascent Newton (BAN) algorithm proposed by He et al. (2013) to fit the model. Both algorithms build on the Newton-Raphson method, but SerBIN updates the provider effects and the covariate coefficients simultaneously, which requires inverting the whole information matrix at each iteration. In contrast, BAN uses a two-layer update: it first holds the covariate coefficients fixed to update the provider effects, then holds the provider effects fixed to update the covariate coefficients.
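To make the idea of a joint update with blockwise inversion more concrete, here is a minimal sketch of a generic Newton-Raphson iteration for a fixed effect logistic model. It uses the fact that the provider-effect block of the information matrix is diagonal, so the full system can be solved through a small Schur complement. This is an illustration under simplifying assumptions (simulated data, no line search, no bound on the provider effects) and is not the package's SerBIN implementation; all variable names are hypothetical.

```r
set.seed(1)
m <- 5; n_i <- 40                       # 5 hypothetical providers, 40 records each
ID <- rep(seq_len(m), each = n_i)
Z <- cbind(z1 = rnorm(m * n_i), z2 = rnorm(m * n_i))
gamma_true <- c(-0.5, 0, 0.3, -0.2, 0.4)
beta_true <- c(0.5, -0.5)
Y <- rbinom(m * n_i, 1, plogis(gamma_true[ID] + drop(Z %*% beta_true)))

gamma <- rep(0, m)                      # provider effects
beta <- rep(0, ncol(Z))                 # covariate coefficients
for (iter in 1:25) {
  p <- plogis(gamma[ID] + drop(Z %*% beta))
  w <- p * (1 - p)
  U_g <- drop(rowsum(Y - p, ID))        # score for the provider effects
  U_b <- drop(crossprod(Z, Y - p))      # score for the covariate coefficients
  I_gg <- drop(rowsum(w, ID))           # diagonal block, stored as a vector
  I_gb <- rowsum(w * Z, ID)             # m x p off-diagonal block
  I_bb <- crossprod(Z, w * Z)           # p x p block
  # Schur complement of the diagonal block: only a p x p system is solved.
  S <- I_bb - crossprod(I_gb, I_gb / I_gg)
  d_beta <- drop(solve(S, U_b - drop(crossprod(I_gb, U_g / I_gg))))
  d_gamma <- (U_g - drop(I_gb %*% d_beta)) / I_gg
  gamma <- gamma + d_gamma              # both blocks are updated in the same step
  beta <- beta + d_beta
  if (max(abs(c(d_gamma, d_beta))) < 1e-8) break
}

# Cross-check against a standard fit with one intercept per provider.
ref <- glm(Y ~ 0 + factor(ID) + Z, family = binomial)
cbind(sketch = c(gamma, beta), glm = coef(ref))
```

The design point is that the number of providers only enters through elementwise divisions by I_gg, while the matrix solve stays at the size of the covariate vector.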
We suggest using the default "SerBIN"
option, as it typically converges much faster for most datasets.
However, in rare cases where the SerBIN algorithm fails with an error because the second-order derivative matrix is not invertible,
users can consider using the "BAN"
option as an alternative.
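The fallback described above could be wrapped in a small helper along the following lines. The helper name fit_with_fallback is hypothetical and not part of the package; it only assumes that the fitting function accepts a method argument taking "SerBIN" or "BAN", as shown in the usage of logis_fe.

```r
# Hypothetical helper: try method = "SerBIN" first and, if that call raises
# an error, retry the same call with method = "BAN".
fit_with_fallback <- function(fit_fun, ...) {
  args <- list(...)
  tryCatch(
    do.call(fit_fun, c(args, list(method = "SerBIN"))),
    error = function(e) {
      message("SerBIN failed (", conditionMessage(e), "); retrying with BAN.")
      do.call(fit_fun, c(args, list(method = "BAN")))
    }
  )
}

# Intended usage with the objects from the examples on this page:
# fit <- fit_with_fallback(logis_fe, Y = outcome, Z = covar, ID = ID)
```

Note that this retries on any error, not only on a non-invertible matrix, so the printed message is worth reading before trusting the second fit.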
For details on both algorithms, please consult the original articles.
If issues arise during model fitting, consider using the data_check function to perform a data quality check,
which can help identify missing values, low variation in covariates, high pairwise correlation, and multicollinearity.
For datasets with missing values, this function automatically removes observations (rows) with any missing values before fitting the model.
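To see in advance how many rows such a removal would affect, a check along these lines can be run before fitting. The data frame is made up for illustration, and the sketch only mimics row-wise deletion with base R; it does not show the function's internal code.

```r
# Hypothetical data frame with missing values (not the package's example data).
dat <- data.frame(
  outcome = c(1, 0, NA, 1, 0),
  ID      = c(1, 1, 2, 2, 2),
  z1      = c(0.2, NA, 1.1, -0.4, 0.7)
)

# TRUE for rows without any missing value.
keep <- complete.cases(dat)

sum(!keep)                 # number of rows with at least one missing value
dat_complete <- dat[keep, ]
table(dat_complete$ID)     # remaining observations per provider
```

Checking the per-provider counts afterwards is useful because dropping rows can push a provider below the cutoff.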
A list of objects with S3 class "logis_fe":
coefficient |
a list containing the estimated coefficients:
|
variance |
a list containing the variance estimates:
|
linear_pred |
the linear predictor of each individual. |
prediction |
the predicted probability of each individual. |
observation |
the original response of each individual. |
Loglkd |
the log-likelihood. |
AIC |
Akaike information criterion. |
BIC |
Bayesian information criterion. |
AUC |
area under the ROC curve. |
char_list |
a list of the character vectors representing the column names for the response variable, covariates, and provider identifier. For categorical variables, the names reflect the dummy variables created for each category. |
data_include |
the data used to fit the model, sorted by the provider identifier.
For categorical covariates, this includes the dummy variables created for
all categories except the reference level. Additionally, it contains three extra columns:
|
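For orientation, the fit summaries listed above (log-likelihood, AIC, BIC, AUC) relate to the observed responses and predicted probabilities as in the following sketch. It uses the textbook definitions with made-up numbers and an assumed parameter count k; how the package counts parameters is not stated here, so the values need not match the returned components exactly.

```r
# Hypothetical observed responses and predicted probabilities.
obs  <- c(1, 0, 1, 1, 0, 0)
pred <- c(0.8, 0.3, 0.6, 0.4, 0.5, 0.2)
k <- 3                     # assumed number of estimated parameters
n <- length(obs)

# Bernoulli log-likelihood.
loglik <- sum(obs * log(pred) + (1 - obs) * log(1 - pred))

# Standard definitions of the information criteria.
aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + log(n) * k

# AUC through the rank (Mann-Whitney) formulation.
r  <- rank(pred)
n1 <- sum(obs == 1)
n0 <- sum(obs == 0)
auc <- (sum(r[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

c(loglik = loglik, AIC = aic, BIC = bic, AUC = auc)
```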
He, K, Kalbfleisch, J, Li, Y, et al. (2013) Evaluating hospital readmission rates in dialysis facilities; adjusting for hospital effects.
Lifetime Data Analysis, 19: 490-512.
Wu, W, Yang, Y, Kang, J, He, K. (2022) Improving large-scale estimation and inference for profiling health care providers.
Statistics in Medicine, 41(15): 2840-2853.
data_check
data(ExampleDataBinary)
outcome <- ExampleDataBinary$Y
covar <- ExampleDataBinary$Z
ID <- ExampleDataBinary$ID
data <- data.frame(outcome, ID, covar)
covar.char <- colnames(covar)
outcome.char <- colnames(data)[1]
ID.char <- colnames(data)[2]
formula <- as.formula(paste("outcome ~", paste(covar.char, collapse = " + "), "+ id(ID)"))
# Fit the fixed effect logistic model using each of the three input formats
fit_fe1 <- logis_fe(Y = outcome, Z = covar, ID = ID)
fit_fe2 <- logis_fe(data = data, Y.char = outcome.char, Z.char = covar.char, ID.char = ID.char)
fit_fe3 <- logis_fe(formula, data)