rewlr: Fitting the Rare Event Weighted Logistic Regression
In zaenalium/rewlr: Rare Event Weighted Logistics Regression



rewlr fits a Rare Event Weighted Logistic Regression model, which handles an imbalanced (unbalanced) response variable in binary classification.

rewlr(formula, data, weights0, weights1, tol = 1e-04, iter = 1000, lambda = NULL)

`formula`	an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted
`data`	a data frame or matrix (tibbles are also supported)
`tol`	positive convergence tolerance ε; the iterations are considered converged when |dev - dev_old| / (|dev| + 0.1) < ε
`iter`	an integer giving the maximum number of iterations for parameter estimation
`lambda`	a regularization (penalty) term used to obtain a better estimate. If no value is supplied, lambda is calculated as 1/sd(y)
`weights0`	(1 - proportion of events in the sample) divided by (1 - proportion of events in the population)
`weights1`	proportion of events in the sample divided by proportion of events in the population
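
The argument descriptions above contain two small formulas: the two weights and the convergence test. The following base R sketch spells them out. The response vector `y`, the population event rate `tau`, and the helper `has_converged` are hypothetical names chosen for illustration; they are not part of the package, and the package's internal implementation may differ.

```r
# Hypothetical sample: 0/1 response with 9 events in 100 observations.
y <- c(rep(1, 9), rep(0, 91))

# Assumed population event rate; this must come from outside knowledge,
# it cannot be estimated from the sample itself.
tau <- 0.02

# Proportion of events in the sample.
ybar <- mean(y)

# Weights exactly as worded in the argument descriptions.
weights0 <- (1 - ybar) / (1 - tau)
weights1 <- ybar / tau
print(c(weights0 = weights0, weights1 = weights1))

# Relative change in deviance, as in the description of `tol`.
# `has_converged` is an illustrative helper, not a function exported by rewlr.
has_converged <- function(dev, dev_old, tol = 1e-04) {
  abs(dev - dev_old) / (abs(dev) + 0.1) < tol
}

has_converged(100.00001, 100.00002)  # tiny relative change
has_converged(100, 120)              # large relative change
```

The resulting `weights0` and `weights1` values are what would be passed to the arguments of the same name in the usage line.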

rewlr returns an object similar to the output of glm; use summary() to obtain the coefficient summary and other statistics. The components are described in the following list:

coefficients - a named vector of coefficients.
fitted.values - the predicted probabilities for the training data.
deviance - up to a constant, minus twice the maximized log-likelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero.
AIC - a version of Akaike's An Information Criterion: minus twice the maximized log-likelihood plus twice the number of parameters.
null.deviance - The deviance for the null model, comparable with deviance. The null model will include the offset, and an intercept if there is one in the model.
df.residual - the residual degrees of freedom.
df.null - the residual degrees of freedom for the null model.
auc - the area under the ROC curve.
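
To make the definitions of deviance, AIC and AUC in the list above concrete, the sketch below computes them by hand for an ordinary logistic regression. It uses stats::glm on simulated data as a stand-in; this is an assumption made for illustration only. rewlr applies weights and a penalty term, so the values it reports need not be computed in exactly this way.

```r
set.seed(1)

# Simulated binary data (illustrative only).
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 1.5 * x))

# Ordinary logistic regression via stats::glm, used here as a stand-in for rewlr.
fit <- glm(y ~ x, family = binomial)
p <- fitted(fit)

# For 0/1 responses the saturated log-likelihood is zero,
# so the deviance is minus twice the maximized log-likelihood.
loglik <- sum(y * log(p) + (1 - y) * log(1 - p))
dev_manual <- -2 * loglik

# AIC: minus twice the log-likelihood plus twice the number of parameters.
aic_manual <- -2 * loglik + 2 * length(coef(fit))

# AUC via the rank (Mann-Whitney) formulation.
n1 <- sum(y == 1)
n0 <- sum(y == 0)
auc_manual <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Compare the hand-computed values with those reported by glm.
all.equal(dev_manual, deviance(fit))
all.equal(aic_manual, AIC(fit))
auc_manual
```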

Maalouf M, Siddiqi M. (2014) Weighted logistic regression for large-scale imbalanced and rare events data. Knowledge-Based Systems, 59, 142-148.

summary.rewlr summarises the fitted model; predict.rewlr applies the fitted model to test or new data.

library(rewlr)
data(National_exam_id)

# Suppose the current sample contains 9 percent rare events,
# while the population contains 2 percent.
(weight0 <- (1 - 0.09) / (1 - 0.02))
(weight1 <- 0.09 / 0.02)

# Convergence settings, passed explicitly to rewlr below.
iter <- 1000; tol <- 0.00001

fit <- rewlr(y ~ ., data = National_exam_id, weights0 = weight0, weights1 = weight1,
             tol = tol, iter = iter)
summary(fit)

# Predict on the training data (replace newdata with test data as needed).
p <- predict(fit, newdata = National_exam_id)