add_probs: Add Regression Probabilities to Data Frames

View source: R/add_probs.R

add_probs    R Documentation

Description

This is a generic function to append response level probabilities to a data frame. A response level probability (conditioned on the model and covariates), such as Pr(Response < 10 | Covariates), is generated for each observation in df. These probabilities are then appended to df and returned to the user as a data frame.

Usage

add_probs(df, fit, q, name = NULL, yhatName = "pred", comparison, ...)

Arguments

df

A data frame of new data.

fit

An object of class lm, glm, lmerMod, glmerMod, or survreg. Predictions are made with this object.

q

A real number. A quantile of the conditional response distribution.

name

NULL or a character vector of length one. If NULL, add_probs names the probabilities automatically; otherwise, the probabilities are named name in the returned data frame.

yhatName

A character vector of length one. Name of the column of predicted values in the returned data frame.

comparison

A string. If comparison = "<", then Pr(Y < q | x) is calculated for each observation in df. Default is "<". Must be "<" or ">" for objects of class lm or lmerMod. If fit is a glm, then comparison may also be "<=", ">=", or "=".

...

Additional arguments passed on to the method for the class of fit.
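
For a discrete response such as a Poisson count, the comparison operators give genuinely different probabilities, which is why the extra options matter for glm fits. The sketch below uses only base R and plugs the fitted mean into ppois and dpois. It assumes q is an integer, ignores parameter uncertainty, and illustrates the definitions only; it is not necessarily how add_probs.glm computes its result. The helper name pois_prob is hypothetical.

```r
# Hypothetical helper (not part of ciTools): plug-in Poisson probabilities
# for each comparison operator, given fitted means `lambda` and an
# integer threshold `q`.
pois_prob <- function(lambda, q, comparison = "<") {
  switch(comparison,
    "<"  = ppois(q - 1, lambda),                      # Pr(Y <= q - 1)
    "<=" = ppois(q, lambda),                          # Pr(Y <= q)
    ">"  = ppois(q, lambda, lower.tail = FALSE),      # Pr(Y > q)
    ">=" = ppois(q - 1, lambda, lower.tail = FALSE),  # Pr(Y > q - 1)
    "="  = dpois(q, lambda),                          # Pr(Y = q)
    stop("comparison must be one of <, <=, >, >=, =")
  )
}

# Fitted Poisson means for each row of cars.
fit2 <- glm(dist ~ speed, family = "poisson", data = cars)
lambda <- predict(fit2, newdata = cars, type = "response")

p_lt <- pois_prob(lambda, q = 20, comparison = "<")
p_eq <- pois_prob(lambda, q = 20, comparison = "=")
p_gt <- pois_prob(lambda, q = 20, comparison = ">")
```

For each observation the three pieces "<", "=", and ">" partition the sample space, so they sum to one. For a continuous response the "=" piece has probability zero, which is consistent with only "<" and ">" being offered for lm and lmerMod fits.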

Details

For more specific information about the arguments that are useful in each method, consult:

  • add_probs.lm for linear regression response probabilities

  • add_probs.glm for generalized linear regression response probabilities

  • add_probs.lmerMod for linear mixed models response probabilities

  • add_probs.glmerMod for generalized linear mixed model response probabilities

  • add_probs.survreg for accelerated failure time model response probabilities

Note: Except in add_probs.survreg, the probabilities calculated by add_probs are on the distribution of Y|x, not E[Y|x]. That is, they use the same distribution from which a prediction interval is determined, not the distribution that determines a confidence interval. add_probs.survreg is an exception to this pattern so that users of accelerated failure time models can obtain estimates of the survivor function.
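
The distinction in the note can be made concrete for a linear model. The sketch below, in base R only, computes Pr(dist < 20 | speed) twice: once on the distribution of a new response Y|x (the prediction interval scale) and once on the distribution of the estimated mean E[Y|x] (the confidence interval scale). It assumes the usual normal-error linear model with a t reference distribution and is meant to illustrate the idea, not to reproduce the internals of add_probs.lm. The variable names are illustrative.

```r
fit <- lm(dist ~ speed, data = cars)
out <- predict(fit, newdata = cars, se.fit = TRUE)

q <- 20

# Standard error for a new observation: residual variance plus
# the variance of the fitted mean.
se_pred <- sqrt(out$residual.scale^2 + out$se.fit^2)

# Pr(Y < q | x): based on the predictive distribution of Y|x.
p_response <- pt((q - out$fit) / se_pred, df = out$df)

# Same calculation on the distribution of the estimated mean E[Y|x],
# shown only for contrast.
p_mean <- pt((q - out$fit) / out$se.fit, df = out$df)

head(data.frame(cars, pred = out$fit, p_response, p_mean))
```

Because se_pred is never smaller than se.fit, the response-scale probabilities sit closer to 0.5 than the mean-scale ones: a single new observation is less certain than the average response at the same covariates.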

Value

A data frame: df with the predicted values and probabilities attached as additional columns.

See Also

add_ci for confidence intervals, add_quantile for response level quantiles, and add_pi for prediction intervals.

Examples

# Define a model
fit <- lm(dist ~ speed, data = cars)

# Calculate the probability that a new
# dist is less than 20 (given the model).
add_probs(cars, fit, q = 20)

# Calculate the probability that a new
# dist is greater than 20 (given the model).
add_probs(cars, fit, q = 20, comparison = ">")

# Try a different model fit: Poisson regression.
fit2 <- glm(dist ~ speed, family = "poisson", data = cars)
add_probs(cars, fit2, q = 20)

# Try a different model fit: a linear mixed model.
fit3 <- lme4::lmer(Reaction ~ Days + (1|Subject), data = lme4::sleepstudy)
add_probs(lme4::sleepstudy, fit3, q = 300, type = "parametric")

# As above, but do not condition on the random effects.
add_probs(lme4::sleepstudy, fit3, q = 300, type = "parametric", includeRanef = FALSE)


jthaman/ciTools documentation built on Nov. 11, 2023, 2:04 p.m.