pd_lm | R Documentation |
The function works similarly to the classical lm but with special handling of NAs. Whereas lm usually just ignores response values that are missing, pd_lm applies a probabilistic dropout model that assumes that missing values occur because of the dropout curve. The dropout curve describes, for each intensity, the chance that a value is missed. A negative dropout_curve_scale means that the lower the intensity, the more likely it is that the value is missed.
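The dropout curve can be sketched as a function of intensity. This assumes that the standard normal CDF pnorm() serves as the sigmoid (a logistic curve via plogis() would behave qualitatively the same); the helper dropout_prob() is hypothetical and not part of pd_lm. It returns the probability that a value is missing. That probability is 50% at the position, rises toward lower intensities when the scale is negative, and changes more steeply when the absolute scale is smaller.

```r
# Hypothetical helper, not part of pd_lm.
# Assumption: probit-shaped dropout curve (standard normal CDF as the sigmoid).
# Returns the probability that a value with intensity x is missing.
dropout_prob <- function(x, position, scale) {
  pnorm((x - position) / scale)
}

intensity <- c(15, 17, 19, 21, 23)
# Negative scale: low intensities are likely missing, 0.5 exactly at position 19
round(dropout_prob(intensity, position = 19, scale = -1), 3)
# Larger absolute scale: same midpoint, but a flatter curve
round(dropout_prob(intensity, position = 19, scale = -3), 3)
```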
pd_lm(
  formula,
  data = NULL,
  subset = NULL,
  dropout_curve_position,
  dropout_curve_scale,
  location_prior_mean = NULL,
  location_prior_scale = NULL,
  variance_prior_scale = NULL,
  variance_prior_df = NULL,
  location_prior_df = 3,
  method = c("analytic_hessian", "analytic_grad", "numeric"),
  verbose = FALSE
)
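The method argument in the usage above lists its choices as a character vector. Assuming pd_lm resolves it with base R's match.arg(), the usual idiom for this pattern (this page does not confirm it), the first choice, 'analytic_hessian', is the default, a unique abbreviation is accepted, and a string that is not a prefix of any choice, such as 'analytic_gradient', is rejected. The wrapper resolve_method() is hypothetical.

```r
# Hypothetical stand-in that models only the method resolution,
# assuming pd_lm uses match.arg() on its `method` argument.
resolve_method <- function(method = c("analytic_hessian", "analytic_grad", "numeric")) {
  match.arg(method)
}

resolve_method()              # first choice is the default
resolve_method("numeric")
resolve_method("analytic_g")  # a unique prefix of "analytic_grad" is accepted
# resolve_method("analytic_gradient") signals an error: not a prefix of any choice
```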
formula |
a formula that specifies a linear model |
data |
an optional data.frame whose columns can be used to specify the formula |
subset |
an optional selection vector for data to subset it |
dropout_curve_position |
the value where the chance to observe a value is 50%. Can either be a single value that is repeated for each row or a vector with one element for each row. Not optional. |
dropout_curve_scale |
the width of the dropout curve. Smaller absolute values mean that the sigmoidal curve is steeper. Can either be a single value that is repeated for each row or a vector with one element for each row. Not optional. |
location_prior_mean, location_prior_scale |
the optional mean and variance of the prior around which the predictions are supposed to scatter. If no value is provided, no location regularization is applied. |
variance_prior_scale, variance_prior_df |
the optional scale and degrees of freedom of the variance prior. If no value is provided, no variance regularization is applied. |
location_prior_df |
the degrees of freedom for the t-distribution of the location prior. If it is large (> 30), the prior is approximately normal. Default: 3 |
method |
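one of 'analytic_hessian', 'analytic_grad', or 'numeric'. If 'analytic_hessian', both the gradient and the Hessian are calculated analytically; if 'analytic_grad', only the gradient is calculated analytically and the Hessian is obtained numerically; if 'numeric', both are calculated numerically. Default: 'analytic_hessian' |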
verbose |
boolean that signals if the method prints informative messages. Default: FALSE |
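The two prior arguments can be illustrated as log-densities. This is a sketch under assumptions this page does not confirm: the location prior is taken to be a scaled t-distribution whose location_prior_scale acts as a variance (following the argument description above), and the variance prior is taken to be a scaled inverse chi-squared distribution. The helper names are hypothetical. The last lines check the stated claim that a t-distribution with more than 30 degrees of freedom is close to a normal one.

```r
# Hypothetical helpers; not part of pd_lm's interface.

# Log-density of a scaled t prior on a coefficient.
# Assumption: `scale` is a variance, so the t-distribution is scaled by sqrt(scale).
location_prior_logdens <- function(beta, mean, scale, df) {
  sd <- sqrt(scale)
  dt((beta - mean) / sd, df = df, log = TRUE) - log(sd)
}

# Log-density of a scaled inverse chi-squared prior on the variance s2.
# Assumption: this is the family behind variance_prior_scale / variance_prior_df.
variance_prior_logdens <- function(s2, scale, df) {
  ifelse(s2 > 0,
         (df / 2) * log(df * scale / 2) - lgamma(df / 2) -
           df * scale / (2 * s2) - (1 + df / 2) * log(s2),
         -Inf)
}

# Largest gap between the standardized t prior and a standard normal density
density_gap <- function(df) {
  grid <- seq(-6, 6, by = 0.01)
  max(abs(exp(location_prior_logdens(grid, mean = 0, scale = 1, df = df)) -
            dnorm(grid)))
}
density_gap(3)    # default location_prior_df: clearly different from normal
density_gap(30)   # close to normal

# The variance prior integrates to 1 (values taken from the third example)
integrate(function(s2) exp(variance_prior_logdens(s2, scale = 0.1, df = 2)),
          lower = 0, upper = Inf)$value
```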
a list with the following entries:
a named vector with the fitted values
a p*p matrix with the variance associated with each coefficient estimate
the estimated "size" of the data set (n_hat - variance_prior_df)
the estimated degrees of freedom (n_hat - p)
the estimated unbiased variance
the number of response values that were not 'NA'
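For a complete response and no priors, several of these quantities can be related to an ordinary lm fit. The sketch below assumes that in this case n_hat equals the number of non-NA responses and that the fit reduces to ordinary least squares, which this page does not state explicitly. It uses only lm() and its accessors, and the variable names are illustrative, not the entry names of the returned list.

```r
# Complete data, no priors: OLS analogues of the listed quantities.
set.seed(1)
y <- rnorm(5, mean = 20)
fit <- lm(y ~ 1)
X <- model.matrix(fit)

n_non_na <- sum(!is.na(y))                       # responses that are not NA
resid_df <- n_non_na - ncol(X)                   # n_hat - p, taking n_hat = n_non_na (assumption)
sigma2_hat <- sum(residuals(fit)^2) / resid_df   # unbiased residual variance
coef_vcov <- sigma2_hat * solve(crossprod(X))    # p*p coefficient variance matrix

# Both agree with what lm reports
all.equal(sigma2_hat, summary(fit)$sigma^2)
all.equal(coef_vcov, vcov(fit), check.attributes = FALSE)
```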
library(proDA)  # provides pd_lm
# Without missing values
y <- rnorm(5, mean=20)
lm(y ~ 1)
pd_lm(y ~ 1,
dropout_curve_position = NA,
dropout_curve_scale = NA)
# With some missing values
y <- c(23, 21.4, NA)
lm(y ~ 1)
pd_lm(y ~ 1,
dropout_curve_position = 19,
dropout_curve_scale = -1)
# With only missing values
y <- c(NA, NA, NA)
# lm(y ~ 1) # Fails
pd_lm(y ~ 1,
dropout_curve_position = 19,
dropout_curve_scale = -1,
location_prior_mean = 21,
location_prior_scale = 3,
variance_prior_scale = 0.1,
variance_prior_df = 2)
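The examples above contrast lm with pd_lm. The following base-R simulation, which does not call pd_lm, sketches why ignoring missing values can matter. It assumes a probit-shaped dropout curve with position 19 and scale -1 (the values used in the examples above), so low intensities go missing more often, and lm, which drops the NA rows, fits only the observed values.

```r
# Base-R simulation; pd_lm is not called. Assumed probit dropout curve.
set.seed(42)
true_mean <- 20
x <- rnorm(10000, mean = true_mean, sd = 2)

# Probability of being missing: 50% at position 19, higher for lower intensities
p_missing <- pnorm((x - 19) / -1)
y <- ifelse(runif(length(x)) < p_missing, NA, x)

mean(is.na(y))          # share of missing responses
coef(lm(y ~ 1))[[1]]    # lm drops the NA rows: this is the mean of the observed values
```

The lm estimate lands clearly above true_mean because the missing values are concentrated at low intensities, which is the situation the dropout model in pd_lm is meant to account for.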