The function works similarly to the classical lm, but with special handling of NAs. Whereas lm usually just ignores response values that are missing, pd_lm applies a probabilistic dropout model, which assumes that missing values occur because of the dropout curve. The dropout curve describes for each position the chance that a value is missed. A negative dropout_curve_scale means that the lower the intensity was, the more likely it is to miss the value.
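The shape of such a dropout curve can be sketched in a few lines. This is only an illustration: the helper `dropout_prob` is hypothetical, and the inverse-probit (cumulative normal) form is an assumption, since the text above only states that the curve is sigmoidal, that the position marks the 50% point, and that a negative scale makes low intensities more likely to be missed.

```r
# Hypothetical helper (not part of the documented API): probability that a
# value at intensity `x` is missed, ASSUMING a cumulative-normal curve.
dropout_prob <- function(x, position, scale) {
  pnorm((x - position) / scale)
}

# Three intensities below, at, and above the curve position
intensity <- c(17, 19, 21)
p_missing <- dropout_prob(intensity, position = 19, scale = -1)
print(round(p_missing, 3))

# A scale closer to zero gives a steeper curve around the position
p_steep <- dropout_prob(intensity, position = 19, scale = -0.5)
print(round(p_steep, 3))
```

With a negative scale, the probability of missing a value falls as the intensity rises, and it equals 0.5 exactly at the position.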
pd_lm(
  formula,
  data = NULL,
  subset = NULL,
  dropout_curve_position,
  dropout_curve_scale,
  location_prior_mean = NULL,
  location_prior_scale = NULL,
  variance_prior_scale = NULL,
  variance_prior_df = NULL,
  location_prior_df = 3,
  method = c("analytic_hessian", "analytic_grad", "numeric"),
  verbose = FALSE
)
formula | a formula that specifies a linear model
data | an optional data.frame whose columns can be used to specify the
subset | an optional selection vector for data to subset it
dropout_curve_position | the value where the chance to observe a value is 50%. Can either be a single value that is repeated for each row or a vector with one element for each row. Not optional.
dropout_curve_scale | the width of the dropout curve. Smaller values mean that the sigmoidal curve is steeper. Can either be a single value that is repeated for each row or a vector with one element for each row. Not optional.
location_prior_mean, location_prior_scale | the optional mean and variance of the prior around which the predictions are supposed to scatter. If no value is provided, no location regularization is applied.
variance_prior_scale, variance_prior_df | the optional scale and degrees of freedom of the variance prior. If no value is provided, no variance regularization is applied.
location_prior_df | the degrees of freedom for the t-distribution of the location prior. If it is large (> 30), the prior is approximately Normal. Default: 3
method | one of 'analytic_hessian', 'analytic_grad', or 'numeric'. If 'analytic_hessian' the
verbose | boolean that signals if the method prints informative messages. Default: FALSE
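The remark that a t-distributed location prior is approximately Normal for large degrees of freedom can be checked numerically. The sketch below compares the standard t density with the standard Normal density. Using the unshifted, unscaled densities is a simplifying assumption, and `max_gap` is a hypothetical helper introduced only for this illustration.

```r
# Grid of values on which the two densities are compared
x <- seq(-4, 4, by = 0.01)

# Largest absolute difference between the t density and the Normal density
max_gap <- function(df) max(abs(dt(x, df = df) - dnorm(x)))

gap_df3  <- max_gap(3)   # the default location_prior_df
gap_df30 <- max_gap(30)  # the "large" threshold mentioned above

print(c(df_3 = gap_df3, df_30 = gap_df30))
```

The gap shrinks markedly between 3 and 30 degrees of freedom, which suggests that the default of 3 gives a noticeably heavier-tailed prior than a Normal one.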
a list with the following entries:
a named vector with the fitted values
a p*p matrix with the variance associated with each coefficient estimate
the estimated "size" of the data set (n_hat - variance_prior_df)
the estimated degrees of freedom (n_hat - p)
the estimated unbiased variance
the number of response values that were not 'NA'
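For orientation, a classical lm fit exposes quantities that correspond loosely to the entries listed above. The sketch uses only base R accessors. Reading them as counterparts of the pd_lm entries is an assumption, not something the documentation states.

```r
set.seed(1)
y <- rnorm(5, mean = 20)
fit <- lm(y ~ 1)

coef(fit)         # named vector of coefficient estimates
vcov(fit)         # p*p variance matrix of the coefficient estimates
df.residual(fit)  # residual degrees of freedom, n - p
sigma(fit)^2      # unbiased estimate of the residual variance
nobs(fit)         # number of observations used in the fit
```

For an intercept-only model p is 1, so the variance matrix is 1 x 1 and the residual degrees of freedom are n - 1.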
# Without missing values
y <- rnorm(5, mean = 20)
lm(y ~ 1)
pd_lm(y ~ 1,
      dropout_curve_position = NA,
      dropout_curve_scale = NA)

# With some missing values
y <- c(23, 21.4, NA)
lm(y ~ 1)
pd_lm(y ~ 1,
      dropout_curve_position = 19,
      dropout_curve_scale = -1)

# With only missing values
y <- c(NA, NA, NA)
# lm(y ~ 1) # Fails
pd_lm(y ~ 1,
      dropout_curve_position = 19,
      dropout_curve_scale = -1,
      location_prior_mean = 21,
      location_prior_scale = 3,
      variance_prior_scale = 0.1,
      variance_prior_df = 2)
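The lm side of the examples above can be reproduced with base R alone. The sketch below shows that lm silently drops the missing response and fits on the remaining values, and that it raises an error when every response is missing. The variable names are illustrative, and `tryCatch` is used only to capture the error for inspection.

```r
# One missing response: lm drops it and fits on the two observed values
y_some <- c(23, 21.4, NA)
fit_some <- lm(y_some ~ 1)
print(coef(fit_some))  # intercept is the mean of the observed values
print(nobs(fit_some))  # counts only the non-missing responses

# Only missing responses: lm has nothing to fit and signals an error
y_none <- c(NA_real_, NA_real_, NA_real_)
res_none <- tryCatch(lm(y_none ~ 1), error = function(e) e)
print(inherits(res_none, "error"))
```

In the second case pd_lm can still return an estimate because the location and variance priors supply the information that the data lack.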