sema_fit_df: Fit multilevel models in a data stream III

Description Usage Arguments Details Value Examples

Description

Fit multilevel models online on a data set

Usage

1
2
3
4
5
sema_fit_df(formula, data_frame = data.frame(), intercept = FALSE,
  print_every = NA, store_every = NA, start_resid_var = 1,
  start_random_var = 1, start_fixed_coef = 1:5, start_cor = 0.15,
  update = NULL, train = NULL, threshold = 1e-04, max_iter = 800,
  prior_n = 0, prior_j = 0)

Arguments

formula

A symbolic representation of the model, formula is used similar to lme4's lmer: response ~ fixed effects + (random effects | grouping variable)

data_frame

A data frame consisting of the variables mentioned in the formula.

intercept

This indicates whether there is a column in data frame with 1's.

print_every

Do you want the results printed to the consule? The default is NA, meaning no printing, if a number is privided the function prints a summary of the model every 'print_every' data points.

store_every

Do you want to store results during the data stream? The default is NA, i.e., no results are stored, if a number is privided the function stores the fixed effects, random effects variance and residual variance in seperate data frames every 'store_every' data points.

start_resid_var

This is optional if the user wants to provide a start value of the residual variance, default start value is 1.

start_random_var

This is optional if the user wants to provide a start values of the variance of the random effects covariates, default start value is 1. NOTE, if start values are provided make sure that the length of the vector of start values matches the number of random effects.

start_fixed_coef

This is optional if the user wants to provide start values of the fixed effects, default is set to NULL such that sema_fit_one can create the vector of start values matching the number of fixed effects. NOTE, if start values are provided make sure that the length of the vector of start values matches the number of fixed effects.

start_cor

This is a starting value for the correlations between the random effects.

update

The default is NULL, when an integer is provided sema_update is called to do a full update to recompute all contributions to the complete data suffient statistics.

train

The default value is NULL, meaning that there SEMA is fit to the data without a training set. When a different value is provided, this indicate the first number of rows which are used for the training set. See emAlgorithm for a full description of training SEMA.

threshold

In case of a training set, this thresholds determines when the EM algorithm should terminate. When the parameter estimates change less than this threshold, EM algorithm terminates.

max_iter

In case of a training set, you can fix the number of iterations of the EM algorithm.

prior_n

If starting values are provided, prior_n determines the weight of the starting value of the residual variance, default is 0.

prior_j

If starting values are provided, prior_j determins the weight of the starting value of the variance of the random effects and the fixed effects, default is 0.

Details

This function fits the multilevel models online, or row-by-row on a data set. Similar to sema_fit_set and sema_fit_one the algorithm updates the model parameters a data point at a time. However, instead of these two functions, this function fits the multilevel model on a data set and it uses formula.

Value

A list with updated global parameters (model), a list with lists of all units parameters and contributions (unit), if store_every is a number 3 data frames fixed_coef_df, random_var_df, resid_var_df.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
## First we create a dataset, consisting of 2500 observations from 20
## units. The fixed effects have the coefficients 1, 2, 3, 4, and 5. The
## variance of the random effects equals 1, 4, and 9. Lastly the
## residual variance equals 4:
test_data <- build_dataset(n = 1500,
                           j = 200,
                           fixed_coef = 1:5,
                           random_coef_sd = 1:3,
                           resid_sd = 2)

## fit a multilevel model:
m1 <- sema_fit_df(formula = y ~ 1 + V3 + V4 + V5 + V6 + (1 + V4 + V5  | id),
                   data_frame = test_data, intercept = TRUE)

L-Ippel/SEMA documentation built on May 30, 2019, 8:23 a.m.