mcnnm_wc_cv: This function computes the best model fitted to the data....
In susanathey/MCPanel: Matrix Completion algorithms for Causal Panel Data Models


This function computes the best model fitted to the data. The best values of lambda_L and lambda_H are chosen via cross-validation over the observed entries. The function creates a number of folds; on each fold it splits the observed entries into a training set and a validation set, fits the model on the training set, and computes the root mean squared error (RMSE) on the validation set. Finally, it chooses the model that gives the smallest average RMSE across folds.
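The split-and-score step described above can be sketched in base R. This is only an illustration, not the package's internal code: the 0.8 training fraction mirrors the cv_ratio default, the mask is assumed to use 1 for observed entries, and the "model" is a placeholder (the training mean) standing in for the actual matrix-completion fit.

```r
set.seed(1)
N <- 6; T_ <- 5                      # T_ rather than T, since T aliases TRUE in R
M <- matrix(rnorm(N * T_), N, T_)
mask <- matrix(rbinom(N * T_, 1, 0.8), N, T_)   # assumed: 1 = observed

cv_ratio <- 0.8
obs_idx <- which(mask == 1)                      # linear indices of observed entries
n_train <- floor(cv_ratio * length(obs_idx))
train_idx <- sample(obs_idx, n_train)            # random training subset
valid_idx <- setdiff(obs_idx, train_idx)         # remaining observed entries

train_mask <- matrix(0L, N, T_); train_mask[train_idx] <- 1L
valid_mask <- matrix(0L, N, T_); valid_mask[valid_idx] <- 1L

# Placeholder predictor (assumption): every entry is predicted by the training mean.
M_hat <- matrix(mean(M[train_idx]), N, T_)

# RMSE restricted to the validation entries
rmse <- sqrt(mean((M[valid_idx] - M_hat[valid_idx])^2))
```

In the actual procedure this score would be computed for every (lambda_L, lambda_H) pair and averaged over folds before picking the minimum.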

mcnnm_wc_cv(M, X, Z, mask, to_normalize = 1L, to_estimate_u = 1L,
  to_estimate_v = 1L, to_add_ID = 1L, num_lam_L = 30L, num_lam_H = 30L,
  niter = 100L, rel_tol = 1e-05, cv_ratio = 0.8, num_folds = 1L,
  is_quiet = 1L)
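The shape requirements on M, X, Z and mask can be checked before calling the function. The helper below is hypothetical (check_mc_inputs is not part of MCPanel) and only encodes the constraints stated in the argument descriptions: X has one row per unit, Z has one row per time period, empty covariates are passed as 0-by-0 matrices, and mask is binary with the same shape as M.

```r
# Hypothetical helper, not an MCPanel function.
check_mc_inputs <- function(M, X, Z, mask) {
  stopifnot(is.matrix(M), is.matrix(X), is.matrix(Z), is.matrix(mask))
  N <- nrow(M); T_ <- ncol(M)
  # Empty covariate matrices (e.g. matrix(0L, 0, 0)) are skipped.
  if (length(X) > 0) stopifnot(nrow(X) == N)
  if (length(Z) > 0) stopifnot(nrow(Z) == T_)
  stopifnot(all(dim(mask) == dim(M)), all(mask %in% c(0, 1)))
  invisible(TRUE)
}

set.seed(42)
M <- replicate(5, rnorm(4))                   # 4 units by 5 time periods
X <- replicate(3, rnorm(4))                   # 3 unit covariates, one row per unit
Z <- matrix(0L, 0, 0)                         # no time covariates
mask <- matrix(rbinom(4 * 5, 1, 0.8), 4, 5)   # binary, same shape as M
ok <- check_mc_inputs(M, X, Z, mask)
```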

`M`	Matrix of observed entries. The input should be N (number of units) by T (number of time periods).
`X`	Matrix containing unit-related covariates. The number of rows of X should match the number of units (number of rows of M). If unit-related covariates do not exist, use X = matrix(0L,0,0).
`Z`	Matrix containing time-related covariates. The number of rows of Z should match the number of time periods (number of columns of M). If time-related covariates do not exist, use Z = matrix(0L,0,0).
`mask`	Binary matrix with the same shape as M that marks which entries of M are observed.
`to_normalize`	Optional boolean parameter indicating whether to normalize covariates or not (columns of X and Z). The default value is 1. If this value is set to 0, the result would be sensitive to scales in covariates.
`to_estimate_u`	Optional boolean input indicating whether to estimate fixed unit effects (row means of M) or not. Default is 1.
`to_estimate_v`	Optional boolean input indicating whether to estimate fixed time effects (column means of M) or not. Default is 1.
`to_add_ID`	Optional boolean parameter indicating whether identity matrices are concatenated with X and Z in the model X * H * Z^T. The default value is 1 (identity matrices are concatenated), in which case the model becomes X * H_X + X * H_XZ * Z^T + H_Z * Z^T (the remaining block of H is forced to zero).
`num_lam_L`	Optional parameter on the number of lambda_L values to consider for learning. The default number is 30, and the lambda_L values range from the smallest value that makes L zero down to 1e-3 times that value.
`num_lam_H`	Optional parameter on the number of lambda_H values to consider for learning. The default number is 30, and the lambda_H values range from the smallest value that makes H zero down to 1e-3 times that value.
`niter`	Optional parameter on the number of iterations taken in the algorithm for each fixed value of lambda_L. The default value is 100, which is sufficiently large because the algorithm uses a warm-start strategy.
`rel_tol`	Optional parameter on the stopping rule. Once the relative improvement in the objective value drops below rel_tol, execution is halted. Default value is 1e-5.
`cv_ratio`	Optional parameter indicating what fraction of the observed entries is used for training. The remaining 1-cv_ratio is dedicated to the validation set. For each fold these two sets are chosen randomly. Default value is 0.8 (80/20 for training/validation).
`num_folds`	Optional parameter indicating the number of cross-validation folds. Default value is 1. For larger problems we recommend keeping this number small for a faster cross-validation.
`is_quiet`	Optional boolean input which indicates whether to print the status of learning and convergence results for Cyclic Coordinate Descent algorithm or not. The default value is 1 (no output is printed).
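The decomposition stated for to_add_ID can be verified numerically. The sketch below assumes the identity blocks are appended to the right of X and Z and that H is partitioned accordingly; this ordering is an assumption for illustration and may differ from the package's internal layout. All matrices are random stand-ins.

```r
set.seed(7)
N <- 4; T_ <- 3; P <- 2; Q <- 2      # units, periods, unit covariates, time covariates
X <- matrix(rnorm(N * P), N, P)
Z <- matrix(rnorm(T_ * Q), T_, Q)

# Covariates with identity matrices concatenated (assumed ordering)
X_tilde <- cbind(X, diag(N))         # N x (P + N)
Z_tilde <- cbind(Z, diag(T_))        # T x (Q + T)

# Blocks of H; the lower-right N x T block is forced to zero
H_XZ <- matrix(rnorm(P * Q), P, Q)
H_X  <- matrix(rnorm(P * T_), P, T_)
H_Z  <- matrix(rnorm(N * Q), N, Q)
H <- rbind(cbind(H_XZ, H_X),
           cbind(H_Z, matrix(0, N, T_)))

lhs <- X_tilde %*% H %*% t(Z_tilde)
rhs <- X %*% H_X + X %*% H_XZ %*% t(Z) + H_Z %*% t(Z)
max_abs_diff <- max(abs(lhs - rhs))  # zero up to floating-point error
```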

The best model, fitted on all observed entries (not only the training set) using the lambda_L and lambda_H chosen via cross-validation. The output also includes the matrix of average root mean squared error for the different values of lambda_L and lambda_H.

Example:

mcnnm_wc_cv(M = replicate(5, rnorm(5)), X = replicate(3, rnorm(5)),
  Z = matrix(0L, 0, 0), mask = matrix(rbinom(5*5, 1, 0.8), 5, 5))
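Selecting the best pair from a matrix of average RMSE values can be sketched as follows. Everything here is illustrative: the maximum lambda values are made-up numbers, the log-spaced grid is an assumption (the documentation only gives the endpoints), the RMSE matrix is random stand-in data, and the row/column orientation (rows for lambda_L, columns for lambda_H) is assumed rather than taken from the package.

```r
lambda_L_max <- 2.0    # hypothetical smallest value that makes L zero
lambda_H_max <- 5.0    # hypothetical smallest value that makes H zero
num_lam_L <- 30L; num_lam_H <- 30L

# Grids from the maximum down to 1e-3 times the maximum (log spacing assumed)
lambda_Ls <- lambda_L_max * 10^seq(0, -3, length.out = num_lam_L)
lambda_Hs <- lambda_H_max * 10^seq(0, -3, length.out = num_lam_H)

# Stand-in for the average-RMSE matrix returned by cross-validation
set.seed(3)
avg_rmse <- matrix(runif(num_lam_L * num_lam_H, 0.5, 1.5), num_lam_L, num_lam_H)

# Position of the smallest average RMSE (first one if there are ties)
best <- which(avg_rmse == min(avg_rmse), arr.ind = TRUE)[1, ]
best_lambda_L <- unname(lambda_Ls[best["row"]])
best_lambda_H <- unname(lambda_Hs[best["col"]])
```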

mcnnm_cv

susanathey/MCPanel documentation built on May 29, 2019, 9:51 a.m.