
Performs imputation of missing values in a multivariate normal dataset using a modified EM algorithm

```
mnimput(formula, dataset, by = NULL, log = FALSE, log.offset = 1,
        eps = 1e-3, maxit = 1e2, ts = TRUE, method = "spline",
        sp.control = list(df = NULL, weights = NULL),
        ar.control = list(order = NULL, period = NULL),
        ga.control = list(formula, weights = NULL),
        f.eps = 1e-6, f.maxit = 1e3, ga.bf.eps = 1e-6, ga.bf.maxit = 1e3,
        verbose = FALSE, digits = getOption("digits"))
```

`formula` |
formula indicating the missing data frame, for instance, |

`dataset` |
data with missing values to be imputed |

`by` |
factor for variance windows. Default is `NULL` |

`log` |
logical. If |

`log.offset` |
If |

`eps` |
stop criterion |

`maxit` |
maximum number of iterations |

`ts` |
logical. |

`method` |
method for univariate time series filtering. It may be |

`sp.control` |
list for Spline smooth control. See Details |

`ar.control` |
list for ARIMA fitting control. See Details |

`ga.control` |
list for GAM fitting control. See Details |

`f.eps` |
convergence criterion for the ARIMA filter. See |

`f.maxit` |
maximum number of iterations for the ARIMA filter. See |

`ga.bf.eps` |
convergence criterion for the backfitting algorithm of GAM models. See |

`ga.bf.maxit` |
maximum number of iterations for the backfitting algorithm of GAM models. See |

`verbose` |
if |

`digits` |
an integer indicating the decimal places. If not supplied, it is taken from `getOption("digits")` |

This is a modified version of the EM algorithm for imputation of missing values. It is also applicable to time series data. When the time series attribute is made explicit through the argument `ts`, missing values are estimated accounting for both the correlation between time series and the time structure of each series itself. Several filters can be used for prediction of the mean vector in the E-step.
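To make the E-step and M-step concrete, here is a minimal sketch of plain EM imputation for multivariate normal data in base R. It is not the modified algorithm implemented by `mnimput`: there is no time filter and no `by` windows. The function name `em_impute` and the stop rule (maximum absolute change in the covariance matrix relative to its largest entry) are assumptions made for illustration only.

```r
# Plain EM imputation for multivariate normal data (illustrative sketch only;
# `em_impute` is a hypothetical helper, not part of mtsdi).
em_impute <- function(x, eps = 1e-3, maxit = 100) {
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  miss <- is.na(x)
  # Starting values: column means and covariance of the mean-filled data
  mu <- colMeans(x, na.rm = TRUE)
  xi <- x
  for (j in seq_len(p)) xi[miss[, j], j] <- mu[j]
  sigma <- cov(xi)
  for (it in seq_len(maxit)) {
    bias <- matrix(0, p, p)  # summed conditional covariances of the missing parts
    for (i in which(rowSums(miss) > 0)) {
      m <- miss[i, ]; o <- !m
      if (!any(o)) {          # nothing observed in this row
        xi[i, ] <- mu
        bias <- bias + sigma
        next
      }
      # E-step: conditional mean of the missing part given the observed part
      b <- sigma[m, o, drop = FALSE] %*% solve(sigma[o, o, drop = FALSE])
      xi[i, m] <- mu[m] + b %*% (x[i, o] - mu[o])
      bias[m, m] <- bias[m, m] + sigma[m, m, drop = FALSE] -
        b %*% sigma[o, m, drop = FALSE]
    }
    # M-step: update the mean vector and the covariance matrix
    mu_new <- colMeans(xi)
    xc <- sweep(xi, 2, mu_new)
    sigma_new <- (crossprod(xc) + bias) / n
    delta <- max(abs(sigma_new - sigma)) / max(abs(sigma))
    mu <- mu_new; sigma <- sigma_new
    if (delta < eps) break
  }
  list(dataset = xi, muhat = mu, sigmahat = sigma, iterations = it,
       convergence = delta, converged = delta < eps)
}

# Simulated correlated data with 10% of the cells removed at random
set.seed(1)
s <- matrix(c(1, .6, .3, .6, 1, .5, .3, .5, 1), 3)
z <- matrix(rnorm(300), 100, 3) %*% chol(s)
colnames(z) <- c("a", "b", "c")
z_miss <- z
z_miss[sample(length(z), 30)] <- NA
res <- em_impute(z_miss)
res$iterations
```

The modified algorithm differs in how the mean used in the E-step is predicted; with `ts = TRUE` that mean comes from the univariate filter chosen through `method`.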

The method for univariate time series filtering is selected with the argument `method`. The default method is `"spline"`. In this case a smoothing spline is fitted to each of the time series at each iteration. Some parameters can be passed to `smooth.spline` through `sp.control`. `df` is a vector as long as the number of columns in `dataset` holding fixed degrees of freedom of the splines. If `NULL`, the degrees of freedom of each spline are chosen by cross-validation. If `df` has length 1, this value is recycled for all the covariates. `weights` must be a matrix of the same size as `dataset` with the weights for `smooth.spline`. If `NULL`, all the observations will have weights equal to *1*.
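The effect of fixing `df` versus leaving it unset can be seen directly with `smooth.spline` from base R. The sketch below uses simulated data and illustrative values; how `mnimput` calls `smooth.spline` internally when `df` is `NULL` (generalised or leave-one-out cross-validation) is not stated here, so the second fit simply uses the `smooth.spline` default.

```r
set.seed(42)
tt <- 1:48
y <- sin(2 * pi * tt / 12) + rnorm(length(tt), sd = 0.3)

# Fixed degrees of freedom, as when sp.control$df is supplied
fit_fixed <- smooth.spline(x = tt, y = y, df = 7)

# df left unset: smooth.spline chooses the smoothing parameter itself
# (generalised cross-validation by default; cv = TRUE uses leave-one-out)
fit_auto <- smooth.spline(x = tt, y = y)

c(fixed = fit_fixed$df, auto = fit_auto$df)

# Control lists in the shape described above (values are illustrative)
sp_one_df  <- list(df = 7, weights = NULL)                # recycled for all columns
sp_per_col <- list(df = c(7, 7, 5, 5, 9), weights = NULL) # one entry per column
```

A larger `df` lets the filter follow the series more closely, at the cost of a rougher estimate of the level.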

Another possibility for time series filtering is to fit an ARIMA model to each of the time series by setting `method` to `"arima"`. The ARIMA models must, however, be identified before using this function. The `arima` function can be partially controlled through `ar.control`. Each column of `order` must hold the *(p,d,q)* parameters of the corresponding univariate time series if `period` is `NULL`. If `period` is not `NULL`, `order` must also hold the multiplicative seasonality parameters, so each column of `order` takes the form *(p,d,q,P,D,Q)*. `period` is the multiplicative seasonality period. `f.eps` and `f.maxit` control the convergence of the ARIMA fitting algorithm. Convergence problems due to non-stationarity may arise when using this option.
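The layout of `order` described above can be sketched as a matrix with one column per series. The orders below are placeholders, not identified models, and the row and column names are added only for readability; whether `mnimput` uses or ignores them is an assumption not covered by this page.

```r
vars <- c("c31", "c32", "c33", "c34", "c35")

# Non-seasonal case: one (p, d, q) column per series, period left as NULL
order_ns <- matrix(c(1, 0, 0), nrow = 3, ncol = length(vars),
                   dimnames = list(c("p", "d", "q"), vars))
ar_ns <- list(order = order_ns, period = NULL)

# Seasonal case: each column is (p, d, q, P, D, Q), with a common period
order_s <- matrix(c(1, 0, 0, 1, 0, 0), nrow = 6, ncol = length(vars),
                  dimnames = list(c("p", "d", "q", "P", "D", "Q"), vars))
ar_s <- list(order = order_s, period = 12)

# How one column maps onto stats::arima, shown on simulated AR(1) data
set.seed(1)
y <- arima.sim(list(ar = 0.6), n = 120)
o <- unname(ar_ns$order[, "c31"])
fit <- arima(y, order = o)
coef(fit)
```

Identifying the orders beforehand, for instance from the autocorrelation functions of each series, remains the user's task.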

Finally, temporal patterns can be modelled with a full-fledged regression model. Generalised additive (or linear) models with exogenous variates can be used for proper filtering of time patterns. To do so, set `method` to `"gam"` and supply a vector of formulas in `ga.control`, one formula for each covariate. Using covariates that are part of the formula of the imputation model may yield some collinearity among the variates. See `gam` and `glm` for details.
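A vector with one formula per variate might be assembled as below. This is only a sketch: the exact form expected by `ga.control$formula` is not spelled out on this page, so the character-vector representation, the time index named `t`, and the smooth term `s(t, 4)` are all assumptions.

```r
vars <- c("c31", "c32", "c33", "c34", "c35")

# ASSUMPTION: each variate is regressed on a smooth of a time index `t`;
# replace the right-hand side with the exogenous variates of interest.
ga_formulas <- paste(vars, "~ s(t, 4)")
ga_ctrl <- list(formula = ga_formulas, weights = NULL)

# Check that every string parses as a formula before passing it on
parsed <- lapply(ga_formulas, as.formula)
ga_formulas
```

Keeping the right-hand sides free of the variates listed in the imputation formula avoids the collinearity mentioned above.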

Simulations have shown that the algorithm is stable and yields good results on imputation of normal data.

The function returns an object of class `mtsdi` containing

`call` |
function call |

`dataset` |
imputed dataset |

`muhat` |
estimated mean vector |

`sigmahat` |
estimated covariance matrix |

`missings` |
vector holding the number of missing values on each row |

`iterations` |
number of iterations until convergence or reach |

`convergence` |
convergence value. See Details |

`converged` |
a logical indicating if the algorithm converged |

`time` |
elapsed time of the process |
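The components listed above can be read from the returned object with `$`. The sketch below reuses the `miss` dataset and the spline settings from the Examples section, and is guarded so that it only runs when the mtsdi package is installed.

```r
if (requireNamespace("mtsdi", quietly = TRUE)) {
  library(mtsdi)
  data(miss)
  fit <- mnimput(~c31+c32+c33+c34+c35, miss, eps = 1e-3, ts = TRUE,
                 method = "spline", sp.control = list(df = c(7, 7, 7, 7, 7)))
  print(fit$converged)    # did the algorithm converge?
  print(fit$iterations)   # number of iterations used
  print(fit$muhat)        # estimated mean vector
  print(fit$missings)     # missing values per row of the input
  print(head(fit$dataset))  # first rows of the imputed dataset
}
```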

Washington Junger wjunger@ims.uerj.br and Antonio Ponce de Leon ponce@ims.uerj.br

Junger, W.L., Ponce de Leon, A. (2015) Imputation of Missing Data in Time Series for Air Pollutants. *Atmospheric Environment* 102, 96–104.

Johnson, R., Wichern, D. (1998) *Applied Multivariate Statistical Analysis*. Prentice Hall.

Dempster, A., Laird, N., Rubin, D. (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. *Journal of the Royal Statistical Society, Series B* 39(1), 1–38.

McLachlan, G. J., Krishnan, T. (1997) *The EM algorithm and extensions*. John Wiley and Sons.

Box, G., Jenkins, G., Reinsel, G. (1994) *Time Series Analysis: Forecasting and Control*. 3 ed. Prentice Hall.

Hastie, T. J., Tibshirani, R. J. (1990) *Generalized Additive Models*. Chapman and Hall.

`mnimput`, `predict.mtsdi`, `edaprep`

```
library(mtsdi)
data(miss)
## imputation model: the five variates to be imputed
f <- ~c31+c32+c33+c34+c35
## one-window covariance
i <- mnimput(f, miss, eps = 1e-3, ts = TRUE, method = "spline",
             sp.control = list(df = c(7, 7, 7, 7, 7)))
summary(i)
## two-window covariances: one covariance matrix per level of b
b <- c(rep("year1", 12), rep("year2", 12))
ii <- mnimput(f, miss, by = b, eps = 1e-3, ts = TRUE, method = "spline",
              sp.control = list(df = c(7, 7, 7, 7, 7)))
summary(ii)
```

```
Loading required package: gam
Loading required package: splines
Loading required package: foreach
Loaded gam 1.16
mtsdi 0.3.5
Call:
mnimput(formula = f, dataset = miss, eps = 0.001, ts = TRUE,
method = "spline", sp.control = list(df = c(7, 7, 7, 7, 7)))
Estimated mean vector:
c31 c32 c33 c34 c35
4.428260 3.404179 3.604979 4.111231 4.231927
Estimated covariance matrix:
c31 c32 c33 c34 c35
c31 13.198835 9.821802 9.880967 12.48171 15.00930
c32 9.821802 10.565439 13.290885 12.54823 13.24290
c33 9.880967 13.290885 23.171432 15.31136 18.71416
c34 12.481710 12.548226 15.311356 16.06571 16.86641
c35 15.009299 13.242895 18.714161 16.86641 23.27084
Data are on the original scale.
The algorithm converged after 9 iterations with relative diference in covariance matrix equal to 0.0007446.
The process took 00:00:00.
Time filtering models:
Filter model for variate: c31
Call:
smooth.spline(x = t, y = xn[, j], w = w[, j], df = df[j])
Smoothing Parameter spar= 0.5179712 lambda= 0.0003185393 (12 iterations)
Equivalent Degrees of Freedom (Df): 7.001002
Penalized Criterion (weighted RSS): 146.3098
GCV: 12.15173
Filter model for variate: c32
Call:
smooth.spline(x = t, y = xn[, j], w = w[, j], df = df[j])
Smoothing Parameter spar= 0.5179712 lambda= 0.0003185393 (12 iterations)
Equivalent Degrees of Freedom (Df): 7.001002
Penalized Criterion (weighted RSS): 115.0795
GCV: 9.557904
Filter model for variate: c33
Call:
smooth.spline(x = t, y = xn[, j], w = w[, j], df = df[j])
Smoothing Parameter spar= 0.5179712 lambda= 0.0003185393 (12 iterations)
Equivalent Degrees of Freedom (Df): 7.001002
Penalized Criterion (weighted RSS): 214.1316
GCV: 17.78465
Filter model for variate: c34
Call:
smooth.spline(x = t, y = xn[, j], w = w[, j], df = df[j])
Smoothing Parameter spar= 0.5179712 lambda= 0.0003185393 (12 iterations)
Equivalent Degrees of Freedom (Df): 7.001002
Penalized Criterion (weighted RSS): 166.8908
GCV: 13.86108
Filter model for variate: c35
Smoothing Parameter spar= 0.5179712 lambda= 0.0003185393 (12 iterations)
Equivalent Degrees of Freedom (Df): 7.001002
Penalized Criterion (weighted RSS): 278.771
GCV: 23.15326
Call:
mnimput(formula = f, dataset = miss, by = b, eps = 0.001, ts = TRUE,
method = "spline", sp.control = list(df = c(7, 7, 7, 7, 7)))
Estimated mean vector:
Covariance window id: year1
c31 c32 c33 c34 c35
4.439354 3.058333 2.943321 4.332500 4.282500
Covariance window id: year2
c31 c32 c33 c34 c35
4.413333 3.772190 3.107810 3.900743 4.054827
Estimated covariance matrix:
BY factor level: year1
c31 c32 c33 c34 c35
c31 17.06912 12.75444 14.12697 17.35886 22.27431
c32 12.75444 11.35351 11.27189 14.98227 15.55464
c33 14.12697 11.27189 12.79017 15.12799 18.09330
c34 17.35886 14.98227 15.12799 20.27335 21.51136
c35 22.27431 15.55464 18.09330 21.51136 31.15114
BY factor level: year2
c31 c32 c33 c34 c35
c31 9.1459222 6.956757 0.8428413 7.089435 6.179044
c32 6.9567566 9.612193 14.3186661 10.589023 11.390653
c33 0.8428413 14.318666 46.1144209 16.362282 22.125969
c34 7.0894347 10.589023 16.3622818 12.706236 13.246699
c35 6.1790439 11.390653 22.1259691 13.246699 15.542236
Data are on the original scale.
The algorithm converged after 53 iterations with relative diference in covariance matrix equal to 0.0008958.
The process took 00:00:00.
Time filtering models:
Filter model for variate: c31
Call:
smooth.spline(x = t, y = xn[, j], w = w[, j], df = df[j])
Smoothing Parameter spar= 0.5179712 lambda= 0.0003185393 (12 iterations)
Equivalent Degrees of Freedom (Df): 7.001002
Penalized Criterion (weighted RSS): 146.2754
GCV: 12.14887
Filter model for variate: c32
Call:
smooth.spline(x = t, y = xn[, j], w = w[, j], df = df[j])
Smoothing Parameter spar= 0.5179712 lambda= 0.0003185393 (12 iterations)
Equivalent Degrees of Freedom (Df): 7.001002
Penalized Criterion (weighted RSS): 115.6142
GCV: 9.602314
Filter model for variate: c33
Call:
smooth.spline(x = t, y = xn[, j], w = w[, j], df = df[j])
Smoothing Parameter spar= 0.5179712 lambda= 0.0003185393 (12 iterations)
Equivalent Degrees of Freedom (Df): 7.001002
Penalized Criterion (weighted RSS): 238.2249
GCV: 19.78572
Filter model for variate: c34
Call:
smooth.spline(x = t, y = xn[, j], w = w[, j], df = df[j])
Smoothing Parameter spar= 0.5179712 lambda= 0.0003185393 (12 iterations)
Equivalent Degrees of Freedom (Df): 7.001002
Penalized Criterion (weighted RSS): 161.1101
GCV: 13.38096
Filter model for variate: c35
Smoothing Parameter spar= 0.5179712 lambda= 0.0003185393 (12 iterations)
Equivalent Degrees of Freedom (Df): 7.001002
Penalized Criterion (weighted RSS): 247.7141
GCV: 20.57384
```
