Multiple Imputation Diagnostics In Amelia: A Program for Missing Data

Time-series Plots {#sec_tscsplots}

As discussed above, information about time trends and fixed effects can help produce better imputations. One way to check the plausibility of our imputation model is to see how it predicts missing values in a time series. If the imputations for the Malaysian tariff rate were drastically higher in 1990 than the observed years of 1989 or 1991, we might worry that there is a problem in our imputation model. Checking these time series is easy to do with tscsPlot(). Simply choose the variable (with the var argument) and the cross-section (with the cs argument) to plot the observed time-series along with distributions of the imputed values for each missing time period. For instance, we can get the plot of the tariff variable for Malaysia with the following commands:

a.out.time <- amelia(freetrade, ts = "year", cs = "country", polytime = 1,
intercs = TRUE, p2s = 0)
tscsPlot(a.out.time, cs = "Malaysia", main = "Malaysia (with time settings)",
var = "tariff", ylim = c(-10, 60))


Here, the black point are observed tariff rates for Malaysia from 1980 to 2000. The red points are the mean imputation for each of the missing values, along with their 95% confidence bands. We draw these bands by imputing each of missing values 100 times to get the imputation distribution for that observation.

In figure \ref{fig:tsplot1}, we can see that the imputed 1990 tariff rate is quite in line with the values around it. Notice also that values toward the beginning and end of the time series have slightly higher imputation variance. This occurs because the fit of the polynomials of time in the imputation model have higher variance at the beginning and end of the time series. This is intuitive because these points have fewer neighbors from which to draw predictive power.

A word of caution is in order. As with comparing the histograms of imputed and observed values, there could be reasons that the missing values are systematically different than the observed time series. For instance, if there had been a major financial crisis in Malaysia in 1990 which caused the government to close off trade, then we would expect that the missing tariff rates should be quite different than the observed time series. If we have this information in our imputation model, we might expect to see out-of-line imputations in these time-series plots. If, on the other hand, we did not have this information, we might see "good" time-series plots that fail to point out this violation of the MAR assumption. Our imputation model would produce poor estimates of the missing values since it would be unaware that both the missingness and the true unobserved tariff rate depend on another variable. Hence, tscsPlot() is useful for finding obvious problems in imputation model and comparing the efficiency of various imputation models, but it cannot speak to the untestable assumption of MAR.

Missingness maps {#sec_missmaps}

One useful tool for exploring the missingness in a dataset is a missingness map. This is a map that visualizes the dataset a grid and colors the grid by missingness status. The column of the grid are the variables and the rows are the observations, as in any spreadsheet program. This tool allows for a quick summary of the patterns of missingness in the data.

If we simply call the missmap() function on our output from amelia(),

missmap(a.out)


The missmap() function arrange the columns so that the variables are in decreasing order of missingness from left to right. If the cs argument was set in the amelia function, the labels for the rows will indicate where each of the cross-sections begin.

In this missingness map, it is clear that the tariff rate is the variable most missing in the data and it tends to be missing in blocks of a few observations. Gross international reserves (intresmi) and financial openness (fivop), on the other hand, are missing mostly at the end of each cross-section. This suggests missingness by merging, when variables with different temporal coverages are merged to make one dataset. Sometimes this kind of missingness is an artifact of the date at which the data was merged and researchers can resolve it by finding updated versions of the relevant variables.

The missingness map is an important tool for understanding the patterns of missingness in the data and can often indicate potential ways to improve the imputation model or data collection process.

Try the Amelia package in your browser

Any scripts or data that you put into this service are public.

Amelia documentation built on May 27, 2021, 1:06 a.m.