yourcast: Time-series cross-sectional Forecasting
In IQSS/YourCast: Forecasting Age-Sex-Country-Cause Mortality Rates


Runs a set of regression models to forecast time-series cross-sectional data by either considering independent regressions in each cross-sectional unit or by using a variety of techniques to smooth across units.
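
As a minimal sketch of the "independent regressions in each cross-sectional unit" case described above, the following base-R code fits one separate OLS regression per unit with `lm`. The data, the helper `make_unit`, and the unit names are invented for illustration; this does not call YourCast and is not its internal code.

```r
# Invented data: two cross-sections, each stored as its own data frame.
set.seed(1)
make_unit <- function(slope) {
  x <- 1:20
  data.frame(x = x, y = 0.5 + slope * x + rnorm(20, sd = 0.1))
}
units <- list(unitA = make_unit(0.2), unitB = make_unit(-0.1))

# One separate OLS fit per cross-sectional unit, with no smoothing across units.
fits <- lapply(units, function(d) lm(y ~ x, data = d))

# Estimated slope for each unit (close to the true 0.2 and -0.1).
slopes <- sapply(fits, function(f) unname(coef(f)["x"]))
slopes
```

Smoothing across units, by contrast, would tie these per-unit estimates together through a prior rather than estimating each in isolation.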

yourcast(formula = NULL, dataobj = NULL,
         sample.frame = c(1950, 2000, 2001, 2030),
         standardize = TRUE, elim.collinear = FALSE,
         tol = 0.9999, solve.tol = 1e-10, svdtol = 10^(-10),
         userfile = NULL, savetmp = TRUE, model.frame = FALSE,
         debug = FALSE, rerun = "yourcast.savetmp",
         ### specific to models
         model = "OLS", zero.mean = FALSE,
         #### smooth over ages
         Ha.sigma = 0.3,
         Ha.sigma.sd = 0.1, Ha.deriv = c(0, 0, 1),
         Ha.age.weight = 0, Ha.time.weight = 0,
         #### smooth over time
         Ht.sigma = 0.3,
         Ht.sigma.sd = 0.1, Ht.deriv = c(0, 0, 1),
         Ht.age.weight = 0, Ht.time.weight = 0,
         #### smooth over age-time
         Hat.sigma = 0.2,
         Hat.sigma.sd = 0.1, Hat.a.deriv = c(0, 1), Hat.t.deriv = c(0, 1),
         Hat.age.weight = 0, Hat.time.weight = 0,
         #### smooth over country-time
         Hct.sigma = 0.3, Hct.sigma.sd = 0.1,
         Hct.t.deriv = 1, Hct.time.weight = 0,
         LI.sigma.mean = 0.2, LI.sigma.sd = 0.1, nsample = 500,
         low.pow = TRUE, verbose = TRUE)
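
To make the four-element `sample.frame` default in the signature above concrete, here is a small base-R sketch that splits a vector of years into the observed and forecast periods. The helper `split_sample_frame` is hypothetical and not part of YourCast; it only mirrors the documented meaning of the four elements.

```r
# Hypothetical helper (not part of YourCast): split years according to a
# four-element sample.frame = c(obs.start, obs.end, forecast.start, forecast.end).
split_sample_frame <- function(years, sample.frame = c(1950, 2000, 2001, 2030)) {
  stopifnot(length(sample.frame) == 4)
  list(
    insample  = years[years >= sample.frame[1] & years <= sample.frame[2]],
    outsample = years[years >= sample.frame[3] & years <= sample.frame[4]]
  )
}

# A cross-section whose data start in 1980: the years 1950-1979 named in
# sample.frame are simply not present, so they drop out of the split.
periods <- split_sample_frame(1980:2030)
range(periods$insample)   # 1980 2000
range(periods$outsample)  # 2001 2030
```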

`formula`	A standard R formula of the form `y ~ x_1 + x_2`, except that an explanatory variable is included for a particular cross-section only if it is both listed in the formula and available in that cross-section's data set (see `dataobj`). Explanatory variables in the formula but not available for a cross-section (or in a cross-sectional dataset but not in the formula) are excluded. (For mortality forecasting, the specification looks like `log(deaths/population) ~ x_1 + x_2`, with deaths and population stored as separate variables in each dataframe.) (May be set to `NULL` if `savetmp` was set to `TRUE` on the last run, in which case the value of formula will come from the saved file.)
`dataobj`	An object of class ‘yourcast’ or equivalent. See `help(yourprep)` for more details. The `dataobj` may be supplied in one of four ways. Most commonly, the argument will specify (1) an object (in working memory) or (2) a string with the name of a file in the working directory. However, if (3) `dataobj` is a string referring to a directory on disk, then each element of the `dataobj` list should be stored in a file in that directory, with element ‘data’ consisting of a subdirectory containing separate ASCII data files. (If this option is chosen, a complete data object, called ‘dataobj.Rdata’, will be stored in the directory named, and it will be loaded automatically if `yourcast` is run again with this chosen option.) (4) The last option is for `dataobj` to be set to `NULL`, after which the function will look for a ‘yourcast.savetmp’ file in the working directory from a previous run of the function where the argument `savetmp` was set to `TRUE`. The function `yourprep` is available to help construct the `dataobj` in the proper format from individual cross section files in the working directory or the workspace. This function also performs a number of diagnostics to ensure that the data is entered properly and can be read by `yourcast`. See `help(yourprep)` for more information.
`sample.frame`	Vector. A four element vector containing, in order, the start and end time periods to be used for the observed data and the start and end time periods to be forecast. Years identified here that are not available for a cross-section are ignored. Default: `c(1950,2000,2001,2030)`.
`standardize`	Boolean. Should the covariates in each cross-sectional unit be standardized (to zero mean and standard deviation of 1)? Standardization is performed for both the in- and out-of-sample periods. Default: `TRUE`.
`elim.collinear`	Boolean. Whether collinearity among covariates should be tested and those that are collinear should be eliminated. Default: `FALSE`.
`tol`	Double scalar. Tolerance to find collinearities among covariates. Default: `0.9999`.
`solve.tol`	A real number smaller than one that is passed as the tolerance argument of the R function `solve` when inverting matrices (see description for `tol`). Default: `1e-10`.
`svdtol`	A scalar; the tolerance used in inverting a matrix by SVD. Default: 10^{-10}.
`userfile`	A string with the name of a file that contains your values for some or all of `yourcast`'s arguments. This file contains R code that changes default values of arguments. E.g., the file might contain the two assignments `index.code <- 30` and `data <- "WHOmortalityData"`. If an option is specified in `userfile`, it takes precedence over command line options, so it is normally best to specify each option in either the `userfile` or the command line but not both. Default: `NULL`.
`savetmp`	If `TRUE`, `yourcast` saves a file in the default directory (called ‘yourcast.savetmp’) with preliminary calculations. If the value of `formula` or `dataobj` is missing when `yourcast` is called, `yourcast` will get their values from this file, if it exists. This saves a minute or so of computing time for large data sets and is useful for multiple runs on the same data with different formulas specified or different prior values. If `FALSE`, no file is saved. (The structure of ‘yourcast.savetmp’ is for the convenience of `yourcast` and is not intended to be read by the user or saved for more than one run.) Default: `TRUE`.
`model.frame`	If `TRUE`, include entire input dataobj in the output object. Default: `FALSE`.
`debug`	Boolean. If `TRUE`, places the environment that contains the parameters and arguments of the simulation in the user workspace. Default: `FALSE`.
`rerun`	String. The name of the file that is saved in the default directory with preliminary calculations; see `savetmp`. Default: `yourcast.savetmp`.
`model`	A string indicating the forecasting method, including: Bayes maximum a posteriori (`map`), Bayes with Gibbs sampling (`bayes`), Ordinary Least Squares (`ols`), Poisson (`poisson`), and Lee-Carter (`LC`). Default: `ols` (written as `"OLS"` in the usage signature). (We usually recommend `map`.) `yourcast` also includes a procedure to help users set the sigma parameters below automatically for the case of model=`map`, and smoothing over age, time, or age and time, but for only one country. You may do this by first running a preprocessing instance of `yourcast` with this parameter set to `ebayes`, using either the data to be analyzed or a larger data set that is likely to have similar or related parameter values. When `ebayes` is chosen, the `yourcast` output object will contain only the parameter values to feed into the next run of `yourcast`.
`zero.mean`	A boolean or named vector with a value of \bar{μ} for each age group. If `TRUE`, the prior has zero mean. If `FALSE`, the prior has nonzero mean centered around the observed mean age profile (i.e., the average of Y over time and levels of the geographic index for each age group). Default: `FALSE`.
`Ha.sigma`	This can be set in one of three ways: (1) a scalar which sets σ_a, the prior standard deviation of E(Y), indicating how much to smooth E(Y) over age groups (which may vary over geographic areas and time periods, and with the standard deviations averaged over age groups). A larger standard deviation represents more prior uncertainty, which allows the data to play a greater role. (2) `NA` to not smooth in this way. (3) To have `yourcast` search for a good value based on a target value of the derivative of E(Y) with respect to age, set to a vector of elements containing the start and end of a range in sigma in which to look (such as 0.05 and 1.5), the number of values to look at within this range (such as 5), and the target value of the derivative of E(Y) with respect to age (such as 0.05). The vector may also include a fifth element, which is the target value of the total standard deviation of E(Y) over all dimensions of the prior (such as 0.1). (You may choose to run `yourcast` with model=`ebayes` on a related data set to find an approximate target value of the derivative and standard deviation automatically.) Default: `0.30`.
`Ha.sigma.sd`	A scalar; the standard deviation of parameter `Ha.sigma` (for Gibbs sampling only). Default: `0.1`.
`Ha.deriv`	A numeric vector, each element of which is n, the degree of a (discrete) derivative of the smoothness functional with respect to the age group. Element k of this vector refers to the (k-1)th derivative, where 0 excludes the derivative, 1 includes it, and values in between include the derivative but weight it down proportionally. The first element of the vector corresponds to the weight on the derivative with respect to age of order 0 (the identity operator), the second to the weight on the derivative of order 1 (the 1st derivative), etc. For example, `c(0, 1, 1)` corresponds to a mixed functional that penalizes the first and second derivatives equally. The higher the order of derivative, the more local the smoothness over age groups; the lowest specified derivative controls the form of prior indifference. Default: `c(0, 0, 1)`, which usually works well.
`Ha.age.weight`	A scalar or a numeric vector with weights that determine how much smoothing occurs for different age groups. If set to 0 or NA, age groups are weighted equally; if set to a nonzero scalar, the weight for age group a is set proportional to a^Ha.age.weight; if a vector of length A, the ath element is the weight of age group a. Default: `0`.
`Ha.time.weight`	A scalar or a numeric vector with weights that determine how much smoothing occurs for different time periods when smoothing over age groups. If `0` or `NA`, time periods are weighted equally; if set to a nonzero scalar value, the weight for time period t in smoothing age groups is proportional to t^Ha.time.weight; if the argument is a vector of length T, the tth element is the weight of time period t. Default: `0`.
`Ht.sigma`	This can be set in one of three ways: (1) a scalar which sets σ_t, the prior standard deviation of E(Y), indicating how much to smooth E(Y) over time periods (which may vary over geographic areas and age groups, and with the standard deviations averaged over time periods). A larger standard deviation represents more prior uncertainty, which allows the data to play a greater role. (2) NA to not smooth in this way. (3) To have `yourcast` search for a good value based on a target value of the derivative of E(Y) with respect to time, set to a vector of elements containing the start and end of a range in sigma in which to look (such as 0.05 and 1.5), the number of values to look at within this range (such as 5), and the target value of the derivative of E(Y) with respect to time (such as 0.05). The vector may also include a fifth element, which is the target value of the total standard deviation of E(Y) over all dimensions of the prior (such as 0.1). (You may choose to run `yourcast` with model=`ebayes` on a related data set to find an approximate target value of the derivative and standard deviation automatically.) Default: `0.30`.
`Ht.sigma.sd`	A scalar; the standard deviation of parameter `Ht.sigma` (for Gibbs sampling only). Default: `0.1`.
`Ht.deriv`	A numeric vector, each element of which is n, the degree of a (discrete) derivative of the smoothness functional with respect to time. Element k of this vector refers to the (k-1)th derivative, where 0 excludes the derivative, 1 includes it, and values in between include the derivative but weight it down proportionally. The first element of the vector corresponds to the weight on the derivative with respect to time of order 0 (the identity operator), the second to the weight on the derivative of order 1 (the 1st derivative), etc. For example, `c(0, 1, 1)` corresponds to a mixed functional that penalizes the first and second derivatives equally. The higher the order of derivative, the more local the smoothness over time; the lowest specified derivative controls the form of prior indifference. Default: `c(0, 0, 1)`, which usually works well.
`Ht.age.weight`	A scalar or a numeric vector with weights that determine how much smoothing occurs for different age groups when smoothing over time. If set to `0` or `NA`, age groups are weighted equally in smoothing over time; if set to a nonzero scalar, the weight for age group a is set proportional to a^Ht.age.weight; if a vector of length A, the ath element is the weight of age group a. Default: 0.
`Ht.time.weight`	A scalar or a numeric vector with weights that determine how much smoothing occurs for different time periods when smoothing over time. If `0` or `NA`, time periods are weighted equally; if set to a nonzero scalar value, the weight for time period t in smoothing time periods is proportional to t^Ht.time.weight; if the argument is a vector of length T, the tth element is the weight of time period t. Default: 0.
`Hat.sigma`	This can be set in one of three ways: (1) a scalar which sets σ_{at}, the prior standard deviation of E(Y), indicating how much to smooth the time trend in E(Y) over age groups. A larger standard deviation represents more prior uncertainty, which allows the data to play a greater role. (2) NA to not smooth in this way. (3) To have `yourcast` search for a good value based on a target value of the derivative of E(Y) with respect to age and time, set to a vector of elements containing the start and end of a range in sigma in which to look (such as 0.05 and 1.5), the number of values to look at within this range (such as 5), and the target value of the derivative of E(Y) with respect to age and time (such as 0.05). The vector may also include a fifth element, which is the target value of the total standard deviation of E(Y) over all dimensions of the prior (such as 0.1). (You may choose to run `yourcast` with model=`ebayes` on a related data set to find an approximate target value of the derivative and standard deviation automatically.) Default: `0.2`.
`Hat.sigma.sd`	A scalar; the standard deviation of parameter `Hat.sigma` (for Gibbs sampling only). Default: `0.1`.
`Hat.a.deriv`	A numeric vector, each element of which is n, the degree of a (discrete) derivative of the smoothness functional of time trends with respect to age groups. Element k of this vector refers to the (k-1)th derivative of the time trend with respect to age, where 0 excludes the derivative, 1 includes it, and values in between include the derivative but weight it down proportionally. The first element of the vector corresponds to the weight on the derivative of the time trend with respect to age of order 0 (the identity operator), the second to the weight on the derivative of order 1 (the 1st derivative), etc. For example, `c(0, 1, 1)` corresponds to a mixed functional that penalizes the first and second derivatives equally. The higher the order of derivative, the more local the smoothness over time; the lowest specified derivative controls the form of prior indifference. Default: `c(0, 1)`, as given in the usage signature.
`Hat.t.deriv`	A numeric vector, each element of which is n, the degree of a (discrete) derivative of the smoothness functional of the age derivative with respect to time. Element k of this vector refers to the (k-1)th derivative of the age derivative with respect to time, where 0 excludes the derivative, 1 includes it, and values in between include the derivative but weight it down proportionally. The first element of the vector corresponds to the weight on the age derivative with respect to time of order 0 (the identity operator), the second to the weight on the derivative of order 1 (the 1st derivative), etc. For example, `c(0, 1, 1)` corresponds to a mixed functional that penalizes the first and second derivatives equally. The higher the order of derivative, the more local the smoothness over time; the lowest specified derivative controls the form of prior indifference. Default: `c(0, 1)`, as given in the usage signature.
`Hat.age.weight`	A scalar or a numeric vector with weights that determine how much smoothing occurs for different age groups when smoothing over age and time. If set to `0` or `NA`, age groups are weighted equally in smoothing over age and time; if set to a nonzero scalar, the weight for age group a is set proportional to a^Hat.age.weight; if a vector of length A, the ath element is the weight of age group a. Default: `0`.
`Hat.time.weight`	A scalar or a numeric vector with weights that determine how much smoothing occurs for different time periods when smoothing over age and time. If `0` or `NA`, time periods are weighted equally; if set to a nonzero scalar value, the weight for time period t in smoothing over age and time is proportional to t^Hat.time.weight; if the argument is a vector of length T, the tth element is the weight of time period t. Default: `0`.
`Hct.sigma`	A scalar which sets σ_ct, the prior standard deviation of E(Y), which indicates how much to smooth E(Y) over geographic areas, or `NA` to not smooth in this way. The parameter σ_ct is the expected prior standard deviation of E(Y) for a geographic area (varying over time periods and age groups, and with the standard deviations averaged over geographic areas). (A larger standard deviation represents more prior uncertainty, which allows the data to play a greater role.) Default: `0.3`.
`Hct.sigma.sd`	A scalar; the standard deviation of parameter `Hct.sigma` (for Gibbs sampling only). Default: `0.1`.
`Hct.t.deriv`	A numeric vector; controls whether the level or the time trend of E(Y) is smoothed over geographic areas (both cannot presently be done simultaneously). To smooth the level of E(Y) over geographic areas, set to 1, the identity. To smooth the time trend, set this (as in `Hat.t.deriv`) to the weight of the partial derivative taken with respect to time in the standard smoothness functional for the prior. First and higher-order partial derivatives are supported. Default: `1`.
`Hct.time.weight`	A scalar or a numeric vector with weights that determine how much smoothing occurs for different time periods when smoothing over geographic areas. If `0` or `NA`, time periods are weighted equally; if set to a nonzero scalar value, the weight for time period t in smoothing over areas is proportional to t^Hct.time.weight; if the argument is a vector of length T, the tth element is the weight of time period t. Default: `0`.
`LI.sigma.mean`	A scalar; used in the likelihood and in the calculation of the priors in conjunction with `Ha.sigma.sd`, `Hat.sigma.sd`, `Ht.sigma.sd`, and `Hct.sigma.sd`. Default: `0.2`.
`LI.sigma.sd`	A scalar; the standard deviation of `LI.sigma.mean` used in the calculation of the priors. Default: `0.1`.
`nsample`	A scalar; represents the number of iterations in the Gibbs algorithm `bayes`. Default: `500`.
`low.pow`	Boolean. Whether to include the lower powers of explanatory variables that appear in `formula`. For example, given `y ~ x^4` and `low.pow` = `TRUE`, the terms x, x^2, x^3, and x^4 will all be included. Default: `TRUE`.
`verbose`	Boolean. Controls whether verbose output is printed. Default: `TRUE`, as given in the usage signature.
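
The `*.deriv` and `*.age.weight` arguments described above can be hard to picture, so the following base-R sketch shows one way to read them. All three helpers (`discrete_deriv`, `deriv_penalty`, `age_weights`) are hypothetical names introduced here; they use simple finite differences and are an assumption about the general idea, not a reproduction of YourCast's internal prior computations.

```r
# Illustrative only: hypothetical helpers, not YourCast internals.

# Discrete derivative of a profile; order 0 is the identity operator.
discrete_deriv <- function(mu, order) {
  if (order == 0) mu else diff(mu, differences = order)
}

# Mixed penalty: element k of `deriv` weights the (k-1)th discrete derivative.
deriv_penalty <- function(mu, deriv = c(0, 0, 1)) {
  orders <- seq_along(deriv) - 1
  sum(mapply(function(w, k) w * sum(discrete_deriv(mu, k)^2), deriv, orders))
}

# Weights proportional to a^w; 0 or NA gives equal weights; a vector of
# length A is used as given.
age_weights <- function(A, w = 0) {
  if (length(w) == A && A > 1) return(w)
  if (is.na(w) || w == 0) return(rep(1, A))
  seq_len(A)^w
}

linear_profile <- 1:6                   # second differences are all zero
kinked_profile <- c(1, 2, 3, 5, 7, 9)   # slope changes once

deriv_penalty(linear_profile, c(0, 0, 1))  # 0: a straight line is not penalized
deriv_penalty(kinked_profile, c(0, 0, 1))  # 1: the single change in slope
deriv_penalty(linear_profile, c(0, 1, 1))  # 5: first differences now count too
age_weights(4, 2)                          # 1 4 9 16
```

Under this reading, `c(0, 0, 1)` leaves any straight-line profile unpenalized, which is one way to understand the statement that the lowest specified derivative controls the form of prior indifference.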

Returns a list of class ‘yourcast’ containing the following components:

`call`	The full call, including all command line options when yourcast was called.
`userfile`	The full userfile if it was specified.
`yhat`	A list with the same cross-sectional elements as the input data, but with two columns: ‘y’ for the observed dependent variable and ‘yhat’ for the predicted values. These include both in-sample and out-of-sample values, as distinguished by the values of `sample.frame`.
`coeff`	A list with the same cross-sectional elements as the input data, elements of which are the estimated coefficients if calculated by the chosen model.
`sigma`	A list with the same cross-sectional elements as the input data, elements of which are the estimated standard error of the estimate of the regression (the standard deviation of the dependent variable given the explanatory variables).
`aux`	List. A list of summary information about the yourcast analysis used by `plot.yourcast`.
`params`	Vector. Smoothing parameters used in model.
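
As a rough sketch of how the `yhat` component described above might be used, the code below builds a stand-in object by hand and extracts the out-of-sample forecasts for one cross-section. Everything here is an assumption made for illustration: the cross-section label "2450045", the numbers, the use of time periods as row names, and the `NA` values of ‘y’ in the forecast period are invented and are not taken from real YourCast output.

```r
# Hand-built stand-in for one element of the `yhat` list (invented values).
years <- 1998:2003
yhat_element <- cbind(
  y    = c(-6.10, -6.15, -6.18, NA, NA, NA),        # assumed: observed in-sample only
  yhat = c(-6.11, -6.14, -6.19, -6.22, -6.26, -6.30)
)
rownames(yhat_element) <- years                     # assumed: rows labeled by period
mock_output <- list(yhat = list("2450045" = yhat_element))

# Use the forecast period from sample.frame to pick the out-of-sample rows.
sample.frame <- c(1998, 2000, 2001, 2003)
unit <- mock_output$yhat[["2450045"]]
rows <- as.numeric(rownames(unit))
forecast <- unit[rows >= sample.frame[3] & rows <= sample.frame[4], "yhat"]
forecast
```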