yourcast: Time-series cross-sectional Forecasting

Description

Runs a set of regression models to forecast time-series cross-sectional data by either considering independent regressions in each cross-sectional unit or by using a variety of techniques to smooth across units.

Usage

yourcast(formula = NULL, dataobj = NULL, sample.frame = c(1950, 2000, 2001, 2030),
         standardize = TRUE, elim.collinear = FALSE,
         tol = 0.9999, solve.tol = 1.e-10, svdtol = 10^(-10),
         userfile = NULL, savetmp = TRUE, model.frame = FALSE,
         debug = FALSE, rerun = "yourcast.savetmp",
         ### specific to models
         model = "OLS", zero.mean = FALSE,
         #### smooth over ages
         Ha.sigma = 0.3,
         Ha.sigma.sd = 0.1, Ha.deriv = c(0, 0, 1),
         Ha.age.weight = 0, Ha.time.weight = 0,
         #### smooth over time
         Ht.sigma = 0.3,
         Ht.sigma.sd = 0.1, Ht.deriv = c(0, 0, 1),
         Ht.age.weight = 0, Ht.time.weight = 0,
         #### smooth over age-time
         Hat.sigma = 0.2,
         Hat.sigma.sd = 0.1, Hat.a.deriv = c(0, 1), Hat.t.deriv = c(0, 1),
         Hat.age.weight = 0, Hat.time.weight = 0,
         #### smooth over cntry-time
         Hct.sigma = 0.3, Hct.sigma.sd = 0.1,
         Hct.t.deriv = 1, Hct.time.weight = 0,
         LI.sigma.mean = 0.2, LI.sigma.sd = 0.1, nsample = 500,
         low.pow = TRUE, verbose = TRUE)

Arguments

formula

A standard R formula of the form y ~ x1 + x2, except that an explanatory variable is included for a particular cross-section only if it is both listed in the formula and available in that cross-section's data set (see dataobj). Explanatory variables that appear in the formula but are not available for a cross-section (or that appear in a cross-sectional data set but not in the formula) are excluded. For mortality forecasting, the specification looks like log(deaths/population) ~ x1 + x2, with deaths and population stored as separate variables in each data frame. May be set to NULL if savetmp was set to TRUE on the last run, in which case the value of formula is read from the saved file.
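The per-cross-section selection rule described above can be illustrated in base R. This is only a minimal sketch of the rule, not the code yourcast runs: the helper restrict_formula and the covariate names gdp and tobacco are hypothetical.

```r
# Hypothetical helper (not part of yourcast): keep only the right-hand-side
# terms whose variables are all present in one cross-section's data frame.
restrict_formula <- function(formula, data) {
  terms_rhs <- attr(terms(formula), "term.labels")
  keep <- vapply(terms_rhs, function(tm) {
    all(all.vars(parse(text = tm)[[1]]) %in% names(data))
  }, logical(1))
  kept <- terms_rhs[keep]
  if (length(kept) == 0) kept <- "1"   # no covariate available: intercept only
  # the left-hand side is passed through unchanged as a call
  reformulate(kept, response = formula[[2]])
}

# One cross-section that has gdp but no tobacco variable
f <- log(deaths/population) ~ gdp + tobacco
d <- data.frame(deaths = 1:3, population = 4:6, gdp = c(1, 2, 3))
g <- restrict_formula(f, d)
```

Here g keeps gdp and drops tobacco, while the dependent variable is left as written.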

dataobj

An object of class ‘yourcast’ or equivalent. See help(yourprep) for more details.

The dataobj may be supplied in one of four ways. Most commonly, the argument will specify (1) an object (in working memory) or (2) a string with the name of a file in the working directory. However, if (3) dataobj is a string referring to a directory on disk, then each element of the data object list should be stored in a file in that directory, with element ‘data’ consisting of a subdirectory containing separate ASCII data files. (If this option is chosen, a complete data object, called ‘dataobj.Rdata’, will be stored in the named directory, and it will be loaded automatically if yourcast is run again with this option.) (4) The last option is to set dataobj to NULL, in which case the function will look for a ‘yourcast.savetmp’ file in the working directory from a previous run of the function in which the argument savetmp was set to TRUE.

The function yourprep is available to help construct the dataobj in the proper format from individual cross-section files in the working directory or the workspace. This function also performs a number of diagnostics to ensure that the data are entered properly and can be read by yourcast. See help(yourprep) for more information.

sample.frame

Vector. A four element vector containing, in order, the start and end time periods to be used for the observed data and the start and end time periods to be forecast. Years identified here that are not available for a cross-section are ignored. Default: c(1950,2000,2001,2030).
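As a rough illustration of how the four elements partition time, the following base R sketch splits the default sample.frame into observed and forecast years for one cross-section. The vector available is a hypothetical set of years present in that cross-section's data.

```r
sample.frame <- c(1950, 2000, 2001, 2030)

# Hypothetical: this cross-section only has data for 1960-2000
available <- 1960:2000

# Observed period: requested years that are actually available
insample_years <- intersect(seq(sample.frame[1], sample.frame[2]), available)

# Forecast period: every year between the third and fourth elements
forecast_years <- seq(sample.frame[3], sample.frame[4])
```

The years 1950 to 1959 are requested but not available, so they drop out of insample_years.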

standardize

Boolean. Should the covariates in each cross-sectional unit be standardized (to zero mean and standard deviation of 1)? Standardization is performed for both the in- and out-of-sample periods. Default: TRUE.
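Standardizing to zero mean and unit standard deviation corresponds to what base R's scale() does column by column. The sketch below applies it to a small made-up covariate table (gdp and tobacco are hypothetical names); it does not reproduce how yourcast combines the in- and out-of-sample periods internally.

```r
# Hypothetical covariates for one cross-sectional unit
x <- data.frame(gdp = c(1, 2, 3, 4), tobacco = c(10, 20, 40, 50))

# Center each column at 0 and rescale it to standard deviation 1
z <- as.data.frame(scale(x))
```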

elim.collinear

Boolean. Whether collinearity among covariates should be tested and those that are collinear should be eliminated. Default: FALSE.

tol

Double scalar. Tolerance to find collinearities among covariates. Default: 0.9999.
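The documentation does not say how tol is applied. One common reading, assumed here purely for illustration, is that two covariates count as collinear when their absolute pairwise correlation exceeds tol. The helper drop_collinear is hypothetical and is not the yourcast implementation.

```r
# Hypothetical helper: drop the later column of any pair whose absolute
# correlation exceeds `tol` (an assumed interpretation of the argument).
drop_collinear <- function(X, tol = 0.9999) {
  R <- abs(cor(X))
  R[lower.tri(R, diag = TRUE)] <- 0     # keep only pairs (i, j) with i < j
  drop <- apply(R, 2, function(col) any(col > tol))
  X[, !drop, drop = FALSE]
}

X <- cbind(a = c(1, 2, 3, 4),
           b = c(2, 4, 6, 8),   # exactly 2 * a, so perfectly collinear with a
           c = c(1, 0, 1, 0))
X_reduced <- drop_collinear(X)
```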

solve.tol

A real number smaller than one that is passed as the tolerance to the R function solve when inverting matrices (see description for tol). Default: 10^{-10}.

svdtol

A scalar; the tolerance used in inverting a matrix by SVD. Default: 10^{-10}.
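To show what a tolerance does in an SVD-based inverse, the sketch below builds a pseudo-inverse and discards singular values below a cutoff. The helper svd_inverse is hypothetical, and treating the cutoff as relative to the largest singular value is an assumption of this sketch; yourcast may apply svdtol differently.

```r
# Hypothetical helper: invert M through its SVD, ignoring singular values
# at or below svdtol times the largest one (relative cutoff is an assumption).
svd_inverse <- function(M, svdtol = 1e-10) {
  s <- svd(M)
  keep <- s$d > svdtol * max(s$d)
  # V_k %*% diag(1 / d_k) %*% t(U_k), written without forming diag()
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

M_full <- matrix(c(2, 0, 0, 4), 2)      # invertible
M_sing <- matrix(c(1, 1, 1, 1), 2)      # rank 1, solve() would fail
P_full <- svd_inverse(M_full)
P_sing <- svd_inverse(M_sing)
```

For the invertible matrix the result matches the ordinary inverse; for the singular one it is a generalized inverse that still satisfies M P M = M.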

userfile

A string with the name of a file that contains your values for some or all of yourcast's arguments. This file contains R code that changes default values of arguments. E.g., the file might contain:

    index.code <- 30
    data <- "WHOmortalityData"
  

If an option is specified in userfile, it takes precedence over command line options, so it is normally best to specify each option in either the userfile or the command line but not both. Default: NULL.
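The precedence rule can be sketched in base R: evaluate the file in its own environment, then let any matching names overwrite the command line values. This illustrates the rule only and makes no claim about how yourcast reads the file; the list args and the temporary file are made up for the example.

```r
# Values as they would be given on the command line (illustrative subset)
args <- list(model = "OLS", standardize = TRUE)

# A throwaway userfile containing R code that resets one argument
userfile <- tempfile(fileext = ".R")
writeLines('model <- "map"', userfile)

# Evaluate the file in a private environment, then copy over matching names
env <- new.env()
sys.source(userfile, envir = env)
overrides <- mget(intersect(ls(env), names(args)), envir = env)
args[names(overrides)] <- overrides

unlink(userfile)
```

After this runs, args$model holds the userfile value and args$standardize keeps its command line value.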

savetmp

If TRUE, yourcast saves a file in the default directory (called ‘yourcast.savetmp’) with preliminary calculations. If the value of formula or dataobj is missing when yourcast is called, yourcast will get their values from this file, if it exists. This saves a minute or so of computing time for large data sets and is useful for multiple runs on the same data with different formulas specified or different prior values. If FALSE, no file is saved. (The structure of ‘yourcast.savetmp’ is for the convenience of yourcast and is not intended to be read by the user or saved for more than one run.) Default: TRUE.

model.frame

If TRUE, include entire input dataobj in the output object. Default: FALSE.

debug

Boolean. If TRUE, places the environment that contains the parameters and arguments of the simulation in the user workspace. Default: FALSE.

rerun

String. The name of the file that is saved in the default directory with preliminary calculations; see savetmp. Default: yourcast.savetmp

model

A string indicating the forecasting method, including: Bayes maximum a posteriori (map), Bayes with Gibbs sampling (bayes), Ordinary Least Squares (ols), Poisson (poisson), and Lee-Carter (LC). Default: ols. (We usually recommend map.)

yourcast also includes a procedure to help users set the sigma parameters below automatically for the case of model=map, and smoothing over age, time, or age and time, but for only one country. You may do this by running a preprocessing instance of yourcast first by setting this parameter to ebayes and using either the data to be analyzed or a larger data set which is likely to have similar or related parameter values. When ebayes is chosen, the yourcast output object will contain only the parameter values to feed into the next run of yourcast.

zero.mean

A boolean or named vector with a value of \bar{μ} for each age group. If TRUE, the prior has zero mean. If FALSE, the prior has nonzero mean centered around the observed mean age profile (i.e., the average of Y over time and levels of the geographic index for each age group). Default: FALSE.

Ha.sigma

This can be set in one of three ways: (1) a scalar which sets σ_a, the prior standard deviation of E(Y), indicating how much to smooth E(Y) over age groups (which may vary over geographic areas and time periods, and with the standard deviations averaged over age groups). A larger standard deviation represents more prior uncertainty, which allows the data to play a greater role. (2) NA to not smooth in this way. (3) To have yourcast search for a good value based on a target value of the derivative of E(Y) with respect to age, set to a vector of elements containing the start and end of a range in sigma in which to look (such as 0.05 and 1.5), the number of values to look at within this range (such as 5), and the target value of the derivative of E(Y) with respect to age (such as 0.05). The vector may also include a fifth element, which is the target value of the total standard deviation of E(Y) over all dimensions of the prior (such as 0.1). (You may choose to run yourcast with model=ebayes on a related data set to find an approximate target value of the derivative and standard deviation automatically.) Default: 0.30.
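For option (3), the vector packs the search range, the number of candidate values, and the target derivative. The sketch below unpacks such a vector using the example numbers from the text. Equal spacing of the candidates is an assumption; the documentation does not say how yourcast places them within the range.

```r
# start of range, end of range, number of values, target derivative
Ha.sigma <- c(0.05, 1.5, 5, 0.05)

# Candidate sigma values, assumed here to be equally spaced
sigma_grid <- seq(Ha.sigma[1], Ha.sigma[2], length.out = Ha.sigma[3])
target_deriv <- Ha.sigma[4]

# An optional fifth element would hold the target total standard deviation
target_sd <- if (length(Ha.sigma) >= 5) Ha.sigma[5] else NA
```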

Ha.sigma.sd

A scalar; the standard deviation of parameter Ha.sigma (for Gibbs sampling only). Default: 0.1.

Ha.deriv

A numeric vector of weights, one for each degree n of (discrete) derivative of the smoothness functional with respect to the age group. Element k of this vector refers to the (k-1)th derivative, where 0 excludes the derivative, 1 includes it, and values in between include the derivative but weight it down proportionally. The first element of the vector corresponds to the weight on the derivative with respect to age of order 0 (the identity operator), the second to the weight on the derivative of order 1 (the 1st derivative), etc. For example, c(0, 1, 1) corresponds to a mixed functional that penalizes the first and second derivatives equally. The higher the order of the derivative, the more local the smoothness over age groups; the lowest specified derivative controls the form of prior indifference. Default: c(0, 0, 1), which usually works well.
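A small numerical sketch may help in reading the weight vector. The hypothetical function smoothness_penalty below sums squared discrete derivatives of an age profile, weighting the (k-1)th derivative by element k. It is a simplified stand-in: the actual functional in yourcast may differ in normalization and in how the age and time weights enter.

```r
# Hypothetical illustration: weighted sum of squared discrete derivatives.
# Element k of `deriv` weights the derivative of order k - 1.
smoothness_penalty <- function(mu, deriv = c(0, 0, 1)) {
  total <- 0
  for (k in seq_along(deriv)) {
    if (deriv[k] == 0) next                      # weight 0 excludes this order
    ord <- k - 1
    d <- if (ord == 0) mu else diff(mu, differences = ord)
    total <- total + deriv[k] * sum(d^2)
  }
  total
}

linear_profile <- 1:10
p_second <- smoothness_penalty(linear_profile, c(0, 0, 1))
p_mixed  <- smoothness_penalty(linear_profile, c(0, 1, 1))
```

With c(0, 0, 1) a profile that is linear in age incurs no penalty, which is one way to see the statement that the lowest specified derivative determines what the prior is indifferent to. Adding the first derivative, as in c(0, 1, 1), penalizes the slope as well.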

Ha.age.weight

A scalar or a numeric vector with weights that determine how much smoothing occurs for different age groups. If set to 0 or NA, age groups are weighted equally; if set to a nonzero scalar, the weight for age group a is set proportional to a^Ha.age.weight; if a vector of length A, the ath element is the weight of age group a. Default: 0.
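The three accepted forms of the argument can be summarized in a short sketch. The helper age_weights is hypothetical, and it assumes that a is the index of the age group (1, 2, ..., A) rather than the age itself, which the text leaves open.

```r
# Hypothetical helper mapping the argument to one weight per age group.
# Assumes `a` is the age-group index 1..A.
age_weights <- function(w, A) {
  if (length(w) == A && A > 1) return(w)     # vector of length A: used as given
  stopifnot(length(w) == 1)
  if (is.na(w) || w == 0) return(rep(1, A))  # 0 or NA: equal weights
  seq_len(A)^w                               # nonzero scalar: proportional to a^w
}

w_equal  <- age_weights(0, 3)
w_power  <- age_weights(2, 3)
w_vector <- age_weights(c(0.5, 1, 2), 3)
```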

Ha.time.weight

A scalar or a numeric vector with weights that determine how much smoothing occurs for different time periods when smoothing over age groups. If 0 or NA, time periods are weighted equally; if set to a nonzero scalar value, the weight for time period t in smoothing age groups is proportional to t^Ha.time.weight; if the argument is a vector of length T, the tth element is the weight of time period t. Default: 0.

Ht.sigma

This can be set in one of three ways: (1) a scalar which sets σ_t, the prior standard deviation of E(Y), indicating how much to smooth E(Y) over time periods (which may vary over geographic areas and age groups, and with the standard deviations averaged over time periods). A larger standard deviation represents more prior uncertainty, which allows the data to play a greater role. (2) NA to not smooth in this way. (3) To have yourcast search for a good value based on a target value of the derivative of E(Y) with respect to time, set to a vector of elements containing the start and end of a range in sigma in which to look (such as 0.05 and 1.5), the number of values to look at within this range (such as 5), and the target value of the derivative of E(Y) with respect to time (such as 0.05). The vector may also include a fifth element, which is the target value of the total standard deviation of E(Y) over all dimensions of the prior (such as 0.1). (You may choose to run yourcast with model=ebayes on a related data set to find an approximate target value of the derivative and standard deviation automatically.) Default: 0.30.

Ht.sigma.sd

A scalar; the standard deviation of parameter Ht.sigma (for Gibbs sampling only). Default: 0.1.

Ht.deriv

A numeric vector of weights, one for each degree n of (discrete) derivative of the smoothness functional with respect to time. Element k of this vector refers to the (k-1)th derivative, where 0 excludes the derivative, 1 includes it, and values in between include the derivative but weight it down proportionally. The first element of the vector corresponds to the weight on the derivative with respect to time of order 0 (the identity operator), the second to the weight on the derivative of order 1 (the 1st derivative), etc. For example, c(0, 1, 1) corresponds to a mixed functional that penalizes the first and second derivatives equally. The higher the order of the derivative, the more local the smoothness over time; the lowest specified derivative controls the form of prior indifference. Default: c(0, 0, 1), which usually works well.

Ht.age.weight

A scalar or a numeric vector with weights that determine how much smoothing occurs for different age groups when smoothing over time. If set to 0 or NA, age groups are weighted equally in smoothing over time; if set to a nonzero scalar, the weight for age group a is set proportional to a^Ht.age.weight; if a vector of length A, the ath element is the weight of age group a. Default: 0.

Ht.time.weight

A scalar or a numeric vector with weights that determine how much smoothing occurs for different time periods when smoothing over time. If 0 or NA, time periods are weighted equally; if set to a nonzero scalar value, the weight for time period t in smoothing time periods is proportional to t^Ht.time.weight; if the argument is a vector of length T, the tth element is the weight of time period t. Default: 0.

Hat.sigma

This can be set in one of three ways: (1) a scalar which sets σ_{at}, the prior standard deviation of E(Y), indicating how much to smooth the time trend in E(Y) over age groups. A larger standard deviation represents more prior uncertainty, which allows the data to play a greater role. (2) NA to not smooth in this way. (3) To have yourcast search for a good value based on a target value of the derivative of E(Y) with respect to age and time, set to a vector of elements containing the start and end of a range in sigma in which to look (such as 0.05 and 1.5), the number of values to look at within this range (such as 5), and the target value of the derivative of E(Y) with respect to age and time (such as 0.05). The vector may also include a fifth element, which is the target value of the total standard deviation of E(Y) over all dimensions of the prior (such as 0.1). (You may choose to run yourcast with model=ebayes on a related data set to find an approximate target value of the derivative and standard deviation automatically.) Default: 0.2.

Hat.sigma.sd

A scalar; the standard deviation of parameter Hat.sigma (for Gibbs sampling only). Default: 0.1.

Hat.a.deriv

A numeric vector of weights, one for each degree n of (discrete) derivative of the smoothness functional of time trends with respect to age groups. Element k of this vector refers to the (k-1)th derivative of the time trend with respect to age, where 0 excludes the derivative, 1 includes it, and values in between include the derivative but weight it down proportionally. The first element of the vector corresponds to the weight on the derivative of the time trend with respect to age of order 0 (the identity operator), the second to the weight on the derivative of order 1 (the 1st derivative), etc. For example, c(0, 1, 1) corresponds to a mixed functional that penalizes the first and second derivatives equally. The higher the order of the derivative, the more local the smoothness over age groups; the lowest specified derivative controls the form of prior indifference. Default: c(0, 1).

Hat.t.deriv

A numeric vector of weights, one for each degree n of (discrete) derivative of the smoothness functional of the age derivative with respect to time. Element k of this vector refers to the (k-1)th derivative of the age derivative with respect to time, where 0 excludes the derivative, 1 includes it, and values in between include the derivative but weight it down proportionally. The first element of the vector corresponds to the weight on the age derivative with respect to time of order 0 (the identity operator), the second to the weight on the derivative of order 1 (the 1st derivative), etc. For example, c(0, 1, 1) corresponds to a mixed functional that penalizes the first and second derivatives equally. The higher the order of the derivative, the more local the smoothness over time; the lowest specified derivative controls the form of prior indifference. Default: c(0, 1).

Hat.age.weight

A scalar or a numeric vector with weights that determine how much smoothing occurs for different age groups when smoothing over age and time. If set to 0 or NA, age groups are weighted equally in smoothing over age and time; if set to a nonzero scalar, the weight for age group a is set proportional to a^Hat.age.weight; if a vector of length A, the ath element is the weight of age group a. Default: 0.

Hat.time.weight

A scalar or a numeric vector with weights that determine how much smoothing occurs for different time periods when smoothing over age and time. If 0 or NA, time periods are weighted equally; if set to a nonzero scalar value, the weight for time period t in smoothing over age and time is proportional to t^Hat.time.weight; if the argument is a vector of length T, the tth element is the weight of time period t. Default: 0.

Hct.sigma

A scalar which sets σ_ct, the prior standard deviation of E(Y), which indicates how much to smooth E(Y) over geographic areas, or NA to not smooth in this way. The parameter σ_ct is the expected prior standard deviation of E(Y) for a geographic area (varying over time periods and age groups, and with the standard deviations averaged over geographic areas). A larger standard deviation represents more prior uncertainty, which allows the data to play a greater role. Default: 0.3.

Hct.sigma.sd

A scalar; the standard deviation of parameter Hct.sigma (for Gibbs sampling only). Default: 0.1.

Hct.t.deriv

A numeric vector; controls whether the level or the time trend of E(Y) is smoothed over geographic areas (both cannot presently be done simultaneously). To smooth the level of E(Y) over geographic areas, set to 1, the identity. To smooth the time trend, set this (as in Hat.t.deriv) to the weight of the partial derivative taken with respect to time in the standard smoothness functional for the prior. First and higher order partial derivatives are supported. Default: 1.

Hct.time.weight

A scalar or a numeric vector with weights that determine how much smoothing occurs for different time periods when smoothing over geographic areas. If 0 or NA, time periods are weighted equally; if set to a nonzero scalar value, the weight for time period t in smoothing over areas is proportional to t^Hct.time.weight; if the argument is a vector of length T, the tth element is the weight of time period t. Default: 0.

LI.sigma.mean

A scalar; used in the likelihood and in the calculation of the priors in conjunction with Ha.sigma.sd, Hat.sigma.sd, Ht.sigma.sd, and Hct.sigma.sd. Default: 0.2.

LI.sigma.sd

A scalar; the standard deviation of LI.sigma.mean used in the calculation of the priors. Default: 0.1.

nsample

A scalar; the number of iterations of the Gibbs sampling algorithm (model bayes). Default: 500.

low.pow

Boolean. Whether to include lower powers of the explanatory variables in the simulation, as derived from formula. For example, with y ~ x^4 and low.pow = TRUE, the terms x, x^2, x^3, and x^4 are all included. Default: TRUE.
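The expansion in the example can be written out explicitly as an ordinary R formula. The helper lower_powers is hypothetical and only shows which terms the text describes; it says nothing about how yourcast parses the formula internally. The I() wrapper is needed in a plain R formula so that ^ is read as arithmetic.

```r
# Hypothetical helper: term labels for a variable and all its powers up to `power`
lower_powers <- function(var, power) {
  c(var, if (power >= 2) sprintf("I(%s^%d)", var, 2:power))
}

labels <- lower_powers("x", 4)
f_expanded <- reformulate(labels, response = "y")
```

The resulting formula has the four terms x, I(x^2), I(x^3), and I(x^4) on the right-hand side.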

verbose

Boolean. If FALSE, verbose output is suppressed. Default: TRUE.

Value

Returns a list of class ‘yourcast’ containing the following components:

call

The full call, including all command line options when yourcast was called.

userfile

The full userfile if it was specified.

yhat

A list with the same cross-sectional elements as the input data, but with two columns: ‘y’ for the observed dependent variable and ‘yhat’ for the predicted values. These include both in-sample and out-of-sample values, as distinguished by the values of sample.frame.

coeff

A list with the same cross-sectional elements as the input data, elements of which are the estimated coefficients if calculated by the chosen model.

sigma

A list with the same cross-sectional elements as the input data, elements of which are the estimated standard error of the estimate of the regression (the standard deviation of the dependent variable given the explanatory variables).

aux

List. A list of summary information about the yourcast analysis, used by plot.yourcast.

params

Vector. Smoothing parameters used in the model.

Author(s)

Federico Girosi girosi@rand.org; Elena Villalon evillalon@iq.harvard.edu; Gary King king@harvard.edu

References

http://gking.harvard.edu/yourcast


IQSS/YourCast documentation built on May 7, 2019, 6:03 a.m.