R/microsynth.R

Defines functions check.matchout remove.vars msCluster make.ci2 make.ci out.results newreshape get.pval microsynth

Documented in microsynth

#' @title
#' Synthetic control methods for disaggregated, micro-level data.
#'
#' @description
#' Implements the synthetic control method for micro-level data as outlined in
#' Robbins, Saunders, and Kilmer (2017).  \code{microsynth} is designed for use
#' in assessment of the effect of an intervention using longitudinal data.
#' However, it may also be used to calculate propensity score-type weights in
#' cross-sectional data. \code{microsynth} is a generalization
#' of \code{Synth} (see Abadie and Gardeazabal (2003) and Abadie, Diamond,
#' Hainmueller (2010, 2011, 2014)) that is designed for data at a more granular
#' level (e.g., micro-level). For more details see the help vignette:
#' \code{vignette('microsynth', package = 'microsynth')}.
#'
#' \code{microsynth} develops a synthetic control group by searching for weights
#' that exactly match a treatment group to a synthetic control group across
#' a number of variables while also minimizing the discrepancy between the
#' synthetic control group and the treatment group across a set second set of
#' variables.  \code{microsynth} works in two primary steps: 1) calculation of
#' weights and 2) calculation of results.  Time series plots of treatment
#' vs. synthetic control for pertinent outcomes may be performed using the
#' function \code{plot.microsynth()}.
#'
#' The time range over which data are observed is segmented into pre- and
#' post-intervention periods.  Treatment is matched to synthetic control
#' across the pre-intervention period, and the effect of the intervention
#' is assessed across the post-intervention (or evaluation) period.  The input
#' \code{end.pre} (which gives the last pre-intervention time period) is used to
#' delineate between pre- and post-intervention.  Note that if the intervention
#' is not believed to have an instantaneous effect, \code{end.pre} should indicate
#' the time of the intervention.
#'
#' Variables are categorized as outcomes (which are time-variant) and covariates
#' (which are time-invariant).  Using the respective inputs \code{match.covar}
#' and \code{match.out}, the user specifies across which covariates and outcomes
#' (and which pre-intervention time points of the outcomes) treatment is to be
#' exactly matched to synthetic control.  The inputs \code{match.covar.min} and
#' \code{match.out.min} are similar but instead specify variables across which
#' treatment is to be matched to synthetic control as closely as possible.  If
#' there are no variables specified in \code{match.covar.min} and
#' \code{match.out.min}, the function \code{calibrate()} from the \code{survey}
#' package is used to calculate weights.  Otherwise, the function
#' \code{LowRankQP()} from the package of the same name is used, if it is
#' available on the user's machine (it is now in the CRAN archive, so would need
#' to be installed by other means). If the \code{LowRankQP} package is unavailable,
#' it will use \code{ipop()} from the \code{kernlab} package.  In the event
#' that the model specified by \code{match.covar} and \code{match.out} is not
#' feasible (i.e., weights do not exist that exactly match treatment and
#' synthetic control subject to the given constraints), a less restrictive
#' backup model is used.
#'
#' \code{microsynth} has the capability to perform
#' statistical inference using Taylor series linearization, a jackknife and
#' permutation methods.  Several sets of weights are calculated.  A set of main
#' weights is calculated that is used to determine a point estimate of the
#' intervention effect.  The main weights can also be used to perform inferences
#' on the point estimator via Taylor series linearization.  If a jackknife is to
#' be used, one set of weights is calculated for each jackknife replication
#' group, and if permutation methods are to be used, one set of weights is
#' calculated for each permutation group.  If treatment and synthetic control
#' are not easily matched based upon the model outlined in \code{match.covar}
#' and \code{match.out} (i.e., an exact solution is infeasible or nearly
#' infeasible), it is recommended that the jackknife not be used for inference.
#'
#' The software provides the user the option to output overall findings in an Excel
#' file.  For each outcome variable, the results list the estimated treatment
#' effect, as well as confidence intervals of the effect and p-values of a
#' hypothesis test that assesses whether the effect is zero.   Such results are
#' produced as needed for each of the three methods of statistical inference
#' noted above.  \code{microsynth} can also apply an omnibus test that examines
#' the presence of a treatment effect jointly across several outcomes.
#'
#' @details
#' \code{microsynth} requires specification of the following inputs:
#' \code{data}, \code{idvar}, \code{intvar}.  \code{data} is a longitudinal data
#' frame; \code{idvar} and \code{intvar} are character strings that specific
#' pertinent columns of \code{data}.  In longitudinal data, \code{timevar}
#' should be specified.  Furthermore, specification of \code{match.out} and
#' \code{match.covar} is recommended.
#'
#' \code{microsynth} can also be used to calculate propensity score-type weights
#' in cross sectional data (in which case \code{timevar} does not need to be
#' specified) as proposed by Hainmueller (2012).
#'
#' @param data A data frame.  If longitudinal, the data must be entered in tall
#'   format (e.g., at the case/time-level with one row for each time period for
#'   each case).  Missingness is not allowed.  All individuals must have non-NA
#'   values of all variables at all time points.
#'
#' @param idvar A character string that gives the variable in \code{data} that
#'   identifies multiple records from the same case.
#'
#' @param intvar A character string that gives the variable in \code{data} that
#'   corresponds to the intervention variable.  The intervention variable
#'   indicates which cases (and times) have received the intervention.  The
#'   variable should be binary, with a 1 indicating treated and 0 indicating
#'   untreated.  If \code{end.pre}
#'   is specified, a case is considered treated if there is 1 or more non-zero
#'   entries in the column indicated by \code{intvar} for that case (at any time
#'   point).  If \code{end.pre} is not specified, an attempt will be made to
#'   use \code{intvar} to determine which time periods will be considered
#'   post-intervention (i.e., the times contained in the evaluation period).
#'   In this case, the evaluation period is considered to begin at the time of
#'   the first non-zero entry in \code{intvar}).
#'
#' @param timevar A character string that gives the variable in
#'   \code{data} that differentiates multiple records from the same case.  Can be
#'   set to \code{NULL} only when used with cross-sectional data (i.e., with one
#'   observation per entry in \code{idvar}).
#'
#' @param w A \code{microsynth} object or a list of the form as returned
#'   by a prior application of \code{microsynth}.
#'   If \code{w = NULL}, weights are calculated from scratch.
#'   Entering a \code{non-NULL} value affords the user the ability to
#'   use previously calculated  weights.
#'
#' @param end.pre An integer that gives the final time point of the
#'   pre-intervention period.  That is, \code{end.pre} is the last time at
#'   which treatment and synthetic control will be matched to one another.
#'   All time points
#'   following \code{end.pre} are considered to be post-intervention and the
#'   behavior of outcomes will be compared between the treatment and synthetic
#'   control groups across those time periods.
#'   Setting \code{end.pre = NULL} will begin the post-intervention period
#'   at the time
#'   that corresponds to the first non-zero entry in the column indicated by
#'   \code{intvar}.
#'
#' @param end.post An integer that gives the maximum post-intervention time that
#'   is taken into when compiling results.  That is, the treatment and synthetic
#'   control groups are compared across the outcomes listed in \code{result.var}
#'   from the first time following the intervention up to \code{end.post}.  Can
#'   be a vector (ordered, increasing) giving multiple values of
#'   \code{end.post}.  In this case, the results will be compiled for each entry
#'   in \code{end.post}.  When \code{end.post = NULL} (the default), it is reset
#'   to the maximum time that appears in the column given by \code{timevar}.
#'
#' @param match.out Either A) logical, B) a vector of variable names that
#'   indicates across which time-varying variables treatment is to be exactly matched
#'   to synthetic control pre-intervention, or C) a
#'   list consisting of variable names and timespans over which variables should
#'   be aggregated before matching.  Note that outcome variables and time-varying
#'   covariates should be included in \code{match.out}.
#'
#'   If \code{match.out = TRUE} (the default), it is set equal to
#'   \code{result.var}; if \code{match.out = NULL} or \code{match.out = FALSE},
#'   no outcome variables are factored into the calculation of weights. If
#'   \code{match.out} is passed a vector of variable
#'   names, then weights are calculated to match treatment and synthetic control
#'   for the value of each variable that appears in \code{match.out} at each
#'   time point from \code{start.pre} to \code{end.pre}. Otherwise, to allow
#'   more flexibility, \code{match.out} may also be a list that gives
#'   an outcome-based model outlining more specific constraints that are to be
#'   exactly satisfied within calibration weighting.  In this case, each entry
#'   of \code{match.out} is a vector of integers, and the names of entries of
#'   \code{match.out} are the outcome variables to which the vectors correspond.
#'   Each element of the vectors gives a number of time points that are to be
#'   aggregated for the respective outcome, with the first element indicating
#'   time points immediately prior the beginning of the post-intervention
#'   period.  The sum of
#'   the elements in each vector should not exceed the number of
#'   pre-intervention time periods in the data.
#'
#'   The following examples show the proper formatting of \code{match.out} as a
#'   list.  Assume that there are two outcomes, Y1 and Y2 (across which
#'   treatment is to be matched to synthetic control), and \code{end.pre = 10}
#'   (i.e., the post-intervention period begins at time 11).
#'   Let \code{match.out = list('Y1' = c(1, 3, 3), 'Y2'=
#'   c(2,5,1))}.  According to this specification, treatment is to be matched to
#'   synthetic control across: a) The value of Y1 at time 10; b) the sum of Y1
#'   across times 7, 8 and 9; c) the sum of Y1 across times 4, 5 and 6; e) The
#'   sum of Y2 across times time 9 and 10; e) the sum of Y2 across times 4, 5,
#'   6, 7, and 8; f) the value of Y2 at time 3.  Likewise, if \code{match.out =
#'   list('Y1' = 10, 'Y2'= rep(1,10))}, Y1 is matched to synthetic control
#'   the entire aggregated pre-intervention time range, and Y2 is matched
#'   at each pre-intervention time point individually.
#'
#' @param match.out.min A vector or list of the same format as \code{match.out}
#'   that is used to specify additional time-varying variables to match
#'   on, but which need not be matched exactly. Weights are calculated so the
#'   distance is minimized between treatment and synthetic control across these
#'   variables. If \code{match.out.min = NULL}, no outcome-based constraints
#'   beyond those indicated by \code{match.out} are imposed (i.e., all outcome
#'   variables will be matched on exactly).
#'
#' @param match.covar Either a logical or a vector of variable names that
#'   indicates which time invariant covariates
#'   are to be used for weighting.  Weights are
#'   calculated so that treatment and synthetic control exactly match across
#'   these variables.  If \code{match.covar = TRUE}, it is set equal to a vector
#'   of variable names corresponding to the time invariant variables that
#'   appear in \code{data}.  If \code{match.covar = FALSE}, it is set to
#'   \code{NULL} (in which case no time-invariant variables are used for matching
#'   when calculating weights).
#'
#' @param match.covar.min A vector of variable names that indicates supplemental
#'   time invariant variables that are to be used for weighting, for which exact
#'   matches are not required. Weights are calculated so the distance is
#'   minimized between treatment and synthetic control across these variables.
#'
#' @param perm An integer giving the number of permutation groups that are used.
#'   If \code{perm = 0}, no permutation groups are generated, permutation
#'   weights are not calculated, and permutations do not factor into the
#'   reported results.  \code{perm} is set to the number of possible permutation
#'   groups if the former exceeds the latter.
#'
#' @param jack An integer giving the number of replication groups that are used
#'   for the jackknife.  \code{jack} can also be a logical indicator.  If
#'   \code{jack = 0} or \code{jack = FALSE}, no jackknife replication groups are
#'   generated, jackknife weights are not calculated, and the jackknife is not
#'   considered when reporting results.  If \code{jack = TRUE}, it is reset to
#'   being equal to the minimum between the number of total cases in the
#'   treatment group and the total number of cases in the control group.
#'   \code{jack} is also reset to that minimum if it, as entered, exceeds that
#'   minimum.
#'
#' @param check.feas A logical indicator of whether or not the feasibility of
#'   the model specified by \code{match.out} is evaluated prior to calculation
#'   of weights.  If \code{check.feas = TRUE}, feasibility is assessed. If
#'   \code{match.out} is found to not specify a feasible model, a less
#'   restrictive feasible backup model will be applied to calculate the main
#'   weights and for jackknife and permutation methods.
#'
#' @param use.backup A logical variable that, when true, indicates whether a
#'   backup model should be used whenever the model specified by
#'   \code{match.out} yields unsatisfactory weights.  Weights are deemed to be
#'   unsatisfactory if they do not sufficiently satisfy the constraints imposed
#'   by \code{match.out} and \code{match.covar}.  Different backup models may be
#'   used for each of the main, jackknife or permutation weights as needed.
#'
#' @param max.mse The maximum error (given as mean-squared error) permissible
#'   for constraints that are to be exactly satisfied.  If \code{max.mse} is not
#'   satisfied by these constraints, and either \code{check.feas = TRUE} or
#'   \code{use.backup = TRUE}, then back-up models are used.
#'
#' @param result.var A vector of variable names giving the outcome
#'   variables for which results will be reported.  Time-varying covariates
#'   should be excluded from \code{result.var}.  If \code{result.var = TRUE}
#'   (the default), \code{result.var} is set as being equal to all time-varying
#'   variables that appear in \code{data}.  If \code{result.var = NULL} or
#'   \code{result.var = FALSE}, results are not tabulated.
#'
#' @param omnibus.var A vector of variable names that indicates the outcome
#'   variables that are to be used within the calculation of the omnibus
#'   statistic.  Can also be a logical indicator.  When \code{omnibus.var =
#'   TRUE}, it is reset as being equal to \code{result.var}.  When
#'   \code{omnibus.var = NULL} or \code{omnibus = FALSE}, no omnibus statistic
#'   is calculated.  \code{omnibus.var} should not contain elements not in
#'   \code{result.var}.
#'
#' @param period An integer that gives the granularity of the data that will be
#'   used for plotting and compiling results.  If \code{match.out} and
#'   \code{match.out.min} are provided a vector of variable names, it will also
#'   affect the calculation of weights used for matching. In this case, matching
#'   of treatment and synthetic control is performed at a temporal granularity
#'   defined by \code{period}. For instance, if monthly data are provided and
#'   \code{period = 3}, data are aggregated to quarters for plots and results
#'   (and weighting unless otherwise specified). If \code{match.out} and
#'   \code{match.out.min} are provided a list, \code{period} only affects plots
#'   and how results are displayed.
#'
#'   Note that plotting is performed with
#'   \code{plot.microsynth()}; however, a \code{microsynth} object is required as
#'   input for that function and \code{period} should be specified in the creation
#'   of that object.
#'
#' @param cut.mse The maximum error (given as mean-squared error) permissible
#'   for permutation groups.  Permutation groups with a larger than permissible
#'   error are dropped when calculating results.  The mean-squared error is only
#'   calculated over constraints that are to be exactly satisfied.
#'
#' @param test The type of hypothesis test (one-sided lower, one-sided upper, or
#'   two-sided) that is used when calculating p-values.  Entries of
#'   \code{'lower'}, \code{'upper'}, and \code{'twosided'} are recognized.
#'
#' @param result.file A character string giving the name of a file that will be
#'   created in the home directory containing results.  If \code{result.file =
#'   NULL} (the default), no file is created.  If \code{end.post} has length 1,
#'   a \code{.csv} file is created.  If \code{end.post} has length greater than
#'   one, a formatted \code{.xlsx} file is created with one tab for each element
#'   of \code{end.post}.  If \code{result.file} has a \code{.xlsx} (or
#'   \code{.xls}) extension (e.g., the last five characters of result.file are
#'   '.xlsx'), an \code{.xlsx} file is created regardless of the length of
#'   \code{end.post}.
#'
#' @param use.survey If \code{use.survey = TRUE}, Taylor series linearization is
#'   applied to the estimated treatment effect within each permutation group.
#'   Setting \code{use.survey = TRUE} makes for better inference but increases
#'   computation time substantially.  Confidence intervals for permutation
#'   groups are calculated only when \code{use.survey = TRUE}.
#'
#' @param confidence The level of confidence for confidence intervals.
#'
#' @param start.pre An integer indicating the time point that corresponds to the
#'   beginning of the pre-intervention period used for
#'   matching.  When \code{start.pre = NULL} (default), it is reset to the
#'   minimum time appearing in the column given by \code{timevar}.  If
#'   \code{match.out} (and \code{match.out.min}) are given in list format,
#'   \code{start.pre} is ignored except for plotting.
#'
#' @param scale.var  A variable name.  When comparing the treatment group to all
#'   cases, the latter is scaled to the size of the former with respect to the
#'   variable indicated by \code{scale.var}.  Defaults to the number of units
#'   receiving treatment (i.e., the intercept).
#'
#' @param maxit The maximum number of iterations used within the calibration
#'   routine (\code{calibrate()} from the \code{survey} package) for
#'   calculating weights.
#'
#' @param cal.epsilon The tolerance used within the calibration routine
#'   (\code{calibrate()} from the \code{survey} package) for calculating
#'   weights.
#'
#' @param calfun The calibration function used within the calibration routine
#'   (\code{calibrate()} from the \code{survey} package) for calculating
#'   weights.
#'
#' @param bounds Bounds for calibration weighting (fed into the
#'   \code{calibrate()} from the \code{survey} package).
#'
#' @param printFlag If TRUE, \code{microsynth} will print history on console. Use
#'   \code{printFlag = FALSE} for silent computation.
#'
#' @param n.cores The number of CPU cores to use for parallelization. If
#'   \code{n.cores} is not specified by the user, it is guessed using the
#'   \code{detectCores} function in the parallel package.  If \code{TRUE}
#'   (the default), it is set as \code{detectCores()}.  If \code{NULL}, it is set as
#'   \code{detectCores() - 1}.  If \code{FALSE}, it is set as \code{1}, in which case
#'   parallelization is not invoked.  Note that the
#'   documentation for \code{detectCores} makes clear that it is not failsafe and
#'   could return a spurious number of available cores.
#'
#' @param ret.stats if set to \code{TRUE}, returns four additional elements:
#'   \code{stats}, \code{stats1}, \code{stats2} and \code{delta.out}.
#'
#' @details \code{microsynth} calculates weights using
#'   \code{survey::calibrate()} from the \code{survey} package in circumstances
#'   where a feasible solution exists for all constraints, whereas
#'   \code{LowRankQP::LowRankQP()} is used to assess feasibility and to
#'   calculate weights in the event that a feasible solution to all constraints
#'   does not exist.  The \code{LowRankQP} routine is memory-intensive and can
#'   run quite slowly in data that have a large number of cases.  To prevent
#'   \code{LowRankQP} from being used, set \code{match.out.min = NULL},
#'   \code{match.covar.min= NULL}, \code{check.feas = FALSE}, and
#'   \code{use.backup = FALSE}.
#'
#' @return \code{microsynth} returns a list with up to five elements: a)
#'   \code{w}, b) \code{Results}, c) \code{svyglm.stats}, and
#'   d) \code{Plot.Stats}, and e) \code{info}.
#'
#'   \code{w} is a list with six elements: a) \code{Weights}, b) \code{Intervention},
#'   c) \code{MSE}, d) \code{Model}, e) \code{Summary}, and f) \code{keep.groups}.
#'   Assume there are
#'   C total sets of weights calculated, where C = \code{1 + jack + perm}, and
#'   there are N total cases across the treatment and control groups.
#'   \code{w$Weights} is an N x C matrix, where each column provides a set of
#'   weights.  \code{w$Intervention} is an N x C matrix made of logical
#'   indicators that indicate whether or not the case in the respective row is
#'   considered treated (at any point in time) for the respective column.
#'   Entries of \code{NA} are to be dropped for the respective jackknife
#'   replication group (\code{NA}s only appear in jackknife weights).
#'   \code{w$MSE} is a 6 x C matrix that give the MSEs for each set of weights.
#'   MSEs are listed for the primary and secondary constraints for the first,
#'   second, and third models.  Note that the primary constraints differ for each
#'   model (see Robbins and Davenport, 2021). \code{w$Model} is a length-C vector that
#'   indicates whether backup models were used in the calculation of each set of
#'   weights.  \code{w$keep.groups} is a logical vector indicating which groups
#'   are to be used in analysis (groups that are not used have pre-intervention
#'   MSE greater than \code{cut.mse}.  \code{w$Summary} is a three-column matrix
#'   that (for treatment,
#'   synthetic control, and the full dataset), shows aggregate values
#'   of the variables across which treatment and synthetic control are matched.
#'   The summary, which is tabulated only for the primary weights, is also
#'   printed by \code{microsynth} while weights are being calculated.
#'
#'   Further, \code{Results} is a list where each element gives the final
#'   results for each value of \code{end.post}.  Each element of \code{Results}
#'   is itself a matrix with each row corresponding to an outcome variable (and
#'   a row for the omnibus test, if used) and each column denotes estimates of
#'   the intervention effects and p-values, upper, and lower bounds of
#'   confidence intervals as found using Taylor series linearization (Linear),
#'   jackknife (jack), and permutation (perm) methods where needed.
#'
#'   In addition, \code{svyglm.stats} is a list where each element is a
#'   matrix that includes the output from the regression models run using the
#'   \code{svyglm()} function to estimate the treatment effect.  The list has one
#'   element for each value of \code{end.post}, and the matrices each have
#'   one row per variable in \code{result.var}.
#'
#'   Next, \code{Plot.Stats} contains the data that are displayed in the
#'   plots which may be generated using \code{plot.microsynth()}.
#'   \code{Plot.Stats} is a list with four elements (Treatment, Control,
#'   All, Difference).  The first three elements are matrices with one row per
#'   outcome variable and one column per time point.  The last element (which
#'   gives the treatment minus control values) is an array that contains data
#'   for each permutation group in addition to the true treatment area.
#'   Specifically, \code{Plot.Stats$Difference[,,1]} contains the time series of
#'   treatment minus control for the true intervention group;
#'   \code{Plot.Stats$Difference[,,i+1]} contains the time series of treatment
#'   minus control for the i^th permutation group.
#'
#'   Next, \code{info} documents some input parameters for display by
#'   \code{print()}. A summary of weighted matching variables and of results
#'   can be viewed using \code{\link{summary}}
#'
#'   Lastly, if \code{ret.stats} is set to \code{TRUE}, four additional elements
#'   are returned: \code{stats}, \code{stats1}, \code{stats2} and \code{delta.out}.
#'   \code{stats} contains elements with the basic statistics that are the same as
#'   the main microsynth output: outcomes in treatment, control and percentage change.
#'   \code{stats1} are the estimates of \code{svyglm()} adjusted by their standard
#'   errors. \code{stats2} is the percent change in the observed value from each
#'   outcome from the hypothetical outcome absent intervention. \code{delta.out} is
#'   a Taylor series linearization used to approximate the variance of the estimator.
#'
#' @references Abadie A, Diamond A, Hainmueller J (2010). Synthetic control
#'   methods for comparative case studies: Estimating the effect of California's
#'   tobacco control program.? \emph{Journal of the American Statistical
#'   Association}, 105(490), 493-505.
#'
#'   Abadie A, Diamond A, Hainmueller J (2011). Synth: An R Package for
#'   Synthetic Control Methods in Comparative Case Studies.? \emph{Journal
#'   of Statistical Software}, 42(13), 1-17.
#'
#'   Abadie A, Diamond A, Hainmueller J (2015). Comparative politics and the
#'   synthetic control method. \emph{American Journal of Political Science},
#'   59(2), 495-510.
#'
#'   Abadie A, Gardeazabal J (2003). The economic costs of conflict: A case
#'   study of the Basque Country.? \emph{American Economic Review}, pp. 113-132.
#'
#'   Hainmueller, J. (2012), Entropy Balancing for Causal Effects: A
#'   Multivariate Reweighting Method to Produce Balanced Samples in
#'   Observational Studies,? \emph{Political Analysis}, 20, 25-46.
#'
#'   Robbins MW, Saunders J, Kilmer B (2017). A framework for synthetic control
#'   methods with high-dimensional, micro-level data: Evaluating a neighborhood-
#'   specific crime intervention,? \emph{Journal of the American Statistical
#'   Association}, 112(517), 109-126.
#'
#'   Robbins MW, Davenport S (2021). microsynth: Synthetic Control Methods for
#'   Disaggregated and Micro-Level Data in R,? \emph{Journal of Statistical
#'   Software}, 97(2), doi:10.18637/jss.v097.i02.
#'
#' @examples
#'
#' # Use seattledmi, block-level panel data, to evaluate a crime intervention.
#'
#' # Declare time-variant (outcome) and time-invariant variables for matching
#' cov.var <- c('TotalPop', 'BLACK', 'HISPANIC', 'Males_1521',
#'        'HOUSEHOLDS', 'FAMILYHOUS', 'FEMALE_HOU', 'RENTER_HOU', 'VACANT_HOU')
#'
#' match.out <- c('i_felony', 'i_misdemea', 'i_drugs', 'any_crime')
#' set.seed(99199) # for reproducibility
#'
#'
#'
#' # Perform matching and estimation, without permutations or jackknife
#' # runtime: < 1 min
#'
#' \donttest{
#' sea1 <- microsynth(seattledmi,
#'                   idvar='ID', timevar='time', intvar='Intervention',
#'                   start.pre=1, end.pre=12, end.post=16,
#'                   match.out=match.out, match.covar=cov.var,
#'                   result.var=match.out, omnibus.var=match.out,
#'                   test='lower',
#'                   n.cores = min(parallel::detectCores(), 2))
#'
#' # View results
#' summary(sea1)
#' plot_microsynth(sea1)
#' }
#'
#' \dontrun{
#' # Repeat matching and estimation, with permutations and jackknife
#' # Set permutations and jack-knife to very few groups (2) for
#' # quick demonstration only.
#' # runtime: ~30 min
#' sea2 <- microsynth(seattledmi,
#'                      idvar='ID', timevar='time', intvar='Intervention',
#'                      start.pre=1, end.pre=12, end.post=c(14, 16),
#'                      match.out=match.out, match.covar=cov.var,
#'                      result.var=match.out, omnibus.var=match.out,
#'                      test='lower',
#'                      perm=250, jack=TRUE,
#'                      result.file=file.path(tempdir(), 'ExResults2.xlsx'),
#'                      n.cores = min(parallel::detectCores(), 2))
#'
#' # View results
#' summary(sea2)
#' plot_microsynth(sea2)
#'
#' # Specify additional outcome variables for matching, which makes
#' # matching harder.
#' match.out <- c('i_robbery','i_aggassau','i_burglary','i_larceny',
#'        'i_felony','i_misdemea','i_drugsale','i_drugposs','any_crime')
#'
#' # Perform matching, setting check.feas = T and use.backup = T
#' # to ensure model feasibility
#' # runtime: ~40 minutes
#' sea3 <- microsynth(seattledmi,
#'                    idvar='ID', timevar='time', intvar='Intervention',
#'                    end.pre=12,
#'                    match.out=match.out, match.covar=cov.var,
#'                    result.var=match.out, perm=250, jack=0,
#'                    test='lower', check.feas=TRUE, use.backup = TRUE,
#'                    result.file=file.path(tempdir(), 'ExResults3.xlsx'),
#'                    n.cores = min(parallel::detectCores(), 2))
#'
#'
#' # Aggregate outcome variables before matching, to boost model feasibility
#' match.out <- list( 'i_robbery'=rep(2, 6), 'i_aggassau'=rep(2, 6),
#'          'i_burglary'=rep(1, 12), 'i_larceny'=rep(1, 12),
#'          'i_felony'=rep(2, 6), 'i_misdemea'=rep(2, 6),
#'          'i_drugsale'=rep(4, 3), 'i_drugposs'=rep(4, 3),
#'          'any_crime'=rep(1, 12))
#'
#' # After aggregation, use.backup and cheack.feas no longer needed
#' # runtime: ~40 minutes
#' sea4 <- microsynth(seattledmi, idvar='ID', timevar='time',
#'          intvar='Intervention', match.out=match.out, match.covar=cov.var,
#'          start.pre=1, end.pre=12, end.post=16,
#'          result.var=names(match.out), omnibus.var=names(match.out),
#'          perm=250, jack = TRUE, test='lower',
#'          result.file=file.path(tempdir(), 'ExResults4.xlsx'),
#'          n.cores = min(parallel::detectCores(), 2))
#'
#' # View results
#' summary(sea4)
#' plot_microsynth(sea4)
#'
#'
#' # Generate weights only (for four variables)
#' match.out <- c('i_felony', 'i_misdemea', 'i_drugs', 'any_crime')
#'
#' # runtime: ~ 20 minutes
#' sea5 <- microsynth(seattledmi,  idvar='ID', timevar='time',
#'          intvar='Intervention', match.out=match.out, match.covar=cov.var,
#'          start.pre=1, end.pre=12, end.post=16,
#'          result.var=FALSE, perm=250, jack=TRUE,
#'          n.cores = min(parallel::detectCores(), 2))
#'
#' # View weights
#' summary(sea5)
#'
#' # Generate results only
#' sea6 <- microsynth(seattledmi, idvar='ID', timevar='time',
#'           intvar='Intervention',
#'           start.pre=1, end.pre=12, end.post=c(14, 16),
#'           result.var=match.out, test='lower',
#'           w=sea5, result.file=file.path(tempdir(), 'ExResults6.xlsx'),
#'           n.cores = min(parallel::detectCores(), 2))
#'
#' # View results (including previously-found weights)
#' summary(sea6)
#'
#' # Generate plots only
#' plot_microsynth(sea6, plot.var=match.out[1:2])
#'
#' # Apply microsynth in the traditional setting of Synth
#' # Create macro-level (small n) data, with 1 treatment unit
#' set.seed(86879)
#' ids.t <- names(table(seattledmi$ID[seattledmi$Intervention==1]))
#' ids.c <- setdiff(names(table(seattledmi$ID)), ids.t)
#' ids.synth <- c(base::sample(ids.t, 1), base::sample(ids.c, 100))
#' seattledmi.one <- seattledmi[is.element(seattledmi$ID,
#'            as.numeric(ids.synth)), ]
#'
#' # Apply microsynth to the new macro-level data
#' # runtime: < 5 minutes
#' sea8 <- microsynth(seattledmi.one, idvar='ID', timevar='time',
#'            intvar='Intervention',
#'            start.pre=1, end.pre=12, end.post=16,
#'            match.out=match.out[4],
#'            match.covar=cov.var, result.var=match.out[4],
#'            test='lower', perm=250, jack=FALSE,
#'            check.feas=TRUE, use.backup=TRUE,
#'            n.cores = min(parallel::detectCores(), 2))
#'
#' # View results
#' summary(sea8)
#' plot_microsynth(sea8)
#'
#' # Use microsynth to calculate propensity score-type weights
#' # Prepare cross-sectional data at time of intervention
#' seattledmi.cross <- seattledmi[seattledmi$time==16, colnames(seattledmi)!='time']
#'
#' # Apply microsynth to find propensity score-type weights
#' # runtime: ~5 minutes
#' sea9 <- microsynth(seattledmi.cross, idvar='ID', intvar='Intervention',
#'              match.out=FALSE, match.covar=cov.var, result.var=match.out,
#'              test='lower', perm=250, jack=TRUE,
#'              n.cores = min(parallel::detectCores(), 2))
#'
#' # View results
#' summary(sea9)
#' }
#'
#' @export

microsynth <- function(data, idvar, intvar, timevar = NULL, start.pre = NULL, end.pre = NULL, end.post = NULL, match.out = TRUE, match.covar = TRUE, match.out.min = NULL, match.covar.min = NULL, result.var = TRUE,
    omnibus.var = result.var, period = 1, scale.var = "Intercept", confidence = 0.9, test = "twosided", perm = 0, jack = 0, use.survey = TRUE, cut.mse = Inf, check.feas = FALSE, use.backup = FALSE, w = NULL, max.mse = 0.01,
    maxit = 250, cal.epsilon = 1e-04, calfun = "linear", bounds = c(0, Inf), result.file = NULL, printFlag = TRUE, n.cores = TRUE, ret.stats = FALSE) {

    # Force to dataframe (e.g., not 'tibble')
    data <- as.data.frame(data)

    # Determine the number of cores to be used. CRAN tops at 2.
    n.cores <- msCluster(n.cores)

    # Declare metrics for print() call (1 of 3)
    info <- list()
    info$match <- unique(match.out)
    info$match.min <- unique(match.out.min)
    info$covar <- unique(match.covar)
    info$covar.min <- unique(match.covar.min)
    info$start.pre <- start.pre
    info$end.pre <- end.pre
    info$end.post <- end.post

    all.tmp <- proc.time()
    if (length(timevar) == 0) {
        if (length(table(data[, idvar])) < NROW(data)) {
            stop("Data are not cross-sectional.  Please specify timevar.")
        } else {
            data$Time <- 1
            timevar <- "Time"
            end.pre <- 1
        }
    }
    time.tmp <- data[, timevar]
    time.names <- names(table(time.tmp))
    data[, timevar] <- match(as.character(time.tmp), time.names)

    # Declare more metrics for print() call (2 of 3)
    info$nUnits <- length(unique(data[[idvar]]))  # num units
    info$nTreatment <- length(unique(data[idvar][data[intvar] == 1]))
    info$nControl <- info$nUnits - info$nTreatment

    # Set up dummy variables for use in determining values of indirectly stated inputs
    if (length(start.pre) > 0 & !is.logical(start.pre)) {
        start.pre <- match(as.character(start.pre), time.names)
    }
    if (length(end.pre) > 0) {
        end.pre <- match(as.character(end.pre), time.names)
    }
    if (length(end.post) > 0 & !is.logical(end.post)) {
        end.post <- match(as.character(end.post), time.names)
    }
    twosided <- TRUE
    if (test == "lower" | test == "upper") {
        twosided <- FALSE
    }
    if (is.logical(match.out)) {
        if (!match.out) {
            reset.match.out <- TRUE
        } else {
            reset.match.out <- FALSE
        }
        match.out <- NULL
    } else if (length(match.out) == 0) {
        reset.match.out <- TRUE
    } else {
        reset.match.out <- FALSE
    }
    if (is.logical(match.covar)) {
        if (!match.covar) {
            reset.match.covar <- TRUE
        } else {
            reset.match.covar <- FALSE
        }
        match.covar <- NULL
    } else if (length(match.covar) == 0) {
        reset.match.covar <- TRUE
    } else {
        reset.match.covar <- FALSE
    }
    if (is.logical(result.var)) {
        if (!result.var) {
            reset.result.var <- TRUE
        } else {
            reset.result.var <- FALSE
        }
        result.var <- NULL
    } else if (length(result.var) == 0) {
        reset.result.var <- TRUE
    } else {
        reset.result.var <- FALSE
    }
    match.out <- remove.vars(match.out, dimnames(data)[[2]], "match.out", printFlag = printFlag)
    match.out.min <- remove.vars(match.out.min, dimnames(data)[[2]], "match.out.min", printFlag = printFlag)
    match.covar <- remove.vars(match.covar, dimnames(data)[[2]], "match.covar", printFlag = printFlag)
    match.covar.min <- remove.vars(match.covar.min, dimnames(data)[[2]], "match.covar.min", printFlag = printFlag)
    result.var <- remove.vars(result.var, dimnames(data)[[2]], "out.covar", printFlag = printFlag)
    if (!is.logical(omnibus.var)) {
        omnibus.var <- remove.vars(omnibus.var, dimnames(data)[[2]], "omnibus.var", printFlag = printFlag)
    }
    nv.names <- union(match.covar, match.covar.min)
    v.names <- result.var
    if (length(match.out) > 0) {
        # v.names <- union(v.names, names(match.out))
        if (is.list(match.out)) {
            v.names <- union(v.names, names(match.out))
        } else {
            v.names <- union(v.names, match.out)
        }
    }
    if (length(match.out.min) > 0) {
        if (is.list(match.out.min)) {
            v.names <- union(v.names, names(match.out.min))
        } else {
            v.names <- union(v.names, match.out.min)
        }
    }
    ok.cl <- c("numeric", "integer", "logical")
    classes <- NA
    for (i in 1:NCOL(data)) {
        classes[i] <- class(data[, i])
    }
    ok.col <- is.element(classes, ok.cl)
    rm.col <- colnames(data)[!ok.col]
    rm.col <- setdiff(rm.col, c(idvar, timevar, intvar))
    data <- data[, !is.element(colnames(data), rm.col)]
    v.names <- setdiff(v.names, rm.col)
    nv.names <- setdiff(nv.names, rm.col)

    # Shape panel data into 3D array, generate intervention matrix, etc.
    data <- newreshape(data, nv.names = nv.names, v.names = v.names, timevar = timevar, idvar = idvar, intvar = intvar)
    if (length(result.var) == 0) {
        result.var <- data[[4]]
        if (!reset.result.var) {
            if (printFlag) {
                message("result.var = TRUE.  Resetting: \n", appendLF = FALSE)
            }
            if (printFlag) {
                message("result.var = c(\"", paste(result.var, collapse = "\",\"", sep = ""), "\")\n\n", sep = "", appendLF = FALSE)
            }
        }
    }
    if (length(match.covar) == 0) {
        match.covar <- data[[3]]
        if (!reset.match.covar) {
            if (printFlag) {
                message("match.covar = TRUE.  Resetting: \n", appendLF = FALSE)
            }
            if (printFlag) {
                message("match.covar = c(\"", paste(match.covar, collapse = "\",\"", sep = ""), "\")\n\n", sep = "", appendLF = FALSE)
            }
        }
    }
    Intervention <- data[[2]]
    data <- data[[1]]
    times <- as.numeric(colnames(Intervention))
    if (length(end.pre) == 0) {
        eval.times <- which(colSums(Intervention) != 0)
        if (length(eval.times) == 0) {
            stop("There are no intervention cases.\n")
        }
        end.pre <- min(eval.times) - 1
        if (end.pre == 0) {
            stop("There are no pre-intervention time points.")
        }
        end.pre <- times[end.pre]
        if (printFlag) {
            message("Setting end.pre = ", time.names[end.pre], ".\n\n", sep = "", appendLF = FALSE)
        }
    }
    Intervention[] <- as.integer(rowSums(Intervention) != 0)
    if (length(start.pre) == 0) {
        start.pre <- min(times)
    }
    if (length(end.post) == 0) {
        end.post <- max(times)
    }
    if (length(match.out) == 0) {
        match.out <- result.var
        if (!reset.match.out) {
            if (printFlag) {
                message("match.out = TRUE.  Resetting: \n", appendLF = FALSE)
            }
            if (printFlag) {
                message("match.out = c(\"", paste(match.out, collapse = "\",\"", sep = ""), "\")\n\n", sep = "", appendLF = FALSE)
            }
        }
    }
    if (!is.list(match.out) & length(match.out) > 0) {
        match.out.tmp <- match.out
        match.out <- list()
        for (i in 1:length(match.out.tmp)) {
            match.out[[i]] <- rep(period, (end.pre - start.pre + 1)%/%period)
        }
        names(match.out) <- match.out.tmp
    } else if (length(match.out) > 0) {
        match.out <- check.matchout(match.out, end.pre - min(times) + 1)
        if (match.out[[2]]) {
            if (printFlag) {
                message("WARNING: match.out calls on time periods that are beyond the data range.\n", sep = "", appendLF = FALSE)
            }
            if (printFlag) {
                message("match.out is being reset accordingly.\n\n", sep = "", appendLF = FALSE)
            }
        }
        match.out <- match.out[[1]]
    }
    if (!is.list(match.out.min) & length(match.out.min) > 0) {
        match.out.tmp1 <- match.out.min
        match.out.min <- list()
        for (i in 1:length(match.out.tmp1)) {
            match.out.min[[i]] <- rep(period, (end.pre - start.pre + 1)%/%period)
        }
        names(match.out.min) <- match.out.tmp1
    } else if (length(match.out.min) > 0) {
        match.out.min <- check.matchout(match.out.min, end.pre - min(times) + 1)
        if (match.out.min[[2]]) {
            if (printFlag) {
                message("WARNING: match.out.min calls on time periods that are beyond the data range. \n", sep = "", appendLF = FALSE)
            }
            if (printFlag) {
                message("match.out.min is being reset accordingly.\n\n", sep = "", appendLF = FALSE)
            }
        }
        match.out.min <- match.out.min[[1]]
    }
    if (is.logical(omnibus.var)) {
        if (omnibus.var) {
            if (length(match.out) == 1) {
                omnibus.var <- NULL
            } else {
                omnibus.var <- result.var
                if (printFlag) {
                  message("omnibus.var = TRUE.  Resetting: \n", appendLF = FALSE)
                }
                if (printFlag) {
                  message("omnibus.var = c(\"", paste(omnibus.var, collapse = "\",\"", sep = ""), "\")\n\n", sep = "", appendLF = FALSE)
                }
            }
        } else {
            omnibus.var <- NULL
        }
    }
    if (length(omnibus.var) == 1) {
        warning("If non-NULL, omnibus.var should have length of at least 2. It is being reset to NULL.")
        omnibus.var <- NULL
    }
    if (length(setdiff(omnibus.var, result.var)) > 0) {
        stop("omnibus.var contains elments not in results.var.")
    }
    int.num <- 1
    dum <- max(colSums(Intervention == 1))
    dum <- min(NROW(Intervention) - dum, dum)
    dum1 <- ((max(end.post) - (max(end.post) - end.pre)%%period - end.pre)/period)
    if (dum <= dum1 + 1) {
        if (printFlag) {
            message("WARNING: There is a low number (", dum, ") of cases in the treatment or intervention group.\n", sep = "", appendLF = FALSE)
        }
        if (jack > 0) {
            jack <- 0
            if (printFlag) {
                message("setting jack = 0.\n", appendLF = FALSE)
            }
        }
        if (use.survey) {
            use.survey <- FALSE
            if (printFlag) {
                message("Setting use.survey = FALSE.\n", appendLF = FALSE)
            }
        }
        if (printFlag) {
            message("Be cautious of results involving linearization or confidence intervals.\n\n", appendLF = FALSE)
        }
    }
    if (reset.match.covar) {
        match.covar <- NULL
    }
    if (reset.match.out) {
        match.out <- NULL
    }
    if (length(rm.col) > 0) {
        for (i in 1:length(rm.col)) {
            if (is.element(rm.col[i], result.var)) {
                if (printFlag) {
                  message("WARNING: ", rm.col[i], " is a non-numeric variable.  It will be removed from result.var. \n", sep = "", appendLF = FALSE)
                }
                result.var <- setdiff(result.var, rm.col[i])
            }
            if (is.element(rm.col[i], omnibus.var)) {
                if (printFlag) {
                  message("WARNING: ", rm.col[i], " is a non-numeric variable.  It will be removed from omnibus.var. \n", sep = "", appendLF = FALSE)
                }
                omnibus.var <- setdiff(omnibus.var, rm.col[i])
            }
            if (is.element(rm.col[i], match.covar)) {
                if (printFlag) {
                  message("WARNING: ", rm.col[i], " is a non-numeric variable.  It will be removed from match.covar. \n", sep = "", appendLF = FALSE)
                }
                match.covar <- setdiff(match.covar, rm.col[i])
            }
            if (is.element(rm.col[i], match.covar.min)) {
                if (printFlag) {
                  message("WARNING: ", rm.col[i], " is a non-numeric variable.  It will be removed from match.covar.min. \n", sep = "", appendLF = FALSE)
                }
                match.covar.min <- setdiff(match.covar.min, rm.col[i])
            }
            if (is.element(rm.col[i], names(match.out))) {
                if (printFlag) {
                  message("WARNING: ", rm.col[i], " is a non-numeric variable.  It will be removed from match.out. \n", sep = "", appendLF = FALSE)
                }
                rm.li <- which(is.element(names(match.out), rm.col[i]))
                match.out <- match.out[-rm.li]
            }
            if (is.element(rm.col[i], names(match.out.min))) {
                if (printFlag) {
                  message("WARNING: ", rm.col[i], " is a non-numeric variable.  It will be removed from match.out.min. \n", sep = "", appendLF = FALSE)
                }
                rm.li1 <- which(is.element(names(match.out.min), rm.col[i]))
                match.out.min <- match.out.min[-rm.li1]
            }
        }
    }

    # Establish synthetic control weights
    if (length(w) == 0) {
        tmp <- proc.time()
        if (printFlag) {
            message("Calculating weights...", "\n", appendLF = FALSE)
        }
        w <- get.w(data, match.covar, match.covar.min, match.out, match.out.min, boot = perm, jack = jack, Int = Intervention[, as.character(end.pre)], int.val = int.num, trim = NULL, end.pre = end.pre, cal.epsilon = cal.epsilon,
            maxit = maxit, bounds = bounds, calfun = calfun, check.feas = check.feas, scale.var = scale.var, cut.mse = max.mse, use.backup = use.backup, time.names = time.names, printFlag = printFlag, n.cores = n.cores)
        tmp <- proc.time() - tmp
        if (printFlag) {
            message("Calculation of weights complete: Total time = ", round(tmp[3], 2), "\n\n", sep = "", appendLF = FALSE)
        }
    } else {
        if (printFlag) {
            message("Weights have been provided.  Will not calculate weights.\n", appendLF = FALSE)
        }
        if (methods::is(w) == "microsynth") {
            w <- w$w
        }
        is.correct.w <- is.list(w)
        if (is.correct.w) {
            is.correct.w <- is.correct.w & sum(names(w) != c("Weights", "Intervention", "MSE", "Model", "Summary", "keep.groups")) == 0
        }
        if (is.correct.w) {
            is.correct.w <- is.correct.w & dim(w$Weights)[1] == dim(data)[1]
        }
        if (is.correct.w) {
            is.correct.w <- is.correct.w & dim(w$Intervention)[1] == dim(data)[1]
        }
        if (!is.correct.w) {
            stop("w is not formatted correctly.")
        }
        perm <- sum(grepl("Perm", colnames(w$Weights)))
        jack <- sum(grepl("Jack", colnames(w$Weights)))
        if (printFlag) {
            message("Setting jack = ", jack, "\n", sep = "", appendLF = FALSE)
        }
        if (printFlag) {
            message("Setting perm = ", perm, "\n\n", sep = "", appendLF = FALSE)
        }
    }
    not.jack <- !grepl("Jack", colnames(w$Weights))
    end.post <- end.post - (end.post - end.pre)%%period
    if (!reset.result.var) {
        stats <- list()
        stats1 <- list()
        stats2 <- list()
        delta.out <- list()
        dof <- list()
        out.coefs <- list()
        results <- list()
        for (i in 1:length(end.post)) {
            tmp <- proc.time()
            is.graph <- "Calculating basic statistics"
            is.graph1 <- "Completed calculation of basic statistics"
            if (printFlag) {
                message(is.graph, " for end.post = ", time.names[end.post[i]], "...", "\n", sep = "", appendLF = FALSE)
            }

            # Calculate basic statistics
            stats[[i]] <- get.stats(data, w$Weights, w$Intervention, w$keep.groups, result.var, end.pre = end.pre, period = period, end.post = end.post[i], omnibus.var = omnibus.var, start.pre = start.pre, cut.mse = cut.mse,
                twosided = twosided, time.names = time.names)
            if (i == which.max(end.post)) {
                plot.stats <- stats[[i]][[5]]
            }
            stats[[i]] <- stats[[i]][-5]
            tmp <- proc.time() - tmp
            if (printFlag) {
                message(is.graph1, " for end.post = ", time.names[end.post[i]], ".  Time = ", round(tmp[3], 2), "\n\n", sep = "", appendLF = FALSE)
            }
            if (!reset.result.var) {
                w.tmp <- w$Weights
                Inter.tmp <- w$Intervention
                mse.tmp <- w$keep.groups
                if (!use.survey) {
                  keep.surv <- !grepl("Perm", colnames(w.tmp))
                  w.tmp <- w.tmp[, keep.surv, drop = FALSE]
                  Inter.tmp <- Inter.tmp[, keep.surv, drop = FALSE]
                  mse.tmp <- mse.tmp[keep.surv]
                }
                tmp <- proc.time()
                if (printFlag) {
                  message("Calculating survey statistics for end.post = ", time.names[end.post[i]], "...", "\n", sep = "", appendLF = FALSE)
                }

                # Calculate complex (i.e., survey) statistics
                stats.tmp <- get.stats1(data, w.tmp, Inter.tmp, mse.tmp, result.var, end.pre = end.pre, period = period, end.post = end.post[i], omnibus.var = omnibus.var, twosided = twosided, printFlag = printFlag,
                  n.cores = n.cores)
                stats1[[i]] <- stats.tmp[[1]]
                stats2[[i]] <- stats.tmp[[2]]
                delta.out[[i]] <- stats.tmp[[3]]
                dof[[i]] <- stats.tmp[[4]]
                out.coefs[[i]] <- stats.tmp[[5]]
                tmp <- proc.time() - tmp
                if (printFlag) {
                  message("Completed calculation of survey statistics for end.post = ", time.names[end.post[i]], ".  Time = ", round(tmp[3], 2), "\n\n", sep = "", appendLF = FALSE)
                }
                Pct.Chng <- cbind(Pct.Chng = stats[[i]][[2]][1, ])
                if (is.element("Omnibus", rownames(Pct.Chng))) {
                  Pct.Chng["Omnibus", ] <- NA
                }
                Trt <- cbind(Trt = stats[[i]][[3]][1, ])
                Con <- cbind(Con = stats[[i]][[4]][1, ])
                synth.stats1 <- stats1[[i]]
                synth.stats2 <- stats2[[i]]
                synth.delta.out <- delta.out[[i]]
                synth.dof <- dof[[i]]
                if (test == "lower") {
                  Linear.pVal <- cbind(Linear.pVal = stats::pnorm(synth.stats1[1, ], lower.tail = TRUE))
                } else if (test == "upper") {
                  Linear.pVal <- cbind(Linear.pVal = stats::pnorm(synth.stats1[1, ], lower.tail = FALSE))
                } else {
                  Linear.pVal <- cbind(Linear.pVal = 2 * stats::pnorm(abs(synth.stats1[1, ]), lower.tail = FALSE))
                  if (is.element("Omnibus", rownames(Linear.pVal))) {
                    Linear.pVal["Omnibus", ] <- stats::pchisq(synth.stats1[1, "Omnibus"], df = synth.dof, lower.tail = FALSE)
                  }
                }
                Linear.CI <- make.ci(synth.stats2[1, ], sqrt(synth.delta.out[1, ]), alpha = 1 - confidence)
                colnames(Linear.CI) <- paste("Linear.", colnames(Linear.CI), sep = "")
                Jack.pVal <- Jack.CI <- Perm.pVal <- Perm.CI <- NULL
                if (jack > 0) {
                  jack.stats1 <- synth.stats1[2, ]
                  jack.stats2 <- synth.stats2[2, ]
                  jack.delta.out <- synth.delta.out[2, ]
                  synth.stats1 <- synth.stats1[-2, , drop = FALSE]
                  synth.stats2 <- synth.stats2[-2, , drop = FALSE]
                  synth.delta.out <- synth.delta.out[-2, , drop = FALSE]
                  if (test == "lower") {
                    Jack.pVal <- cbind(Jack.pVal = stats::pnorm(jack.stats1, lower.tail = TRUE))
                  } else if (test == "upper") {
                    Jack.pVal <- cbind(Jack.pVal = stats::pnorm(jack.stats1, lower.tail = FALSE))
                  } else {
                    Jack.pVal <- cbind(Jack.pVal = 2 * stats::pnorm(abs(jack.stats1), lower.tail = FALSE))
                    if (is.element("Omnibus", rownames(Jack.pVal))) {
                      Jack.pVal["Omnibus", ] <- stats::pchisq(c(jack.stats1)["Omnibus"], df = synth.dof, lower.tail = FALSE)
                    }
                  }
                  Jack.CI <- make.ci(jack.stats2, sqrt(jack.delta.out), alpha = 1 - confidence)
                  colnames(Jack.CI) <- paste("Jack.", colnames(Jack.CI), sep = "")
                }
                if (perm > 0 & use.survey) {
                  perm.stats1 <- get.pval(list(synth.stats1), ret.na = TRUE)
                  dum <- perm.stats1[[2]]
                  perm.stats1 <- perm.stats1[[1]]
                  if (test == "lower") {
                    Perm.pVal <- cbind(Perm.pVal = c(perm.stats1))
                  } else if (test == "upper") {
                    Perm.pVal <- cbind(Perm.pVal = 1 - c(perm.stats1))
                  } else {
                    Perm.pVal <- cbind(Perm.pVal = 2 * apply(cbind(c(perm.stats1), 1 - c(perm.stats1)), 1, min))
                    if (is.element("Omnibus", rownames(Perm.pVal))) {
                      Perm.pVal["Omnibus", ] <- 1 - c(perm.stats1)["Omnibus"]
                    }
                  }
                  Perm.CI <- make.ci2(synth.stats2, synth.delta.out, alpha = 1 - confidence)
                  colnames(Perm.CI) <- paste("Perm.", colnames(Perm.CI), sep = "")
                } else if (perm > 0) {
                  perm.stats = get.pval(stats[[i]], ret.na = TRUE)
                  dum <- perm.stats[[2]]
                  perm.stats <- (perm.stats[[1]])[c(1), , drop = FALSE]
                  if (test == "lower") {
                    Perm.pVal <- cbind(Perm.pVal = c(perm.stats[1, ]))
                  } else if (test == "upper") {
                    Perm.pVal <- cbind(Perm.pVal = 1 - c(perm.stats[1, ]))
                  } else {
                    Perm.pVal <- cbind(Perm.pVal = 2 * pmin(c(perm.stats[1, ]), 1 - c(perm.stats[1, ])))
                    if (is.element("Omnibus", rownames(Perm.pVal))) {
                      Perm.pVal["Omnibus", ] <- 1 - c(perm.stats[1, ])["Omnibus"]
                    }
                  }
                }
                results[[i]] <- cbind(Trt, Con, Pct.Chng, Linear.pVal, Linear.CI, Jack.pVal, Jack.CI, Perm.pVal, Perm.CI)
                if (NROW(results[[i]]) == 1) {
                  rownames(results[[i]]) <- result.var[1]
                }
                names(results)[i] <- names(out.coefs)[i] <- time.names[end.post[i]]
            }
        }
    }
    if (reset.result.var) {
        if (printFlag) {
            message("No outcome variables specified (e.g., result.var = NULL).\n", appendLF = FALSE)
        }
        if (printFlag) {
            message("Results will not be tabulated.\n", appendLF = FALSE)
        }
    }
    if (reset.result.var) {
        if (printFlag) {
            message("Returning weights only.\n", appendLF = FALSE)
        }
    }

    # Tabulate results for output.
    out <- list()
    i <- 1
    out[[i]] <- w
    names(out)[i] <- "w"
    i <- i + 1
    if (!reset.result.var) {
        out[[i]] <- results
        names(out)[i] <- "Results"
        out[[i + 1]] <- out.coefs
        names(out)[i + 1] <- "svyglm.stats"
        i <- i + 2
    }
    if (!reset.result.var) {
        out[[i]] <- plot.stats
        names(out)[i] <- "Plot.Stats"
        i <- i + 1
    }
    if (ret.stats) {
        out[[i]] <- stats
        names(out)[i] <- "stats"
        out[[i + 1]] <- stats1
        out[[i + 2]] <- stats2
        out[[i + 3]] <- delta.out
        names(out)[i:(i + 3)] <- c("stats", "stats1", "stats2", "delta.out")
        i <- i + 4
    }
    if (length(result.file) > 0 & !reset.result.var) {
        out.results(results, end.pre, end.post = names(results), result.file, printFlag = printFlag)
    }
    all.tmp <- proc.time() - all.tmp
    if (printFlag) {
        message("microsynth complete: Overall time = ", round(all.tmp[3], 2), "\n\n", sep = "", appendLF = FALSE)
    }

    # Declare final output for print (3 of 3)
    info$nConstraints <- nrow(out$w$Summary) - 1
    info$num.constr <- out$w$num.constr
    out$w$num.constr <- NULL

    # Add descriptive stats to output for print() call
    out$info <- info

    out <- makemicrosynth(out)
    return(out)
}



# Sub-function of microsynth(); calculate p-values when permutation is used
get.pval <- function(stats, p = NCOL(stats[[1]]), k = length(stats), ret.na = FALSE) {
    nams <- colnames(stats[[1]])
    n.boot <- NROW(stats[[1]]) - 1
    out <- matrix(NA, k, p)
    colnames(out) <- nams
    rownames(out) <- paste("Stat", 1:k, sep = "")

    dum <- list()
    for (i in 1:k) {
        synth <- stats[[i]][1, ]
        synth.tmp <- t(matrix(synth, p, n.boot))
        boot <- stats[[i]][-1, , drop = FALSE]
        boot[boot == Inf | boot == -Inf] <- NA
        dum[[i]] <- colSums(is.na(boot))
        out[i, ] <- colMeans(synth.tmp > boot, na.rm = TRUE)
    }
    if (ret.na) {
        return(list(out, dum))
    } else {
        return(out)
    }
}



# Sub-function of microsynth(); reshape data to make 3D array with other necessary outputs
newreshape <- function(data, timevar, idvar, intvar, v.names = NULL, nv.names = NULL) {
    times = names(table(data[, timevar]))
    if (length(v.names) == 0) {
        v.names1 <- v.names <- setdiff(colnames(data), c(nv.names, timevar, idvar))
    } else {
        v.names1 <- c(intvar, v.names)
    }
    keep.nams <- c(idvar, timevar, intvar, nv.names, v.names)
    keep.nams <- names(table(keep.nams))
    data <- data[, keep.nams]

    if (length(times) > 1) {
        newdat <- stats::reshape(data, timevar = timevar, idvar = idvar, v.names = v.names1, direction = "wide")
    } else if (length(times) == 1) {
        newdat <- data[, colnames(data) != timevar]
        append.cols <- setdiff(colnames(newdat), c(idvar, nv.names))
        append.cols <- which(is.element(colnames(newdat), append.cols))
        colnames(newdat)[append.cols] <- paste(colnames(newdat)[append.cols], ".", times, sep = "")
    }
    intcols <- substr(colnames(newdat), 1, nchar(intvar)) == intvar
    intcols <- intcols & (substr(colnames(newdat), nchar(intvar) + 1, nchar(intvar) + 1) == "" | substr(colnames(newdat), nchar(intvar) + 1, nchar(intvar) + 1) == ".")
    intcols <- colnames(newdat)[intcols]
    int <- newdat[, intcols, drop = FALSE]
    colnames(int) <- gsub(paste(intvar, ".", sep = ""), "", colnames(int))
    intcols1 <- as.numeric(colnames(int))
    int <- int[, order(intcols1, decreasing = FALSE), drop = FALSE]
    rownames(int) <- newdat[, idvar]

    v.names <- setdiff(v.names, intvar)
    nv.names <- setdiff(nv.names, intvar)

    newdat <- newdat[, setdiff(colnames(newdat), intcols)]

    out <- array(NA, c(NROW(newdat), length(nv.names) + length(v.names), NCOL(int)))
    dimnames(out) <- list(newdat[, idvar], c(nv.names, v.names), colnames(int))

    for (i in 1:dim(out)[2]) {
        nam.tmp <- dimnames(out)[[2]][i]
        here <- substr(colnames(newdat), 1, nchar(nam.tmp)) == nam.tmp
        here <- here & (substr(colnames(newdat), nchar(nam.tmp) + 1, nchar(nam.tmp) + 1) == "." | substr(colnames(newdat), nchar(nam.tmp) + 1, nchar(nam.tmp) + 1) == "")
        tmp <- newdat[, here, drop = FALSE]
        if (sum(here, na.rm = TRUE) > 1) {
            colnames(tmp) <- gsub(paste(dimnames(out)[[2]][i], ".", sep = ""), "", colnames(tmp))
            tmpcols1 <- as.numeric(colnames(tmp))
            tmp <- tmp[, order(tmpcols1, decreasing = FALSE)]
        }
        out[, i, ] <- as.matrix(tmp)
    }

    nv.names <- NULL
    if (dim(out)[3] > 1) {
        for (i in 1:dim(out)[2]) {
            diff.out <- out[, i, ] - rowMeans(out[, i, ])
            diff.out <- rowMeans(diff.out^2)
            if (sum(diff.out, na.rm = TRUE) == 0) {
                nv.names <- c(nv.names, dimnames(out)[[2]][i])
            }
        }
    } else {
        nv.names <- dimnames(out)[[2]]
    }

    v.names <- setdiff(dimnames(out)[[2]], nv.names)

    return(list(bigdat = out, Intervention = int, nv.names = nv.names, v.names = v.names))
}



# Sub-function of microsynth(); create results file (e.g., as CSV or XLSX)
out.results <- function(results, end.pre, end.post = names(results), file = NULL, printFlag = TRUE) {
    use.xlsx <- length(end.post) > 1

    if (substr(file, nchar(file) - 3, nchar(file)) == ".csv") {
        file <- substr(file, 1, nchar(file) - 4)
        if (use.xlsx) {
            if (printFlag) {
                message("WARNING: Cannot return .csv file since length(end.post) > 1.  Returning .xlsx file instead.\n", sep = "", appendLF = FALSE)
            }
        }
    } else if (substr(file, nchar(file) - 3, nchar(file)) == ".xls") {
        file <- substr(file, 1, nchar(file) - 4)
        use.xlsx <- TRUE
    } else if (substr(file, nchar(file) - 4, nchar(file)) == ".xlsx") {
        file <- substr(file, 1, nchar(file) - 5)
        use.xlsx <- TRUE
    }

    if (use.xlsx) {
        xlsx.loaded <- is.element("xlsx", loadedNamespaces())
        file <- paste(file, ".xlsx", sep = "")
        if (!requireNamespace("xlsx", quietly = TRUE)) {
            stop("The xlsx package is needed when saving output with multiple post-intervention times, or when the file name specifies a .xlsx extension. Please install xlsx, or if you'd like to write to .csv, only select a single post-intervention time and append the filename appropriately.",
                call. = FALSE)
        }
        if (!xlsx.loaded) {
            attachNamespace("xlsx")
        }
        end.post <- as.numeric(end.post)
        wb <- xlsx::createWorkbook()
        cspValColumn <- xlsx::CellStyle(wb, dataFormat = xlsx::DataFormat("0.0000"))
        csOtherColumn <- xlsx::CellStyle(wb, dataFormat = xlsx::DataFormat("0.00"))
        csPercColumn <- xlsx::CellStyle(wb, dataFormat = xlsx::DataFormat("0.0%"))

        for (i in 1:length(end.post)) {
            out <- results[[i]]
            keep <- rowMeans(is.na(out)) < 1
            out <- out[keep, , drop = FALSE]

            is.pct <- grepl("pct", tolower(colnames(out))) | grepl("lower", tolower(colnames(out))) | grepl("upper", tolower(colnames(out)))
            is.pval <- grepl("pval", tolower(colnames(out)))
            is.oth <- !is.pct & !is.pval

            pct.list <- rep(list(csPercColumn), sum(is.pct))
            names(pct.list) <- as.character(which(is.pct))
            pval.list <- rep(list(cspValColumn), sum(is.pval))
            names(pval.list) <- as.character(which(is.pval))
            oth.list <- rep(list(csOtherColumn), sum(is.oth))
            names(oth.list) <- as.character(which(is.oth))
            dum1 <- which(rownames(out) == "Omnibus")
            rm.omni <- c("Naive.Norm", "Pct.Chng")
            dum2 <- which(is.element(colnames(out), rm.omni))
            if (length(dum1) > 0 & length(dum2) > 0) {
                out[dum1, dum2] <- NA
            }
            p <- NCOL(out)
            n <- NROW(out)

            nam <- paste("Max time = ", end.post[i], sep = "")
            sheet <- xlsx::createSheet(wb, sheetName = nam)
            xlsx::setColumnWidth(sheet, colIndex = 1:(p + 1), colWidth = 13)
            xlsx::addDataFrame(out, sheet, colStyle = c(pct.list, pval.list, oth.list))

            bord1 <- xlsx::Border(color = "black", position = "BOTTOM", pen = "BORDER_THIN")
            cb <- xlsx::CellBlock(sheet, startRow = 1, startColumn = 1, noRows = (n + 1), noColumns = (p + 1), create = FALSE)
            xlsx::CB.setBorder(cb, border = bord1, rowIndex = 1, colIndex = 1:(p + 1))

            font <- xlsx::Font(wb, isBold = TRUE)
            xlsx::CB.setFont(cb, font = font, rowIndex = 1, colIndex = 2:(p + 1))

        }

        xlsx::saveWorkbook(wb, file = file)
        rm(wb)
    } else {
        file <- paste(file, ".csv", sep = "")
        utils::write.csv(results[[1]], file, na = "")
    }

}



# Sub-function of microsynth(); calculate confidence intervals
make.ci <- function(means, ses, alpha = 0.05) {
    z.score <- stats::qnorm(1 - alpha/2)

    lower <- means - ses * z.score
    upper <- means + ses * z.score

    out <- cbind(exp(lower) - 1, exp(upper) - 1)
    rownames(out) <- names(means)
    colnames(out) <- c("Lower", "Upper")
    return(out)
}



# Sub-function of microsynth(); calculate confidence intervals
make.ci2 <- function(stats, delta.out, alpha = 0.05) {
    means <- stats[1, ]
    sds <- sqrt(delta.out[1, ])

    z.scores <- stats/sqrt(delta.out)
    quants <- apply(z.scores, 2, stats::quantile, probs = c(alpha/2, 1 - alpha/2), na.rm = TRUE)

    lower <- means - sds * quants[2, ]
    upper <- means - sds * quants[1, ]

    out <- cbind(exp(lower) - 1, exp(upper) - 1)
    rownames(out) <- colnames(stats)
    colnames(out) <- c("Lower", "Upper")
    return(out)
}



# Sub-function of microsynth(); allow parellalization
msCluster <- function(n) {

    requireNamespace("parallel", quietly = TRUE)

    if (is.logical(n)) {
        if (n) {
            n1 <- parallel::detectCores()
            # n1 <- future::availableCores()
        } else {
            n1 <- 1
        }
    } else if (length(n) == 0) {
        n1 <- parallel::detectCores() - 1
        # n1 <- future::availableCores() - 1
    } else {
        n1 <- n
    }

    # use 2 cores if CRAN, otherwise, use number detected
    chk <- Sys.getenv("_R_CHECK_LIMIT_CORES_", "")
    cores <- ifelse(chk == TRUE, 2, n1)

    return(cores)
}



# Sub-function of microsynth()
remove.vars <- function(vars, nams, objnam = "result.var", printFlag = TRUE) {
    if (is.list(vars)) {
        vars1 <- names(vars)
    } else {
        vars1 <- vars
    }

    rm.vars <- setdiff(vars1, nams)

    if (length(rm.vars) > 0) {
        rm.here <- which(is.element(vars1, rm.vars))
        vars <- vars[-rm.here]
        if (printFlag) {
            message("WARNING: The following variables will be removed from ", objnam, " since they are not in the dataset: \n", sep = "", appendLF = FALSE)
        }
        if (printFlag) {
            message(paste(rm.vars, collapse = ", ", sep = ""), "\n\n", sep = "", appendLF = FALSE)
        }
    }

    return(vars)
}



# Sub-function of microsynth()
check.matchout <- function(match.out, end.pre) {
    bad <- FALSE
    for (i in 1:length(match.out)) {
        tmp <- match.out[[i]]
        if (end.pre < sum(tmp)) {
            bad <- TRUE
            cum.tmp <- cumsum(tmp)
            cum.tmp <- cum.tmp <= end.pre
            match.out[[i]] <- c(tmp[cum.tmp])
            if (end.pre - sum(tmp[cum.tmp]) > 0) {
                match.out[[i]] <- c(match.out[[i]], end.pre - sum(tmp[cum.tmp]))
            }
        }
    }
    return(list(match.out, bad))
}

Try the microsynth package in your browser

Any scripts or data that you put into this service are public.

microsynth documentation built on July 9, 2023, 5:53 p.m.