Nothing
#' Finds convergence clubs
#'
#' Find convergence clubs by means of Phillips and Sul clustering procedure.
#'
#' @param X dataframe containing data (preferably filtered data in order to remove business cycles).
#' Data must not contain any NA or NaN values, otherwise the clustering procedure will be stopped with an error.
#' In order for the clustering procedure to work, data must be arranged such that
#' each row represents a unit and each column represents a time period (year, month, ...)
#' @param dataCols integer vector with the column indices of the data
#' @param unit_names integer scalar indicating, if present, the index of a column
#' with codes of the units
#' @param refCol integer scalar indicating the index of the column to use for ordering
#' data
#' @param time_trim a numeric value between 0 and 1, representing the portion of
#' time periods to trim when running \emph{log-t} regression model.
#' Phillips and Sul (2007, 2009) suggest to discard the first third of the period.
#' @param HACmethod string indicating whether a Fixed Quadratic Spectral Bandwidth (\code{HACmethod="FQSB"}) or
#' an Adaptive Quadratic Spectral Bandwidth (\code{HACmethod="AQSB"}) should be used for the truncation
#' of the Quadratic Spectral kernel in estimating the \emph{log-t} regression model
#' with heteroskedasticity and autocorrelation consistent standard errors.
#' The default method is "FQSB".
#' @param cstar numeric scalar, indicating the threshold value of the sieve criterion (\eqn{c^*}{c*})
#' to include units in the detected core (primary) group (step 3 of Phillips and Sul (2007, 2009) clustering algorithm).
#' The default value is 0.
#'
#'
#' @param cstar_method a string specifying whether cstar should be mantained fixed
#' (\code{cstar_method="fixed"}) or increased iteratively until the whole club satisfies
#' the condition \eqn{tvalue>-1.65} (\code{cstar_method="incremental"}).
#' Default is \code{cstar_method="fixed"} (see Details).
#' @param cstar_increment a positive value specifying the increment to cstar,
#' only valid if \code{cstar_method="incremental"} (see Details); the default value is 0.1.
#' @param cstar_cap scalar indicating the maximum value up to which \code{cstar} can
#' be increased; the default value is 3.
#'
#'
#' @return Ad object of class \code{convergence.clubs}, containing a list of
#' Convergence Clubs, for each club a list is return with the
#' following objects: \code{id}, a vector containing the row indices
#' of the units in the club; \code{model}, a list containing information
#' about the model used to run the t-test on the units in the club;
#' \code{unit_names}, a vector containing the names of the units of the club (optional,
#' only included if parameter \code{unit_names} is given)
#'
#'
#' @details In order to investigate the presence of convergence clubs according
#' to the Phillips and Sul clustering procedure, the following steps are implemented:
#' \enumerate{
#' \item (Cross section last observation ordering):
#' Sort units in descending order according to the last panel observation of the period;
#' \item (Core group formation):
#' Run the \emph{log-t} regression for the first k units \eqn{(2 < k < N)} maximizing k
#' under the condition that t-value is \eqn{> -1.65}. In other words, chose the core group size k* as follows:
#'
#' \deqn{k^*= argmax_k \{t_k\} }{k* = argmax \, t(k) } subject to \deqn{ min\{t_k \} > -1.65}{min t(k) > 1.65}
#'
#' If the condition \eqn{t_k >-1.65}{t(k) > -1.65} does not hold for \eqn{k = 2} (the first two units),
#' drop the first unit and repeat the same procedure. If \eqn{t_k >-1.65}{t(k) > -1.65} does not hold
#' for any units chosen, the whole panel diverges;
#' \item (Sieve the data for club membership): After the core group is detected,
#' run the \emph{log-t} regression for the core group adding (one by one)
#' each unit that does not belong to the latter. If \eqn{t_k }{t(k)} is greater than a critical value \eqn{c^*}{c*}
#' add the new unit in the convergence club.
#' All these units (those included in the core group \eqn{k^*}{k*} plus those added) form the first convergence club.
#' Note that Phillips and Sul (2007) suggest to make sure \eqn{t_k>-1.65}{t(k) > -1.65}
#' for the subconvergence group obtained. Otherwise, repeat the procedure by
#' increasing the value of the \eqn{c^*}{c*} parameter until the condition
#' \eqn{t_k>-1.65}{t(k) > -1.65} is satisfied for the subconvergence group;
#'
#' \item (Recursion and stopping rule): If there are units for which the previous
#' condition fails (\eqn{t_k <c^*}{t(k) < c*}), gather all these units in one
#' group and run the \emph{log-t} test to see if the condition \eqn{t_k >-1.65}{t(k) > -1.65}
#' holds. If the condition is satisfied, conclude that there are two convergence clubs.
#' Otherwise, step 1 to 3 should be repeated on the same group to determine
#' whether there are other subgroups that constitute convergence clubs.
#' If no k in step 2 satisfies the condition \eqn{t_k >-1.65}{t(k) > -1.65},
#' the remaining units diverge.
#' }
#'
#'
#'
#' Note that the clustering procedure may return groups with \eqn{t_k <-1.65}{t(k) < -1.65},
#' which are not really convergence clubs. In this case, following step 3 of
#' the clustering procedure there are two options:
#' (i) allow an iterative increase
#' of the cstar parameter until the subconvergence club satisfies the condition
#' \eqn{t_k >-1.65}{t(k) > -1.65}. In this case it should the argument \code{cstar_method}
#' should be set to \code{"incremental"} and a positive argument for the \code{cstar_increment}
#' argument should be chosen;
#' (ii) increase the value of the \code{cstar} in order to increase the discriminatory
#' power of the \emph{log-t} test in the formation of each club.
#'
#' Information about clubs, divergent units and the \eqn{c^*}{c*} used for each club can be
#' easily displayed by means of the \code{summary()} function, for which the package provides
#' a specific method for the \code{convergence.clubs} class.
#'
#'
#'
#'
#'
#' @references
#'
#' Andrews, D. W., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica: Journal of the Econometric Society, 817-858.
#'
#' Phillips, P. C.; Sul, D., 2007. Transition modeling and econometric convergence tests. Econometrica 75 (6), 1771-1855.
#'
#' Phillips, P. C.; Sul, D., 2009. Economic transition and growth. Journal of Applied Econometrics 24 (7), 1153-1185.
#'
#'
#'
#'
#' @seealso
#' \code{\link{mergeClubs}}, Merges a list of clubs created by \code{findClubs};
#'
#' \code{\link{mergeDivergent}}, Merges divergent units according to the algorithm proposed by von Lyncker and Thoennessen (2017)
#'
#'
#'
#' @examples
#' data("filteredGDP")
#'
#'
#' # Cluster Countries using GDP from year 1970 to year 2003
#' clubs <- findClubs(filteredGDP, dataCols=2:35, unit_names = 1, refCol=35,
#' time_trim = 1/3, HACmethod = "FQSB",
#' cstar = 0,
#' cstar_method = 'incremental',
#' cstar_increment = 0.1)
#'
#'
#'
#' \dontrun{
#' # Cluster Countries using GDP from year 1970 to year 2003
#' clubs <- findClubs(filteredGDP, dataCols=2:35, unit_names = 1, refCol=35,
#' time_trim = 1/3, HACmethod = "AQSB", cstar = 0)
#'
#'}
#'
#' @export
findClubs<- function(X, #data matrix or data.frame
dataCols, #vector with column indices of data,
unit_names = NULL, #column index of units, if present
refCol, #column index of year to be used as reference (lastT)
time_trim = 1/3, #portion of years to remove from computations (a value between >0 and <1)
HACmethod = c('FQSB','AQSB'),
cstar = 0, #c* value for the second step,
cstar_method = c('fixed', 'incremental'),
cstar_increment = 0.1,
cstar_cap = 3){
### Initialise variables ---------------------------------------------------
HACmethod <- match.arg(HACmethod)
cstar_method <- match.arg(cstar_method)
returnNames <- switch(class(unit_names),
NULL = FALSE,
numeric = TRUE,
integer = TRUE,
stop("Not a valid value for argument 'unit_names'; it should be an integer"))
N <- nrow(X)
t <- length(dataCols)
threshold <- -1.65
#output
clubs <- structure(list(),
class = c("convergence.clubs", "list"),
data = X,
dataCols = dataCols,
unit_names = unit_names,
refCol = refCol,
time_trim = time_trim,
HACmethod = HACmethod,
cstar = cstar,
cstar_method = cstar_method,
cstar_increment = cstar_increment
)
### Check inputs -----------------------------------------------------------
#unit_names
if(length(unit_names) > 1) stop("Argument 'unit_names' must be an integer-valued scalar")
if( returnNames){ if(unit_names %% 1 != 0) stop(" Argument 'unit_names' must be an integer-valued scalar")}
#X
if(!is.data.frame(X)) stop('X must be an object of class data.frame')
X[,unit_names] <- as.character(X[,unit_names])
#dataCols
if(!all(apply(X[,dataCols],2,is.numeric)) ) stop('Some of the data columns are non-numeric')
#length of time series
if(t < 2) stop('At least two time periods are needed to run this procedure')
#time_trim
if( length(time_trim) > 1 | !is.numeric(time_trim) ) stop('time_trim must be a numeric scalar')
if( time_trim > 1 | time_trim <= 0 ) stop('invalid value for time_trim; should be a value between 0 and 1')
if( (t - round(t*time_trim)) < 2) stop('either the number of time periods is too small or the value of time_trim is too high')
#refCol
if( length(refCol) > 1 ) stop('refCol must be an integer-valued scalar')
if( !is.numeric(refCol) ) stop('refCol must be an integer value indicating the column number of reference year')
if( refCol %% 1 != 0 ) stop('refCol must be an integer-valued scalar')
if( refCol > ncol(X) ) stop('Wrong refCol value; there is no such column')
#cstar
if(!is.numeric(cstar) | length(cstar) > 1) stop('cstar must be a numeric scalar!')
if(cstar_method=='incremental' &
(!is.numeric(cstar_increment) | length(cstar_increment) > 1 | cstar_increment <0) ){
stop('cstar_increment must be a positive numeric scalar!')
}
if(cstar_method=='incremental' &
(!is.numeric(cstar_cap) | length(cstar_cap) > 1 | cstar_cap < 0) ){
stop('cstar_cap must be a positive numeric scalar!')
}
### Other preliminary operations -------------------------------------------
#add id column to dataset
X$id <- 1:N
#Sort data by clustering variable (decreasing)
dati <- X[order(X[,refCol],decreasing = TRUE),]
### Find clubs -------------------------------------------------------------
#Cluster procedure
l <- 1
while(TRUE){
if (nrow(dati) == 1){
clubs$divergent$id <- dati$id
break #break while loop if out of units
}else if(nrow(dati) == 0){
break
}
#Test all units
H_all <- computeH(dati[,dataCols])
mod_all <- estimateMod(H_all, time_trim, HACmethod=HACmethod)
t_all <- mod_all['tvalue']
# if tvalue > -1.65, they all form one club,
#otherwise go one with clustering
if (t_all > threshold) {
clubs[[paste('club',l,sep = '')]] <- list(
# units = as.character(dati[,IDvar]),
id = dati$id,
model = mod_all,
cstar = cstar)
break
}
#find core group (returns the row indices of units in core Group)
coreGroup <- coreG(X=dati, dataCols, time_trim, threshold,
HACmethod = HACmethod, type="max")
#if no more core groups are found, add divergent to output and return
if (identical(coreGroup, FALSE) ){
# nl <- length(clubs)
clubs$divergent$id <- dati$id
break
}
#add units to core group
convClub <- club(X = dati, dataCols, core = coreGroup, time_trim,
HACmethod = HACmethod,
cstar = cstar,
cstar_method = cstar_method,
cstar_increment = cstar_increment,
cstar_cap = cstar_cap)
# newcstar <- clubConv$model$cstar
# xidclub <- which(X[,IDvar] %in% as.character(clubConv$units))
clubs[[paste('club',l,sep = '')]] <- list( id = convClub$id,
model = convClub$model,
cstar = convClub$cstar
)
dati <- dati[-convClub$rows,]#remove the club found from the dataset
l <- l + 1
}#end of while, end of clustering
### Return -----------------------------------------------------------------
#if returnNames, then add region codes to output
if(returnNames){
for(club in names(clubs) ){
clubs[[club]]$unit_names <- X[clubs[[club]]$id, unit_names]
}
return(clubs)
}else return(clubs)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.