R/assemble_factor.R

#' Assemble factor time-series from source files.
#'
#' The function \code{assemble_factor} retrieves time-series from a single
#' csv-formatted source file to assemble factors in their final form. The source
#' file must correspond to \strong{(a)} a valid entry in the internal catalog
#' contained in the package \pkg{factorr} or \strong{(b)} a valid entry in the
#' \code{bindr::derived_catalog} registry object controlling factors built with
#' algebraic and/or econometric manipulations.
#'
#' The parameter \emph{nm} should preferably be a syntactically valid R name.
#' The function enforces the R naming rules internally by calling
#' \code{nm <- make.names(nm)}, so the returned column name may differ from the
#' user-supplied one. See the \code{base::make.names} documentation for details
#' about R naming conventions.
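#'
#' As a minimal sketch of this renaming (the input strings are hypothetical and
#' chosen only for illustration), \code{make.names} leaves a syntactically valid
#' name untouched and alters an invalid one:
#'
#' \preformatted{
#' # A valid name is returned unchanged.
#' valid_nm <- make.names("value")
#' # A space is not allowed in an R name and is replaced by a dot.
#' fixed_nm <- make.names("my factor")
#' fixed_nm  # "my.factor"
#' }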
#'
#' The parameter \emph{src_hdl} must be a valid internal catalog entry. It
#' controls the handle from which the time-series will be sourced. The function
#' \code{factorr::catalog_do('show')} generates the list of available handles
#' (see column \strong{hdl}), along with a short description and the original
#' data source (\emph{e.g.} Kenneth French Library, Federal Reserve Bank of St.
#' Louis). An error is generated if an invalid catalog entry is supplied.
#'
#' Alternatively, if the parameter \emph{src_hdl} points to a \strong{derived
#' factor}, it must map to a valid \strong{derived catalog} entry. As in the
#' case above, \emph{src_hdl} controls the handle from which the time-series
#' will be sourced. The function \code{bindr::derived_catalog_do('src_hdl')}
#' displays a table of valid entries suitable for the parameter \emph{src_hdl}.
#' An error is generated if an invalid \strong{derived catalog} entry is
#' supplied.
#'
#' The parameter \emph{src_hdl} is therefore checked internally against one of
#' two catalogs, contained either in package \pkg{factorr} or in package
#' \pkg{bindr}. The parameter \emph{is_built} activates an internal dispatch
#' mechanism routing the \emph{src_hdl} parameter to the appropriate catalog.
#' Any derived factor (\emph{i.e.} produced by calling
#' \code{build_derived_factor()}) must be requested with
#' \code{is_built = TRUE} to be routed against the internal \strong{derived
#' catalog} object. Failure to do so will generate an error.
#'
#' The parameter \emph{asset} determines which variables will be selected from
#' the source file. The function \code{factorr::catalog_do('show_hdl_names', hdl
#' = src_hdl)}, where \emph{src_hdl} is a valid catalog entry, generates a
#' tibble object containing all the variable names associated with a given
#' \emph{src_hdl}. An error is generated if \emph{asset} does not exist in the
#' source file.
#'
#' Alternatively, if the parameter \emph{src_hdl} points to a \strong{derived
#' factor}, the parameter \emph{asset} still determines which variables will be
#' selected from the source file. However, there is no function to generate a
#' tibble object containing all the variable names associated with a given
#' \emph{src_hdl}. The user must instead consult the associated audit file or
#' peek at the corresponding csv-formatted file.
#'
#' Factor time-series are assembled either from a \strong{single} time-series
#' or from a \strong{linear combination} of time-series. The former case amounts
#' to extracting \emph{asset} from the existing source \emph{src_hdl} and naming
#' the resulting factor \emph{nm}. The latter case generally involves taking two
#' variables (\emph{asset} is a string vector) from \emph{src_hdl} and combining
#' them into long and short positions. In this case \emph{trade} is an integer
#' vector comprised of either +1 or -1 representing a long and short position,
#' respectively. See examples below. Note that this package currently supports
#' only \emph{linear} combinations with \emph{trade} parameters set to either +1
#' or -1.
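#'
#' The long-short case can be sketched in base R as follows. The data frame
#' \code{toy} and its values are made up for illustration, and the snippet only
#' mirrors the idea of a signed sum of columns; it is not the function's
#' internal code.
#'
#' \preformatted{
#' # Hypothetical source data: two decile portfolios over three months.
#' toy <- data.frame(Lo.10 = c(0.01, 0.02, -0.01),
#'                   Hi.10 = c(0.03, 0.01, 0.02))
#' asset <- c("Lo.10", "Hi.10")
#' trade <- c(-1L, 1L)  # short the lowest decile, long the highest
#' # Multiply each column by its trade sign, then sum across columns.
#' signed <- sweep(as.matrix(toy[, asset]), 2, trade, FUN = "*")
#' profit <- rowSums(signed)
#' }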
#'
#' Note that an assembly request has no additional constraint besides the
#' existence of a file containing all the required inputs. This leaves
#' \strong{some latitude} to build different versions of the same factor. For
#' instance, the 'Quality' factor (\emph{e.g.} operating profitability) can be
#' built using deciles or can alternatively be constructed with quintiles. The
#' latitude in defining the factor assembly does not include cases where the
#' required series are located in different files. Such a case would necessitate
#' a dedicated function called by \code{bindr::build_derived_factor()}. See
#' below for additional details.
#'
#' The latitude in designing factor expression is afforded mostly for
#' exploratory purposes. In particular, \strong{factor models} are 'locked' to
#' control their design and maintain their integrity. As a direct consequence,
#' a user can't modify an existing factor model by toggling between different
#' factor expressions. Instead, a user exploring the impact of variations in
#' factor expression would have to get the factor model output (typically a
#' tibble/table object) and \strong{affix} the factor variant. However, the
#' factor model audit file would clearly document the original factor model and
#' implicitly confirm any deviation in factor definition.
#'
#' The parameter \emph{src_dir} must be a valid and existing directory. An error
#' is generated if either one of these conditions is not satisfied. The
#' combination of \emph{src_dir} and \emph{src_hdl} identifies the source file
#' location and name. An error is generated if this combination points to a
#' non-existent file object. Note also that both parameters can't have multiple
#' instances, which implies that the assembly process must operate on a
#' \strong{single file} to combine its required series. Should a factor require
#' inputs located in separate files, the function
#' \code{bindr::build_derived_factor()} should be used instead.
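#'
#' How the two parameters combine into a single file location can be sketched
#' as below. The directory is a hypothetical placeholder, and the snippet
#' assumes only that the file is named after the handle with a csv extension.
#'
#' \preformatted{
#' src_dir <- "path/to/warehouse"  # hypothetical directory
#' src_hdl <- "FF_3F_US_M"
#' # One directory plus one handle identifies exactly one csv file.
#' file_name <- file.path(src_dir, paste0(src_hdl, ".csv"))
#' file_name  # "path/to/warehouse/FF_3F_US_M.csv"
#' }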
#'
#' Additional variables (in list \emph{arg_supp}) can be requested from the
#' source file provided that they exist. The typical use involves year, month or
#' date. An error is generated if any element of the list does not exist in the
#' source file. Note that the returned tibble object puts \emph{arg_supp} first,
#' then \emph{nm}. See examples below.
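#'
#' The column ordering described above can be illustrated with a small base R
#' sketch. The data frame \code{toy} is hypothetical and the code only imitates
#' the ordering of the output, not the function's implementation.
#'
#' \preformatted{
#' # Hypothetical source data with the asset stored before the date columns.
#' toy <- data.frame(hml = c(0.5, -0.2), month = c(1, 2), year = c(2020, 2020))
#' arg_supp <- list("year", "month")
#' # Supplementary variables come first, followed by the factor column.
#' out <- toy[, c(unlist(arg_supp), "hml")]
#' names(out)[names(out) == "hml"] <- "value"
#' names(out)  # "year" "month" "value"
#' }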
#'
#' @param nm A string representing the factor name.
#' @param src_hdl A string representing the source handle. See details.
#' @param asset A string or vector of strings, indicating which asset to source.
#'   See details.
#' @param trade An integer or vector of integers, either +1 or -1, indicating a
#'   long (+1) or short (-1) position. See details.
#' @param src_dir A string representing an existing path directory where the
#'   csv-formatted files reside.
#' @param arg_supp A list of supplementary arguments. See details.
#' @param is_built Logical value indicating if a factor in the assembly process
#'   has been generated by the function \code{build_derived_factor}. See
#'   details.
#'
#' @return A tibble object comprised of \emph{arg_supp} and \emph{nm}
#'   time-series, in that order. See details.
#'
#' @examples
#' \dontrun{
#' # Value factor from Fama-French 3-Factor US:
#' #
#' # Long position in 'hml':
#'
#' assemble_factor(nm = 'value',
#'              src_dir = '~/.../Factor Warehouse/Uncompressed/',
#'              src_hdl = 'FF_3F_US_M', asset = 'hml',
#'              trade = 1, arg_supp = list('year','month'))
#' }
#'
#' @examples
#' \dontrun{
#' # Fama-French Operating Profitability US:
#' #
#' # Short position in the lowest decile and long position
#' # in the highest decile:
#'
#' assemble_factor(nm = 'profit',
#'              src_dir = '~/.../Factor Warehouse/Uncompressed/',
#'              src_hdl = 'FF_OP_US_M', asset = c('Lo.10','Hi.10' ),
#'              trade = c(-1, 1), arg_supp = list('year','month'))
#' }
#'
#' @examples
#' \dontrun{
#' # Inflation factor from econometric model (hence is_built = TRUE):
#' #
#' # Long position in 'shock':
#'
#' assemble_factor(nm = 'inflation',
#'              src_dir = '~/.../Factor Warehouse/Uncompressed/',
#'              src_hdl = 'INFLATION__naive__US_M', asset = 'shock',
#'              trade = 1, arg_supp = list('year','month'), is_built = TRUE)
#' }
#'
#' @importFrom rlang .data
#' @importFrom rlang :=
#' @importFrom magrittr "%>%"
#' @export
assemble_factor <- function(nm = NA, src_hdl, asset, trade = 1,
                            src_dir = NA, arg_supp = list(),
                            is_built = FALSE) {

  # The factor name is mandatory and is coerced to a valid R name.
  if(is.na(nm)){
    stop("Parameter 'nm' must be set to a character name.", call. = TRUE)
  }
  stopifnot(is.character(nm))
  nm <- make.names(nm)

  # Derived factors are validated against the derived catalog.
  if(is_built == TRUE){
    if(!base::exists('derived_catalog',
                     mode = 'list',
                     where = rlang::current_env())){
      stop(
        'The factor derived catalog does not exist in the current environment',
        call. = TRUE)
    }

    parse_res <- parseBuiltdHdl(src_hdl)

    if( is.null(derived_catalog_do(
      operation = 'validate_entry', arg_supp = list(
        hdl = parse_res['hdl'],
        region = parse_res['region'],
        frequency = parse_res['frequency']))) ){
      stop(stringr::str_glue(
        src_hdl, ' is not in the derived catalog. Check spelling.'))
    }
  } else {
    if(!(src_hdl %in% factorr::catalog_do('get')$hdl)) {
      stop(stringr::str_glue(
        src_hdl, ' is not in the catalog. Check spelling.'))
    }
  }

  if(is.na(src_dir)){
    stop("Parameter 'src_dir' must be set to an existing directory path.",
         call. = TRUE)
  }
  stopifnot(is.character(src_dir))
  if(!fs::dir_exists(src_dir)) {
    stop(stringr::str_glue(src_dir, ' directory does not exist.'),
         call. = TRUE)
  }

  # Build the file path once; path_tidy() normalises the directory separators.
  file_name <- stringr::str_glue(fs::path_tidy(src_dir), '/',
                                 src_hdl, '.csv')
  if(!fs::file_exists(file_name)) {
    stop(stringr::str_glue(file_name,
                           ' file does not exist. ',
                           "Check spelling. ",
                           "Also check for missing or extra '/'"),
         call. = TRUE)
  }

  # Read the path that was just validated, so that a missing trailing '/' in
  # 'src_dir' cannot point the reader at a different file.
  obj_csv <- readr::read_csv(file_name)
  name_pool <- names(obj_csv)

  if( !all(asset %in% name_pool) ){
    stop('One (or more) asset is not in the source file.', call. = TRUE)
  }

  trade <- as.integer(trade)
  if(anyNA(trade)) {
    stop("Conversion of argument 'trade' to integer produced one (or more) NA",
         call. = TRUE)
  }
  if(!all(abs(trade) == 1)){
    stop("One (or more) trade argument is not 1 (long) or -1 (short).",
         call. = TRUE)
  }

  if(base::length(asset) != base::length(trade)){
    stop("The number of assets and trades do not match", call. = TRUE)
  }

  if(!rlang::is_empty(arg_supp)) {
    if(!all(purrr::map_lgl(.x = arg_supp, .f = is.character))){
      stop('All supplementary arguments must be character/string',
           call. = TRUE)
    }
    if( !all(unlist(arg_supp) %in% name_pool) ){
      stop('One (or more) supplementary argument is not in the source file.',
           call. = TRUE)
    }
  }

  N <- length(asset)
  if(N == 1){
    obj_csv <- dplyr::select(.data = obj_csv, !!unlist(arg_supp), !!asset ) %>%
      dplyr::mutate_at(.vars = asset, .funs = function(.) {. * trade}) %>%
      dplyr::rename(!!nm := .data[[asset]])
  } else {
    obj_csv <- dplyr::select(.data = obj_csv, !!unlist(arg_supp), !!asset )
    # Apply each trade sign (+1 long, -1 short) to its asset column.
    for(i in seq_len(N)){
      obj_csv[[asset[i]]] <- obj_csv[[asset[i]]] * trade[i]
    }
    obj_csv <- dplyr::mutate(.data = obj_csv,
                             !!nm := base::rowSums(x = obj_csv[,asset])) %>%
      dplyr::select(!!!arg_supp, !!nm)
  }

  return(obj_csv)
}
fognyc/bindr documentation built on Dec. 4, 2020, 12:33 p.m.