assemble_factor: Assemble factor time-series from source files.


View source: R/assemble_factor.R

Description

The function assemble_factor retrieves time-series from a single csv-formatted source file and assembles them into a factor in its final form. The source file must correspond to either (a) a valid entry in the internal catalog of the package factorr, or (b) a valid entry in the bindr::derived_catalog registry object, which controls factors built through algebraic and/or econometric manipulations.

Usage

assemble_factor(
  nm = NA,
  src_hdl,
  asset,
  trade = 1,
  src_dir = NA,
  arg_supp = list(),
  is_built = FALSE
)

Arguments

nm

A string representing the factor name.

src_hdl

A string representing the source handle. See details.

asset

A string or vector of strings indicating which asset(s) to source, i.e. which variable(s) to select from the source file. See details.

trade

An integer or vector of integers, each either +1 or -1, indicating a long (+1) or short (-1) position in the corresponding element of asset. See details.

src_dir

A string giving the path to an existing directory where the csv-formatted source files reside.

arg_supp

A list of supplementary arguments. See details.

is_built

Logical value indicating whether the factor being assembled was generated by the function build_derived_factor. See details.

Details

The parameter nm should preferably follow the R naming convention. Note that the function internally enforces the R naming rules by calling nm <- make.names(nm), which may produce a different name from the user-supplied one. See base::make.names documentation for details about R naming convention.
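The renaming rule can be previewed with base::make.names before calling assemble_factor. The snippet below uses only base R and a few illustrative candidate names; it does not call assemble_factor itself.

```r
# Illustrative candidate factor names (not taken from any catalog).
candidates <- c("value", "my factor", "2x", "if", "hml-lo")

# make.names() leaves valid names untouched, replaces invalid characters
# with ".", prepends "X" when needed and appends "." to reserved words.
valid <- make.names(candidates)

print(data.frame(supplied = candidates, used = valid))
```

Here "my factor" becomes "my.factor", "2x" becomes "X2x", "if" becomes "if." and "hml-lo" becomes "hml.lo", so a factor requested under one of those names would appear under the adjusted name in the output.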

The parameter src_hdl must be a valid internal catalog entry. It controls the handle from which the time-series will be sourced. The function factorr::catalog_do('show') generates the list of available handles (see column hdl), along with a short description and the original data source (e.g. Kenneth French Library, Federal Reserve Bank of St. Louis). An error is generated if an invalid catalog entry is supplied.

Alternatively, if the parameter src_hdl points to a derived factor, it must map to a valid derived catalog entry. As in the case above, src_hdl controls the handle from which the time-series will be sourced. The function bindr::derived_catalog_do('src_hdl') displays a table of valid entries suitable for the parameter src_hdl. An error is generated if an invalid derived catalog entry is supplied.

It follows from the above remarks that src_hdl can be checked against two different catalogs, contained either in package factorr or in package bindr. The parameter is_built activates an internal dispatch mechanism that routes src_hdl to the appropriate catalog. Any derived factor (i.e. one produced by calling build_derived_factor()) must be assembled with is_built = TRUE so that src_hdl is checked against the internal derived catalog object. Failure to do so generates an error.
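The dispatch can be pictured as a simple switch on is_built. The sketch below is hypothetical: route_src_hdl is not a bindr or factorr function, and the two character vectors merely stand in for the factorr internal catalog and the bindr::derived_catalog registry, using handles borrowed from the examples below.

```r
# Hypothetical sketch of routing src_hdl to one of two catalogs.
# core_hdls / derived_hdls are illustrative stand-ins for the real catalogs.
route_src_hdl <- function(src_hdl, is_built = FALSE, core_hdls, derived_hdls) {
  catalog <- if (isTRUE(is_built)) derived_hdls else core_hdls
  label <- if (isTRUE(is_built)) "derived catalog" else "internal catalog"
  if (!src_hdl %in% catalog) {
    stop(sprintf("'%s' is not a valid entry in the %s", src_hdl, label),
         call. = FALSE)
  }
  label
}

core <- c("FF_3F_US_M", "FF_OP_US_M")
derived <- c("INFLATION__naive__US_M")

route_src_hdl("FF_3F_US_M", is_built = FALSE, core, derived)
route_src_hdl("INFLATION__naive__US_M", is_built = TRUE, core, derived)

# A derived handle checked without is_built = TRUE is rejected:
try(route_src_hdl("INFLATION__naive__US_M", is_built = FALSE, core, derived))
```

The point of the sketch is the failure mode described above: the same handle is valid or invalid depending solely on which catalog is_built selects.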

The parameter asset determines which variables will be selected from the source file. The function factorr::catalog_do('show_hdl_names', hdl = src_hdl), where src_hdl is a valid catalog entry, generates a tibble object containing all the variable names associated with a given src_hdl. An error is generated if asset does not exist in the source file.

Alternatively, if src_hdl points to a derived factor, asset still determines which variables are selected from the source file. However, no function is available to list the variable names associated with a derived src_hdl; the user must instead consult the associated audit file or inspect the corresponding csv-formatted file.
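One way to inspect a derived source file is to read only its header with base R. The file written below is a made-up stand-in for a derived source (its columns mirror the inflation example further down); the real file name and location depend on src_dir and src_hdl.

```r
# Write a small stand-in csv file (illustrative content, not real data).
path <- tempfile(fileext = ".csv")
writeLines(c("year,month,shock",
             "2020,1,0.5",
             "2020,2,-0.25"), path)

# nrows = 1 reads the header plus one data row: enough to get the names
# of the variables that asset and arg_supp may refer to.
cols <- names(utils::read.csv(path, nrows = 1))
print(cols)
```

Note that read.csv() converts the header through make.names() by default (check.names = TRUE); pass check.names = FALSE to see the raw column labels as stored in the file.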

Factor time-series are assembled either from a single time-series or from a linear combination of time-series. The former case amounts to extracting asset from the source src_hdl and naming the resulting factor nm. The latter case generally involves taking two variables from src_hdl (asset is then a string vector) and combining them into long and short positions. In this case trade is an integer vector whose elements are either +1 (long) or -1 (short), one per element of asset. See examples below. Note that this package currently supports only linear combinations with trade values of +1 or -1.
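The combination amounts to a row-wise weighted sum of the selected columns, with weights taken from trade. A minimal base-R sketch, assuming the source has already been read into a data frame; combine_positions is a hypothetical helper, not part of bindr or factorr, and the numbers are made up.

```r
# Hypothetical helper: weight each selected column by +1/-1 and sum per row.
combine_positions <- function(df, asset, trade = 1) {
  stopifnot(length(asset) == length(trade), all(trade %in% c(-1, 1)))
  absent <- setdiff(asset, names(df))
  if (length(absent) > 0) {
    stop("asset not found in source: ", paste(absent, collapse = ", "),
         call. = FALSE)
  }
  # n x k matrix of selected series times the length-k trade vector.
  as.numeric(as.matrix(df[asset]) %*% trade)
}

# Toy source table with made-up returns (not actual Fama-French data).
src <- data.frame(year = c(2020, 2020), month = c(1, 2),
                  hml = c(0.25, -0.75),
                  Lo.10 = c(1.5, -0.5), Hi.10 = c(0.5, 2.0))

combine_positions(src, "hml")                                  # single series
combine_positions(src, c("Lo.10", "Hi.10"), trade = c(-1, 1))  # Hi.10 - Lo.10
```

With a single asset and trade = 1 the series is returned unchanged; with c(-1, 1) the result is the high-decile series minus the low-decile series.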

Note that an assembly request has no constraint beyond the existence of a file containing all the required inputs. This leaves some latitude to build different versions of the same factor. For instance, the 'Quality' factor (e.g. operating profitability) can be built from deciles or, alternatively, from quintiles. This latitude does not extend to cases where the required series are located in different files; such cases require a dedicated function called by bindr::build_derived_factor(). See below for additional details.

The latitude in designing factor expressions is afforded mostly for exploratory purposes. By contrast, factor models are 'locked' to control their design and maintain their integrity. As a direct consequence, a user cannot modify an existing factor model by toggling between different factor expressions. Instead, a user exploring the impact of variations in factor expression must retrieve the factor model output (typically a tibble/table object) and append the factor variant to it. The factor model audit file, however, still documents the original factor model and thereby implicitly flags any deviation in factor definition.

The parameter src_dir must be a valid, existing directory; an error is generated if either condition is not satisfied. The combination of src_dir and src_hdl identifies the source file location and name, and an error is generated if this combination points to a non-existent file. Note also that neither parameter accepts multiple values, which implies that the assembly process operates on a single file to combine its required series. Should a factor require inputs located in separate files, the function bindr::build_derived_factor() should be used instead.
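The checks above can be sketched in base R. This is a hypothetical illustration: locate_source is not a package function, and the file naming convention "<src_hdl>.csv" is an assumption, since the documentation does not state how src_dir and src_hdl map to a file name.

```r
# Hypothetical sketch of the src_dir / src_hdl checks.
# ASSUMPTION: the source file is named "<src_hdl>.csv" inside src_dir.
locate_source <- function(src_dir, src_hdl) {
  stopifnot(length(src_dir) == 1, length(src_hdl) == 1)  # one file only
  if (is.na(src_dir) || !dir.exists(src_dir)) {
    stop("src_dir is not an existing directory: ", src_dir, call. = FALSE)
  }
  path <- file.path(src_dir, paste0(src_hdl, ".csv"))    # assumed naming
  if (!file.exists(path)) {
    stop("no source file found at: ", path, call. = FALSE)
  }
  normalizePath(path)
}

# Build a throwaway directory holding one illustrative source file.
wh_dir <- tempfile("warehouse")
dir.create(wh_dir)
writeLines("year,month,hml", file.path(wh_dir, "FF_3F_US_M.csv"))

locate_source(wh_dir, "FF_3F_US_M")       # full path to the file
try(locate_source(wh_dir, "FF_OP_US_M"))  # file missing: error
try(locate_source(NA, "FF_3F_US_M"))      # default src_dir = NA: error
```

The length-one check mirrors the single-file restriction: a vector of directories or handles is rejected before any file lookup is attempted.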

Additional variables (listed in arg_supp) can be requested from the source file, provided that they exist. Typical uses involve year, month or date. An error is generated if any element of the list does not exist in the source file. Note that the returned tibble object places the arg_supp columns first, followed by nm. See examples below.
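The column ordering can be illustrated with a small base-R sketch. order_output is a hypothetical helper, not part of bindr; it returns a base data.frame rather than a tibble to stay dependency-free, and the input data are made up.

```r
# Hypothetical sketch: check arg_supp, then place its columns first and the
# factor series (named through make.names()) last.
order_output <- function(df, values, nm, arg_supp = list()) {
  supp <- unlist(arg_supp)
  absent <- setdiff(supp, names(df))
  if (length(absent) > 0) {
    stop("arg_supp not found in source: ", paste(absent, collapse = ", "),
         call. = FALSE)
  }
  out <- df[, supp, drop = FALSE]   # arg_supp columns first ...
  out[[make.names(nm)]] <- values   # ... then the factor column
  out
}

src <- data.frame(year = c(2020, 2020), month = c(1, 2), hml = c(0.25, -0.75))

res <- order_output(src, src$hml, nm = "value", arg_supp = list("year", "month"))
print(res)

# A supplementary variable absent from the source triggers an error:
try(order_output(src, src$hml, nm = "value", arg_supp = list("date")))
```

Wrapping the result in tibble::as_tibble() would give a tibble with the same column order.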

Value

A tibble object comprising the arg_supp and nm time-series, in that order. See details.

Examples

## Not run: 
# Value factor from the Fama-French 3-factor data (US):
# long position in 'hml'.

assemble_factor(nm = 'value', src_dir = '~/.../Factor Warehouse/Uncompressed',
                src_hdl = 'FF_3F_US_M', asset = 'hml',
                trade = 1, arg_supp = list('year', 'month'))

## End(Not run)

## Not run: 
# Fama-French operating profitability (US):
# short position in the lowest decile and long position
# in the highest decile.

assemble_factor(nm = 'profit',
                src_dir = '~/.../Factor Warehouse/Uncompressed/',
                src_hdl = 'FF_OP_US_M', asset = c('Lo.10', 'Hi.10'),
                trade = c(-1, 1), arg_supp = list('year', 'month'))

## End(Not run)

## Not run: 
# Inflation factor from an econometric model (hence is_built = TRUE):
# long position in 'shock'.

assemble_factor(nm = 'inflation',
                src_dir = '~/.../Factor Warehouse/Uncompressed/',
                src_hdl = 'INFLATION__naive__US_M', asset = 'shock',
                trade = 1, arg_supp = list('year', 'month'), is_built = TRUE)

## End(Not run)

fognyc/bindr documentation built on Dec. 4, 2020, 12:33 p.m.