compute_model_matrix: Compute model matrix


View source: R/3-model-matrix.R

Description

Computes the model matrix, which combines expectedness values from the PPM analyses with polynomial expansions of the continuous features.

Usage

compute_model_matrix(
  parent_dir,
  max_sample = Inf,
  sample_seed = 1,
  poly_degree = 4L,
  na_val = 0,
  filter_corpus = NULL,
  ltm = TRUE,
  viewpoint_dir = file.path(parent_dir, "0-viewpoints"),
  ppm_dir = file.path(parent_dir, "1-ppm"),
  output_dir = file.path(parent_dir, "2-model-matrix"),
  viewpoints = read_viewpoints(viewpoint_dir),
  seq_test = list_seq_test(ppm_dir),
  allow_repeats = FALSE
)

Arguments

parent_dir

(Character scalar) The parent directory for the output files, shared with functions such as compute_viewpoints and compute_ppm_analyses. Ignored if all other directory arguments are manually specified.

max_sample

(Numeric scalar) Maximum number of events to sample for the model matrix. Defaults to Inf (no downsampling); lower values prompt random downsampling.

sample_seed

(Integer scalar) Random seed to make the downsampling reproducible.
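
The interaction between max_sample and sample_seed can be sketched in base R as follows. This is only an illustration of seeded downsampling, not the package's actual implementation; the helper downsample_events and the toy event indices are assumptions introduced for the example.

```r
# Illustrative helper (not part of hvr): keep at most max_sample events.
downsample_events <- function(event_ids, max_sample = Inf, sample_seed = 1) {
  # With max_sample = Inf (or any value >= the number of events),
  # nothing is dropped.
  if (length(event_ids) <= max_sample) return(event_ids)
  # Fixing the seed makes the random subset reproducible.
  # Note that set.seed() changes the global RNG state.
  set.seed(sample_seed)
  sort(event_ids[sample.int(length(event_ids), size = max_sample)])
}

events <- 1:100  # toy event indices
a <- downsample_events(events, max_sample = 10, sample_seed = 1)
b <- downsample_events(events, max_sample = 10, sample_seed = 1)
print(identical(a, b))                    # same seed, same subset
print(length(downsample_events(events)))  # default Inf keeps every event
```

Changing sample_seed would generally select a different subset of the same size.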

poly_degree

(Integer scalar) Degree of the polynomials to compute for the continuous features.
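
As a rough illustration of what a polynomial expansion of degree poly_degree looks like, the sketch below uses stats::poly on made-up feature values. Whether the package uses orthogonal or raw polynomials is not stated here, so both variants are shown as assumptions.

```r
x <- c(0.1, 0.4, 0.5, 0.9, 1.3, 2.0)  # made-up continuous feature values
poly_degree <- 4L

# Orthogonal polynomials (the default behaviour of stats::poly).
orth <- poly(x, degree = poly_degree)

# Raw polynomials: columns are simply x, x^2, x^3, x^4.
raw <- poly(x, degree = poly_degree, raw = TRUE)

print(dim(orth))                               # one column per degree
print(all.equal(unname(raw[, 2]), x^2))        # second raw column is x squared
```

In either case a single continuous feature becomes poly_degree columns.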

na_val

(Numeric scalar) Value used to code NA entries in the model matrix. The statistical analyses are mostly unaffected by this value.
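
The recoding that na_val describes amounts to replacing missing entries with a fixed number. A minimal base-R sketch on a toy matrix follows; the column names f1 to f3 are hypothetical and the package's own recoding step may differ.

```r
na_val <- 0

# Toy matrix with missing entries (filled column by column).
m <- matrix(c(0.2, NA, 1.5, NA, 0.7, 0.3), nrow = 2,
            dimnames = list(NULL, c("f1", "f2", "f3")))

# Replace every NA with na_val.
m[is.na(m)] <- na_val

print(anyNA(m))
print(m)
```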

filter_corpus

(NULL or a function) An optional function to apply to the corpus to determine which events should be retained in the model matrix. The function is applied to the corpus object saved as corpus.rds in viewpoint_dir. This corpus object takes the form of a tibble; the function should return a row-subset of this tibble.
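
A filter_corpus function only needs to take the corpus and return a subset of its rows. The sketch below uses a toy data frame in place of the real tibble; the column names seq_id and event_id are hypothetical, since the actual columns of corpus.rds are not documented here.

```r
# Toy stand-in for the corpus object (the real one is a tibble).
# seq_id and event_id are hypothetical column names.
corpus <- data.frame(
  seq_id   = c(1L, 1L, 2L, 3L, 3L),
  event_id = c(1L, 2L, 1L, 1L, 2L)
)

# Example filter: retain only events from the first two sequences.
# Base subsetting with drop = FALSE returns a row-subset with all columns.
keep_early_sequences <- function(corpus) {
  corpus[corpus$seq_id <= 2L, , drop = FALSE]
}

filtered <- keep_early_sequences(corpus)
print(nrow(filtered))
```

Such a function would then be supplied as filter_corpus = keep_early_sequences.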

ltm

(Logical scalar, default = TRUE) If FALSE, long-term (i.e. pretrained) PPM model outputs are excluded from the model matrix.

viewpoint_dir

(Character scalar) The directory for the already-generated output files from compute_viewpoints. The default should be correct if the user used the default dir argument in compute_viewpoints.

ppm_dir

(Character scalar) The directory for the already-generated output files from compute_ppm_analyses. The default should be correct if the user used the default dir argument in compute_ppm_analyses.

output_dir

(Character scalar) The output directory for the model matrix. Will be created if it doesn't exist already.

viewpoints

(Character vector) The viewpoints to be included in the model matrix. By default this list is read from viewpoint_dir.

seq_test

(Integer vector) Identifies the sequences to sample from when constructing the model matrix, indexing into the corpus argument of compute_viewpoints. Defaults to the seq_test argument that was provided to compute_viewpoints.

allow_repeats

(Logical scalar) Whether repeated chords are theoretically permitted in the chord sequences. It is recommended to remove such repetitions before modelling.

Details

The following routines should have been run already:

  1. compute_viewpoints

  2. compute_ppm_analyses
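
Before calling compute_model_matrix, it can be useful to check that the two earlier steps have left their outputs in place. The helper below is a hypothetical sketch, not part of the package. It assumes the default directory names from the Usage section and checks only for corpus.rds in viewpoint_dir (mentioned under filter_corpus) and for the existence of ppm_dir, whose contents are not documented here.

```r
# Hypothetical helper (not part of hvr): report whether the outputs of
# the prerequisite steps appear to be present under parent_dir.
check_prerequisites <- function(parent_dir) {
  viewpoint_dir <- file.path(parent_dir, "0-viewpoints")
  ppm_dir <- file.path(parent_dir, "1-ppm")
  c(
    corpus_saved   = file.exists(file.path(viewpoint_dir, "corpus.rds")),
    ppm_dir_exists = dir.exists(ppm_dir)
  )
}

# A directory where nothing has been run yet: both checks are FALSE.
status <- check_prerequisites(file.path(tempdir(), "hvr-not-run-yet"))
print(status)
```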

Value

The primary output is written to disk in the output_dir directory. The model matrix provides metafeature values (i.e. expectedness values for discrete features and polynomial values for continuous features) over the entire chord alphabet at every location in seq_test.


pmcharrison/hvr documentation built on April 14, 2020, 2:47 a.m.