model_dataset: Model dataset
In pmcharrison/hvr: Harmonic Viewpoint Regression


Analyses a dataset of chord sequences by constructing and optimising a viewpoint regression model, and using this model to generate predictions for these sequences.

model_dataset(
  corpus_test,
  corpus_pretrain,
  output_dir,
  viewpoints = hvr::hvr_viewpoints,
  weights = NULL,
  poly_degree = 4L,
  max_iter = 500,
  corpus_test_folds = list(seq_along(corpus_test)),
  allow_repeats = FALSE,
  max_sample = Inf,
  sample_seed = 1,
  stm_opt = stm_options(),
  ltm_opt = ltm_options(),
  na_val = 0,
  perm_int = TRUE,
  perm_int_seed = 1,
  perm_int_reps = 5,
  allow_negative_weights = FALSE
)

`corpus_test`	Corpus of chord sequences to predict, as created by `corpus`.
`corpus_pretrain`	Corpus of chord sequences with which to pretrain the model, as created by `corpus`. These chord sequences are used solely to pretrain the discrete viewpoint models; continuous viewpoint effects and discrete viewpoint weights are optimised on `corpus_test`.
`output_dir`	(Character scalar) Directory in which to save the model outputs.
`viewpoints`	List of viewpoints to apply, as created by `new_viewpoint`. Defaults to a fairly comprehensive list, `hvr_viewpoints`.
`weights`	(NULL or numeric vector) An optional set of viewpoint regression weights; if not provided, weights will be optimised automatically. These weights should be provided as a named numeric vector in a specific order; the best way to find this format is to fit a pilot regression model with the desired viewpoint set.
`poly_degree`	(Integer scalar) Degree of the polynomials to compute for the continuous features.
`max_iter`	(Integer scalar) Maximum number of iterations for the optimisation routine.
`corpus_test_folds`	List of cross-validation folds for applying discrete viewpoint models to the sequences in `corpus_test`. Each list element should be an integer vector indexing into `corpus_test`. These integer vectors must exhaustively partition the sequences in `corpus_test`. The algorithm iterates over each fold, predicting the sequences within that fold, and training the model using the combination of a) the sequences from the other folds in `corpus_test_folds` and b) the sequences in `corpus_pretrain`. By default, there is just one fold corresponding to the entirety of `corpus_test`, meaning that no cross-validation is applied.
`allow_repeats`	(Logical scalar) Whether repeated chords are theoretically permitted in the chord sequences. It is recommended to remove such repetitions before modelling.
`max_sample`	(Numeric scalar) Maximum number of events to sample for the model matrix; defaults to `Inf` (no downsampling). Lower values of `max_sample` prompt random downsampling.
`sample_seed`	(Integer scalar) Random seed to make the downsampling reproducible.
`stm_opt`	Options list for the short-term PPM models, as created by the function `stm_options`.
`ltm_opt`	Options list for the long-term PPM models, as created by the function `ltm_options`.
`na_val`	(Numeric scalar) Value to use to code for NA in the model matrix. The statistical analyses are mostly unaffected by this value.
`perm_int`	(Logical scalar) Whether to compute permutation-based feature importances.
`perm_int_seed`	(Integer scalar) Random seed for the permutation-based feature importances.
`perm_int_reps`	(Integer scalar) Number of replicates for the permutation-based feature importances (the final estimates are averages over these replicates).
`allow_negative_weights`	(Logical scalar) Whether negative weights should be allowed for discrete features (`FALSE` by default).
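
As an illustration of the format expected by `corpus_test_folds`, the following base R sketch builds a list of integer vectors that exhaustively partition a set of sequence indices. The helper `make_folds` and its arguments `n`, `k` and `seed` are hypothetical names introduced here for illustration; they are not part of hvr.

```r
# Hypothetical helper (not part of hvr): randomly assign the indices 1..n
# to k folds of near-equal size.
make_folds <- function(n, k, seed = 1) {
  stopifnot(k >= 1, n >= k)
  set.seed(seed)                       # make the random assignment reproducible
  shuffled <- sample.int(n)            # random ordering of 1..n
  fold_id <- rep_len(seq_len(k), n)    # 1, 2, ..., k, 1, 2, ...
  unname(split(shuffled, fold_id))     # list of k integer vectors
}

folds <- make_folds(n = 10, k = 3)

# The folds must cover every index exactly once.
stopifnot(identical(sort(unlist(folds)), 1:10))
```

With `n` set to the number of sequences in `corpus_test`, a list built this way could presumably be passed as `corpus_test_folds = folds`.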

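The `perm_int`, `perm_int_seed` and `perm_int_reps` arguments above concern permutation-based feature importances. The general technique can be sketched in base R as follows. This is a generic illustration on simulated data with a linear model and mean squared error; the model, loss and variable names are assumptions made for the example and do not reflect how hvr computes its importances.

```r
# Generic sketch of permutation-based feature importance (not hvr's code).
set.seed(1)                                   # analogous in spirit to perm_int_seed
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 + rnorm(n, sd = 0.5)      # y depends on x1 only

fit <- lm(y ~ x1 + x2, data = dat)

# Loss of a fitted model on a data frame: mean squared error.
mse <- function(model, data) {
  mean((data$y - predict(model, newdata = data))^2)
}
baseline <- mse(fit, dat)

# Importance of a feature = mean increase in loss after shuffling that
# feature, averaged over several replicates (analogous to perm_int_reps).
perm_importance <- function(feature, reps = 5) {
  increases <- vapply(seq_len(reps), function(i) {
    shuffled <- dat
    shuffled[[feature]] <- sample(shuffled[[feature]])  # break the link to y
    mse(fit, shuffled) - baseline
  }, numeric(1))
  mean(increases)
}

importance <- c(x1 = perm_importance("x1"), x2 = perm_importance("x2"))
```

In this simulation `x1` should receive a much larger importance than `x2`, because only `x1` influences `y`.
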
This function wraps the following sub-routines:

compute_viewpoints
compute_ppm_analyses
compute_model_matrix
viewpoint_regression
compute_predictions

Users may wish to use these sub-routines explicitly if performing repeated analyses with different parameter settings, to save redundant computation.

Various model outputs are saved to `output_dir`. The function returns a tibble of predicted probabilities for the chords in `corpus_test`; see `compute_predictions` for an explanation of this tibble.

pmcharrison/hvr documentation built on April 14, 2020, 2:47 a.m.