workflowPsiHP: A refactored version of 'workflowPsi' with a higher...
In distantia: Assessing Dissimilarity Between Multivariate Time Series


This function is intended for large analyses involving hundreds to thousands of sequences. Several options available in workflowPsi have been removed to keep the code as simple as possible. Psi is always computed with the options diagonal = TRUE, ignore.blocks = TRUE, and method = "euclidean".

workflowPsiHP(
  sequences = NULL,
  grouping.column = NULL,
  time.column = NULL,
  exclude.columns = NULL,
  parallel.execution = TRUE
  )

`sequences`	dataframe with multiple sequences identified by a grouping column, as generated by `prepareSequences`.
`grouping.column`	character string, name of the column in `sequences` used to identify the separate sequences within the dataframe.
`time.column`	character string, name of the column with time/depth/rank data.
`exclude.columns`	character string or character vector with column names in `sequences` to be excluded from the analysis.
`parallel.execution`	boolean, if `TRUE` (default), execution is parallelized; if `FALSE`, it runs serially.

Due to limitations of the function permutations, the maximum number of groups (as defined by grouping.column) is around 30000. In addition, a combinations table of this size takes roughly 7GB of memory.
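The memory figure can be checked with a back-of-the-envelope calculation. The sketch below counts the unordered pairs among 30000 groups and assumes, purely for illustration, that each pair is stored as two 8-byte entries; that storage size is an assumption and is not taken from the package source. The variable names are hypothetical.

```r
# Number of unordered pairs among n groups: n * (n - 1) / 2
n.groups <- 30000
n.pairs <- choose(n.groups, 2)

# ASSUMPTION: each pair is stored as two 8-byte entries
# (for example, two pointers to sequence names)
bytes.per.pair <- 2 * 8

# Approximate size of the combinations table in gigabytes (10^9 bytes)
memory.gb <- n.pairs * bytes.per.pair / 1e9

n.pairs
round(memory.gb, 1)
```

Under that assumption, 30000 groups yield about 450 million pairs and a table of about 7.2 GB, in line with the figure quoted above. Because the pair count grows quadratically, halving the number of groups cuts the table to roughly a quarter of that size.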

A dataframe with sequence names and psi values.

Blas Benito <blasbenito@gmail.com>

library(distantia)

#load example data
data("sequencesMIS")

#prepare sequences (only MIS-1 and MIS-2 are kept to speed up the example)
MIS.sequences <- prepareSequences(
  sequences = sequencesMIS[sequencesMIS$MIS %in% c("MIS-1", "MIS-2"), ],
  grouping.column = "MIS",
  if.empty.cases = "zero",
  transformation = "hellinger"
  )

#execute workflow to compute psi
MIS.psi <- workflowPsiHP(
  sequences = MIS.sequences,
  grouping.column = "MIS",
  parallel.execution = FALSE
  )

#dataframe with sequence names and psi values
MIS.psi