ts_dist_part_file: Calculate distances between pairs of time series stored in...

View source: R/ts_dist.R

ts_dist_part_file    R Documentation

Calculate distances between pairs of time series stored in files.

Description

This function works similarly to dist_parts_parallel(). The difference is that it reads the time series from RDS files in a directory. The advantage of this approach is that it does not load all the time series into memory but reads each one only when necessary. As a result, this function requires much less memory and should be preferred when memory consumption is a concern, e.g., with a huge data set or very long time series. The disadvantage is that it requires a large number of file read operations, which makes the calculations take considerably longer. IMPORTANT: the file order matters, so it is highly recommended to use zero-padded numeric names, e.g., 0013.RDS.
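The naming advice can be illustrated with a minimal sketch. It assumes the files are picked up in alphabetical order, as base R's list.files() returns them; the directory name and the toy series are hypothetical and only serve the illustration.

```r
# Write a few toy series to a temporary directory using zero-padded names.
input_dir <- file.path(tempdir(), "ts_files_example")  # hypothetical location
dir.create(input_dir, showWarnings = FALSE)

set.seed(1)
n_series <- 12
for (i in seq_len(n_series)) {
  saveRDS(rnorm(50), file.path(input_dir, sprintf("%04d.RDS", i)))
}

# Zero-padded names are listed in numeric order: 0001.RDS, 0002.RDS, ...
padded <- list.files(input_dir, pattern = "\\.RDS$")

# Unpadded names sort as text, so "10.RDS" lands before "2.RDS".
unpadded <- sort(paste0(seq_len(n_series), ".RDS"), method = "radix")
```

With padded names, the position of a file in the listing matches its numeric label, so index i in the result can be traced back to file i without a lookup table.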

Usage

ts_dist_part_file(
  input_dir,
  num_part,
  num_total_parts,
  combinations,
  measureFunc = tsdist_cor,
  isSymetric = TRUE,
  error_value = NaN,
  warn_error = TRUE,
  simplify = TRUE,
  num_cores = 1,
  ...
)

Arguments

input_dir

Path to the directory containing the time series files (RDS).

num_part

Positive number between 1 and the total number of parts (num_total_parts). It identifies which part (chunk) of the total number of parts is to be calculated.

num_total_parts

Positive number corresponding to the total number of parts.

combinations

A list composed of arrays of size 2 indicating the indices of the files to be compared. If this parameter is passed, the function does not split all the possible combinations into parts and ignores the parameters num_part and num_total_parts.

measureFunc

Function to be applied to all combinations of time series. This function should accept at least two parameters, one for each time series. E.g.: function(ts1, ts2) cor(ts1, ts2)

isSymetric

Boolean. Whether the distance function is symmetric.

error_value

The value returned if an error occurs when calculating the distance for a pair of time series.

warn_error

Boolean. If TRUE (default), a warning is raised when an error occurs during the calculations.

simplify

Boolean. If FALSE, returns a list of one (if isSymetric == FALSE) or two elements (if isSymetric == TRUE).

num_cores

Numeric. Number of cores to use.

...

Additional parameters for measureFunc
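As a rough illustration of the combinations and measureFunc arguments, the sketch below builds a list of index pairs and a custom measure using base R only. The name dist_abs_cor and the choice of 1 minus the absolute correlation are hypothetical, and the indices are assumed to be 1-based positions in the file listing.

```r
# All unordered pairs among 4 files, as a list of length-2 index vectors.
n_files <- 4
combinations <- combn(n_files, 2, simplify = FALSE)

# A hypothetical measure taking one parameter per time series:
# 1 minus the absolute Pearson correlation.
dist_abs_cor <- function(ts1, ts2, ...) {
  1 - abs(cor(ts1, ts2))
}

# Quick check on two perfectly correlated toy series.
ts1 <- c(1, 2, 3, 4, 5)
ts2 <- c(2, 4, 6, 8, 10)
d <- dist_abs_cor(ts1, ts2)
```

Objects of this shape could then be supplied as combinations = combinations and measureFunc = dist_abs_cor; because this measure is symmetric, it fits the default isSymetric = TRUE.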

Value

A data frame with elements (i,j) and the distance value calculated for time series i and j. Each index corresponds to the position of the file in the order in which the files are listed.


ts2net documentation built on June 9, 2022, 9:06 a.m.