knitr::opts_chunk$set(
  eval = TRUE,
  collapse = TRUE,
  comment = "#>",
  dpi = 150,
  fig.show = 'hold',
  fig.width = 5
)

options(scipen = 9999)
The R package distantia 2.0 introduces Time Series Lists as the objects that organize time series for dissimilarity analyses, and provides a complete toolset to manage them. This article describes Time Series Lists in detail and showcases the most common data handling procedures enabled by the new functions included in the package.
In this new version of distantia, groups of time series are organized as named lists of zoo objects. These lists are named Time Series Lists (TSL) within the package, and are designed to facilitate the parallelization of dissimilarity analyses. TSL is deliberately not a class: the idea is to keep these lists as simple as possible, granting users the autonomy to create or modify them as needed.
The R package zoo provides an S3 class of the same name designed to handle observations ordered by an index. It supports various index classes, such as Date, POSIXct, or even custom numeric or character indices, and handles regular and irregular time series equally well. Other advantages of using zoo objects include seamless integration with base R methods, and built-in tools for alignment, merging, and subsetting.
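For example, two zoo objects with partially overlapping indices can be aligned, merged, and subset with plain base R syntax:

library(zoo)

#two series with different Date indices
a <- zoo::zoo(x = 1:5, order.by = as.Date("2024-01-01") + 0:4)
b <- zoo::zoo(x = 10:13, order.by = as.Date("2024-01-03") + 0:3)

#merge() aligns both series by their shared index
ab <- merge(a, b)

#window() subsets a time range
window(ab, start = as.Date("2024-01-03"), end = as.Date("2024-01-05"))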
Let's take a look at a little zoo object.
library(distantia)

z <- distantia::zoo_simulate(
  name = "my_zoo",
  cols = 3,
  rows = 10,
  time_range = c(
    "2024-01-01",
    "2024-12-31"
  ),
  na_fraction = 0.1,
  irregular = TRUE,
  seed = 1
)

zoo_plot(x = z)
Zoo objects have two main components: a data matrix with the time series observations, and an index representing time or sample order. The data matrix is extracted with zoo::coredata().
zoo::coredata(z)

class(zoo::coredata(z))
The core data of a zoo object can also be a vector when the time series is univariate.
x <- zoo::zoo(x = runif(10))

is.vector(zoo::coredata(x))
However, this is frowned upon in distantia, and these vectors should always be converted to matrices.
x <- distantia::zoo_vector_to_matrix(x = x)

is.vector(zoo::coredata(x))

is.matrix(zoo::coredata(x))
The index of zoo time series is extracted with zoo::index().
zoo::index(z)

class(zoo::index(z))
The classes for zoo indices explicitly supported in distantia are Date, POSIXct, and numeric. The function distantia::zoo_time() helps summarize the time features of a zoo object, including the time class.
distantia::zoo_time(x = z)
Additionally, in distantia all zoo objects are expected to have the attribute name.
attributes(z)$name
This attribute is not part of the zoo class; it is used to facilitate plotting operations and is managed internally by the tsl_...() functions. There are several functions in distantia to manage the names of zoo objects.
#reset zoo name
z <- distantia::zoo_name_set(
  x = z,
  name = "My_Zoo"
)

#get zoo name
distantia::zoo_name_get(x = z)

#clean zoo name
z <- distantia::zoo_name_clean(
  x = z,
  lowercase = TRUE
)

distantia::zoo_name_get(x = z)
This package comes with several functions to manipulate zoo objects:

- zoo_name_set(), zoo_name_get() and zoo_name_clean(): handle the attribute "name".
- zoo_time(): details of the zoo index.
- zoo_aggregate(): time aggregation of zoo objects.
- zoo_resample(): interpolation or extrapolation to a different time, or from irregular to regular time.
- zoo_smooth(): rolling-window smoothing (see the sketch below).
- zoo_permute(): restricted permutation.
- zoo_vector_to_matrix() and zoo_to_tsl(): internal functions to facilitate handling zoo objects within TSLs.

rm(x, z)
TSLs are named lists of zoo time series. The example below shows how to build a TSL from scratch with zoo objects. This is not the most common or most comfortable approach, though, so please visit the section Creating Time Series Lists to find out how to convert your data to TSL easily.
#create simple tsl
my_tsl <- list(
  A = distantia::zoo_simulate(
    cols = 4,
    na_fraction = 0.2
  ),
  B = distantia::zoo_simulate()
)

names(my_tsl)

class(my_tsl)

#names of the zoo objects
lapply(X = my_tsl, FUN = distantia::zoo_name_get)

#class of the objects within the list
lapply(X = my_tsl, FUN = class)
TSLs ready for dissimilarity analyses must follow several rules to ensure these analyses run without issues:

- The list must have names, and so must the zoo objects inside it.
- Each zoo object must have a "name" attribute matching its name in the list.
- The core data of every zoo object must be of class matrix, with named columns.
- The zoo objects must not contain NA, Inf, or NaN cases.

I understand these are way too many rules, but the functions tsl_diagnose() and tsl_repair() are there to help you forget about them. When applying tsl_diagnose() to my_tsl we can see it has several issues.
distantia::tsl_diagnose(tsl = my_tsl)
From there, we can either follow the suggestions, or apply tsl_repair() directly, as done below.
my_tsl <- distantia::tsl_repair(tsl = my_tsl)
This function identifies the issues raised by tsl_diagnose() and repairs them when possible. If tsl_diagnose() is run again, it should stay silent if everything is OK.
distantia::tsl_diagnose( tsl = my_tsl )
From this point, our TSL is ready to go!
distantia::tsl_plot( tsl = my_tsl, guide = FALSE )
rm(my_tsl)
The function tsl_initialize() (with the alias tsl_init()) is designed to help transform several data structures to Time Series Lists.
Long and tidy data frames are convenient structures to store multivariate time series of a reasonable size. For example, the data frame fagus_dynamics shown below has the column "name" identifying separate time series, the column "time" with observation dates, and three numeric columns with environmental observations.
head(fagus_dynamics)
Transforming this data frame to TSL is quite straightforward:
tsl <- distantia::tsl_initialize(
  x = fagus_dynamics,
  name_column = "name",
  time_column = "time"
)

#even shorter!
tsl <- distantia::tsl_init(
  x = fagus_dynamics,
  name = "name",
  time = "time"
)

distantia::tsl_plot(
  tsl = tsl
)
Once manipulated and/or analyzed, a TSL can be converted back to a data frame with tsl_to_df().
df <- distantia::tsl_to_df(tsl = tsl)

head(df)
rm(df, tsl)
evi_wide <- stats::reshape(
  data = distantia::fagus_dynamics[, c(
    "name",
    "time",
    "evi"
  )],
  timevar = "name",
  idvar = "time",
  direction = "wide",
  sep = "_"
)
A wide data frame is a useful structure to store univariate time series observed in different places at the same times.
head(evi_wide)
When no name_column is provided, tsl_initialize() assumes that each column is a univariate time series.
tsl <- distantia::tsl_initialize(
  x = evi_wide,
  time_column = "time"
)

tsl_plot(
  tsl = tsl,
  guide = FALSE
)
In this case, the column name of each univariate zoo object will be "x".
distantia::tsl_colnames_get(tsl = tsl)
This name can be reset as needed with tsl_colnames_set().
tsl <- distantia::tsl_colnames_set(
  tsl = tsl,
  names = "evi"
)

distantia::tsl_colnames_get(tsl = tsl)
This TSL can be converted to a data frame too, but this time the result is a long data frame.
df <- distantia::tsl_to_df(tsl = tsl)

head(df)
rm(df, evi_wide, tsl)
A list of numeric vectors can also be converted to TSL. In this case, the zoo index is a sequence of integers.
tsl <- distantia::tsl_initialize(
  x = list(
    a = runif(10),
    b = runif(10)
  )
)

distantia::tsl_plot(
  tsl = tsl,
  guide = FALSE
)
The same thing can be done with matrices as well.
tsl <- distantia::tsl_initialize(
  x = list(
    a = matrix(data = runif(100), ncol = 2, nrow = 50),
    b = matrix(data = runif(100), ncol = 2, nrow = 50)
  )
)

distantia::tsl_plot(
  tsl = tsl,
  guide = FALSE
)
rm(tsl)
The functions in distantia to handle TSLs are named after the pattern tsl_...(). Some of these functions are designed to explore and better understand your TSLs, while others are designed to manipulate and transform them to facilitate dissimilarity analyses.
This section offers a complete overview of these functions and their applications.
The functions to manage TSLs are a bunch of lapply in a trench coat. Most use future.apply::future_lapply(), a parallelized version of lapply from the future.apply package, and a few combine foreach::foreach() with the %dofuture% operator from doFuture in shameless parallelized loops.
As such, they all support a parallelization backend provided by the future package, as shown below. However, bear in mind that parallelization is only worth it for very large datasets.
library(future)
library(parallelly)

future::plan(
  future::multisession,
  workers = parallelly::availableCores() - 1
)
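Once the analyses are done, parallelization can be disabled by switching back to a sequential plan:

#disable parallelization
future::plan(future::sequential)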
They also support progress bars via the progressr package. However, this option, commented out in the code below, does not work in Rmarkdown.
#progress bar (does not work in Rmarkdown)
#progressr::handlers(global = TRUE)
This section showcases all the tools available in distantia to better understand our Time Series Lists. Logically, the section starts with distantia::tsl_simulate(), a function that has nothing to do with that purpose.
tsl <- distantia::tsl_simulate(
  n = 4,
  cols = 5,
  rows = 1000,
  time_range = c("2010-01-01", "2020-01-01"),
  data_range = c(0, 1),
  seasons = 10,
  na_fraction = 0.1,
  irregular = TRUE
)

tsl_plot(
  tsl = tsl,
  guide = FALSE
)
The simplest tools to explore TSLs focus on basic features, like extracting TSL names and dimensions.
#time series names
distantia::tsl_names_get(tsl = tsl)

#column names
distantia::tsl_colnames_get(tsl = tsl)

#number of columns
distantia::tsl_ncol(tsl = tsl)

#number of rows
distantia::tsl_nrow(tsl = tsl)
The function tsl_time() summarizes the time features of all time series in a TSL.
distantia::tsl_time(tsl = tsl)
The function tsl_stats(), on the other hand, computes stats by time series and variable to summarize the TSL values.
df_stats <- distantia::tsl_stats(
  tsl = tsl,
  lags = 1 #temporal autocorrelation lag
)

df_stats
The dissimilarity analyses implemented in version 2.0 of distantia do not support NA cases in time series lists. There are two alternative workflows to handle them.
The first one uses tsl_count_NA() and tsl_handle_NA(). The former converts Inf and NaN to NA and counts the NA cases in each time series, while the latter either omits or imputes NA cases via zoo::na.approx().
#count NA
distantia::tsl_count_NA(tsl = tsl)

#impute NA cases
tsl_notNA <- distantia::tsl_handle_NA(
  tsl = tsl,
  na_action = "impute"
)

#re-count
distantia::tsl_count_NA(tsl = tsl_notNA)
The second workflow involves the functions tsl_diagnose() and tsl_repair(), and is more general because it addresses other potential issues at once.
#diagnose issues with NA values
distantia::tsl_diagnose(tsl = tsl)

#impute NA cases
tsl <- tsl_repair(tsl = tsl)

#re-diagnose to check result
distantia::tsl_diagnose(tsl = tsl)
The function tsl_subset() helps focus on particular regions of a time series list. Additionally, by default this function returns the numeric columns that are shared across all time series in the TSL.
tsl_new <- distantia::tsl_subset(
  tsl = tsl,
  names = c("A", "C", "D"),
  colnames = c("a", "b"),
  time = c("2014-01-01", "2018-01-01"),
  numeric_cols = TRUE,
  shared_cols = TRUE
)

distantia::tsl_plot(
  tsl = tsl_new
)
The function tsl_burst() transforms a multivariate TSL into a univariate TSL by creating a new zoo object from each column of the original zoo objects. This function helps apply dissimilarity analyses between individual variables of multivariate time series.
#burst multivariate time series to univariate
tsl_univariate <- distantia::tsl_burst(
  tsl = tsl_new
)

#check new time series names
distantia::tsl_names_get(tsl = tsl_univariate)

#check new column names
distantia::tsl_colnames_get(tsl = tsl_univariate)

#plot univariate time series
distantia::tsl_plot(
  tsl = tsl_univariate,
  guide = FALSE
)
rm(tsl, tsl_new, tsl_notNA, tsl_univariate, df_stats)
Aggregation reduces time series frequency (a.k.a. downsampling) by summarizing multiple data points into a single value over a specified time interval. It reduces the number of samples, smooths out noise, can transform irregular time series into regular ones, and can generate entirely new time series depending on the aggregation statistic. On the other hand, it obscures fine-grained detail and alters statistical properties such as variance and temporal autocorrelation.
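The toy example below illustrates the concept with the aggregate() method provided by the zoo package itself, downsampling a monthly series to quarterly means:

library(zoo)

#toy monthly series for 2024
z_monthly <- zoo::zoo(
  x = 1:12,
  order.by = seq(as.Date("2024-01-01"), by = "month", length.out = 12)
)

#map each date to the first day of its quarter and average
aggregate(
  z_monthly,
  by = as.Date(cut(zoo::index(z_monthly), "quarter")),
  FUN = mean
)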
In distantia, this operation is supported by the function tsl_aggregate(), with two arguments:

- new_time: time vector or keyword defining the time intervals to aggregate over.
- f: aggregation function summarizing the observations over each aggregation interval.

The code below illustrates this function's usage to compute quarterly temperature and precipitation indicators from the monthly observations in the dataset honeycomb_climate.
tsl <- distantia::tsl_init(
  x = distantia::honeycomb_climate,
  name = "cell",
  time = "time"
) |>
  tsl_subset(
    names = 1:5 #subset first five elements
  )

distantia::tsl_plot(
  tsl = tsl
)
The easiest way to define aggregation intervals in distantia is to use a keyword. The function tsl_time() returns the supported keywords for a given TSL.
df_time <- distantia::tsl_time(
  tsl = tsl,
  keywords = "aggregate"
)

df_time$keywords |>
  unlist() |>
  unique()
Let's focus on "quarters" for the rest of the example.
interval <- "quarters"
The code below subsets the column "temperature" in tsl, aggregates it by computing the minimum, maximum, and mean per quarter, and adds a suffix to each new aggregated column.
#subset temperature column
tsl_temperature <- distantia::tsl_subset(
  tsl = tsl,
  colnames = "temperature"
)

#compute stats: minimum, maximum, and mean
tsl_temperature_min <- distantia::tsl_aggregate(
  tsl = tsl_temperature,
  new_time = interval,
  f = min
) |>
  distantia::tsl_colnames_suffix(
    suffix = "_min" #set suffix for aggregated column
  )

tsl_temperature_max <- distantia::tsl_aggregate(
  tsl = tsl_temperature,
  new_time = interval,
  f = max
) |>
  distantia::tsl_colnames_suffix(
    suffix = "_max"
  )

tsl_temperature_mean <- distantia::tsl_aggregate(
  tsl = tsl_temperature,
  new_time = interval,
  f = mean
) |>
  distantia::tsl_colnames_suffix(
    suffix = "_mean"
  )
Similar steps can be followed to process the variable "precipitation".
#subset precipitation column
tsl_precipitation <- distantia::tsl_subset(
  tsl = tsl,
  colnames = "precipitation"
)

#compute stats: sum, maximum, and minimum
tsl_precipitation_sum <- distantia::tsl_aggregate(
  tsl = tsl_precipitation,
  new_time = interval,
  f = sum
) |>
  distantia::tsl_colnames_suffix(
    suffix = "_sum" #set suffix for aggregated column
  )

tsl_precipitation_max <- distantia::tsl_aggregate(
  tsl = tsl_precipitation,
  new_time = interval,
  f = max
) |>
  distantia::tsl_colnames_suffix(
    suffix = "_max"
  )

tsl_precipitation_min <- distantia::tsl_aggregate(
  tsl = tsl_precipitation,
  new_time = interval,
  f = min
) |>
  distantia::tsl_colnames_suffix(
    suffix = "_min"
  )
Finally, we join all new TSLs together to examine the result.
tsl_climate_stats <- distantia::tsl_join(
  tsl_temperature_min,
  tsl_temperature_max,
  tsl_temperature_mean,
  tsl_precipitation_sum,
  tsl_precipitation_max,
  tsl_precipitation_min
)

distantia::tsl_plot(
  tsl = tsl_climate_stats,
  ylim = "relative"
)
rm(
tsl_temperature_min,
tsl_temperature_max,
tsl_temperature_mean,
tsl_precipitation_sum,
tsl_precipitation_max,
tsl_precipitation_min,
tsl_climate_stats,
tsl_precipitation,
tsl_temperature,
interval
)
Resampling is a model-based method to change the frequency of a time series via interpolation. It is useful to align time series that are irregular or have different resolutions. Bear in mind, however, that resampling to a frequency much higher than the original one will result in interpolation artifacts and distorted time series.
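As a minimal illustration of the idea, the code below uses zoo::na.approx(), the same interpolation function tsl_resample() relies on by default, to interpolate an irregular series onto a regular weekly grid:

library(zoo)

#irregular toy series
z_irregular <- zoo::zoo(
  x = c(1, 3, 2, 5),
  order.by = as.Date(c("2024-01-01", "2024-01-04", "2024-01-13", "2024-01-20"))
)

#linear interpolation on a regular weekly grid
new_time <- seq(as.Date("2024-01-01"), as.Date("2024-01-20"), by = "week")

zoo::na.approx(object = z_irregular, xout = new_time)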
The code below creates two irregular time series with different numbers of rows.
tsl <- distantia::tsl_init(
  x = list(
    A = zoo_simulate(rows = 100, seasons = 10),
    B = zoo_simulate(rows = 50, seasons = 10)
  )
)

distantia::tsl_plot(
  tsl = tsl,
  guide = FALSE
)
The time features of this TSL show clear differences in frequency between these time series.
distantia::tsl_time(tsl = tsl)[ , c("name", "rows", "resolution", "begin", "end") ]
By default, if the argument new_time is omitted, a linear model (computed via zoo::na.approx()) is used to resample the TSL to its average resolution over the intersection of all its time ranges.
tsl_resampled <- distantia::tsl_resample(
  tsl = tsl
)

distantia::tsl_time(tsl = tsl_resampled)[
  , c("name", "rows", "resolution", "begin", "end")
]

distantia::tsl_plot(
  tsl = tsl_resampled,
  guide = FALSE
)
Otherwise, a keyword can be used to define the resampling frequency. Again, we can find the valid keywords using tsl_time(), this time replacing "aggregate" with "resample" in the argument keywords.
df <- distantia::tsl_time(
  tsl = tsl,
  keywords = "resample"
)

df$keywords |>
  unlist() |>
  unique()
tsl_resampled <- distantia::tsl_resample(
  tsl = tsl,
  new_time = "weeks",
  method = "loess"
)

tsl_time(tsl = tsl_resampled)[
  , c("name", "rows", "resolution", "begin", "end")
]

distantia::tsl_plot(
  tsl = tsl_resampled,
  guide = FALSE
)
rm(df, df_time, tsl, tsl_resampled)
Time series smoothing is used to mitigate noise in high-frequency time series, or to highlight general trends rather than fine-grained details.
The function tsl_smooth() implements two smoothing methods: rolling-window and exponential.
To check how these methods work we first need a long and noisy TSL.
tsl <- distantia::tsl_simulate(
  rows = 1000,
  irregular = FALSE
)

distantia::tsl_plot(
  tsl = tsl,
  guide = FALSE
)
This method computes a statistic over a fixed-width window of consecutive cases and replaces each central value with the computed statistic. It should not be applied to highly irregular time series, as it ignores time distance.
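To make the mechanics concrete, here is a minimal base R sketch of a centered rolling mean. It illustrates the idea, and is not the package's internal implementation.

#centered rolling mean, for illustration only
rolling_mean <- function(x, window = 3) {
  half <- floor(window / 2)
  sapply(seq_along(x), function(i) {
    #window indices, truncated at the series edges
    neighbors <- max(1, i - half):min(length(x), i + half)
    mean(x[neighbors])
  })
}

rolling_mean(x = c(1, 2, 10, 2, 1), window = 3)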
The code below applies this method with two different window sizes.
#smoothing with window 100
tsl_smooth_100 <- tsl_smooth(
  tsl = tsl,
  window = 100,
  f = mean
)

distantia::tsl_plot(
  tsl = tsl_smooth_100,
  guide = FALSE
)

#smoothing with window 10
tsl_smooth_10 <- tsl_smooth(
  tsl = tsl,
  window = 10,
  f = mean
)

distantia::tsl_plot(
  tsl = tsl_smooth_10,
  guide = FALSE
)
The window size determines the smoothing scale, and with that, the type of trend highlighted in the smoothing results.
This method generates each new value as the weighted average of the current value and past smoothed values. The weight is defined by the argument alpha in tsl_smooth().
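In recurrence form, each smoothed value is s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]. The minimal base R sketch below illustrates the idea; it is not the package's internal implementation.

#exponential smoothing, for illustration only
exponential_smooth <- function(x, alpha = 0.2) {
  s <- numeric(length(x))
  s[1] <- x[1]
  for (t in 2:length(x)) {
    s[t] <- alpha * x[t] + (1 - alpha) * s[t - 1]
  }
  s
}

exponential_smooth(x = c(1, 2, 10, 2, 1), alpha = 0.2)

The code below applies tsl_smooth() with two different alpha values.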
#smoothing with alpha 0.2
tsl_smooth_exp_0.2 <- tsl_smooth(
  tsl = tsl,
  alpha = 0.2
)

distantia::tsl_plot(
  tsl = tsl_smooth_exp_0.2,
  guide = FALSE
)

#smoothing with alpha 0.8
tsl_smooth_exp_0.8 <- tsl_smooth(
  tsl = tsl,
  alpha = 0.8
)

distantia::tsl_plot(
  tsl = tsl_smooth_exp_0.8,
  guide = FALSE
)
Alpha values closer to zero produce smoother results, as the plots above show.
rm(
tsl,
tsl_smooth_10,
tsl_smooth_100,
tsl_smooth_exp_0.2,
tsl_smooth_exp_0.8
)
The function tsl_transform() applies a function f to transform the values of a TSL. The names of the available f functions can be listed with f_list().
distantia::f_list()
Scaling and centering multivariate time series is essential in dynamic time warping to ensure all variables contribute equally, regardless of their range or units.
For example, the dataset fagus_dynamics has variables in different units.
tsl <- distantia::tsl_init(
  x = distantia::fagus_dynamics,
  name = "name",
  time = "time"
)

distantia::tsl_plot(
  tsl = tsl
)
Due to the differences in magnitude between variables, a dynamic time warping analysis will focus disproportionately on rainfall, biasing the results.
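A toy computation illustrates the problem: with unscaled variables, the Euclidean distance between two observations is driven almost entirely by the variable with the largest magnitude.

#toy observations with unscaled variables
a <- c(temperature = 10, rainfall = 1200)
b <- c(temperature = 20, rainfall = 1100)

#rainfall dominates the distance, even though
#temperature doubled between observations
sqrt(sum((a - b)^2))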
To solve this issue, the package distantia implements two flavors of scaling and/or centering:

- f_scale_local: scales and/or centers each time series with its own mean and standard deviation.
- f_scale_global: scales and/or centers all time series with the mean and standard deviation of the whole TSL.
#local scaling
tsl_local_scaling <- distantia::tsl_transform(
  tsl = tsl,
  f = distantia::f_scale_local
)

#global scaling
tsl_global_scaling <- distantia::tsl_transform(
  tsl = tsl,
  f = distantia::f_scale_global
)
The stats of both operations show that the global scaling preserves variable offsets between locations, while the local scaling yields a mean of 0 and a standard deviation of 1 across all variables.
stats_cols <- c("name", "variable", "mean", "sd") #stats of local scaling distantia::tsl_stats( tsl = tsl_local_scaling )[, stats_cols] #stats of global scaling distantia::tsl_stats( tsl = tsl_global_scaling )[, stats_cols]
The functions f_rescale_local and f_rescale_global work under the same principle to rescale variable ranges. By default, these functions rescale all time series between 0 and 1, but the arguments new_min and new_max can receive numeric vectors to set different ranges for different variables.
tsl_local_rescaling <- tsl_transform(
  tsl = tsl,
  f = f_rescale_local,
  new_min = 0, #same for all variables
  new_max = c(0, 100, 10)
)

tsl_global_rescaling <- tsl_transform(
  tsl = tsl,
  f = f_rescale_global,
  new_min = 0,
  new_max = c(0, 100, 10)
)
The stats below show the results of the local and global rescaling.
stats_cols <- c("name", "variable", "min", "max") distantia::tsl_stats( tsl = tsl_local_rescaling )[, stats_cols] distantia::tsl_stats( tsl = tsl_global_rescaling )[, stats_cols]
rm(
tsl,
tsl_global_scaling,
tsl_local_scaling,
tsl_global_rescaling,
tsl_local_rescaling,
stats_cols
)
Removing trends in time series before applying dynamic time warping helps prevent inflated dissimilarity scores caused by non-stationary components. Detrending also ensures that the alignment generated by dynamic time warping focuses on meaningful shape features rather than being distorted by long-term trends.
The transformation function f_trend_linear transforms time series into their linear trend, which helps identify whether a time series needs detrending or not. The example below applies this function to the cities_temperature dataset.
#loading cities_temperature as tsl
tsl <- distantia::tsl_init(
  x = distantia::cities_temperature,
  name = "name",
  time = "time"
)

#computing linear trends
tsl_trend <- distantia::tsl_transform(
  tsl = tsl,
  f = distantia::f_trend_linear
)

#plotting linear trends
tsl_plot(
  tsl = tsl_trend,
  columns = 2,
  guide = FALSE,
  line_color = "red4",
  line_width = 1.5
)
We can now compute the stats of these linear trends to identify the cities with the steepest long-term temperature change.
#compute stats of linear trends
df_stats <- distantia::tsl_stats(
  tsl = tsl_trend,
  lags = 0
)

#arrange from higher to lower range
df_stats[
  order(df_stats$range, decreasing = TRUE),
  c("name", "range")
]
Now that we have identified the issue, the function f_detrend_linear helps remove these temporal trends from the data.
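Conceptually, linear detrending fits a straight line to each variable and subtracts it from the data, which is presumably what f_detrend_linear does for every variable in the TSL. A minimal base R sketch of the idea:

#linear detrending sketch: subtract the fitted linear trend
#from a trended toy series
y <- cumsum(rnorm(100)) + seq(0, 10, length.out = 100)
trend <- fitted(lm(y ~ seq_along(y)))
y_detrended <- y - trend

Applying the real function to the whole TSL: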
tsl_detrended <- distantia::tsl_transform( tsl = tsl, f = distantia::f_detrend_linear )
The code below applies f_trend_linear again to demonstrate that the linear detrending worked as expected.
tsl_trend <- distantia::tsl_transform(
  tsl = tsl_detrended,
  f = distantia::f_trend_linear
)

df_stats <- distantia::tsl_stats(
  tsl = tsl_trend,
  lags = 0
)

df_stats[, c("name", "range")]
rm(
tsl,
tsl_trend,
tsl_detrended,
df_stats
)
The distantia package offers several transformations for compositional data:

- f_percent: percentages.
- f_proportion: proportions.
- f_proportion_sqrt: square root of proportions.
- f_hellinger: transforms to proportions and applies the Hellinger transformation.
- f_clr: centered log-ratio.
- f_log: logarithm transformation.
- f_binary: transforms continuous data into discrete events.

The example below shows the application of several of these transformations to the counts dataset eemian_pollen.
tsl_counts <- distantia::tsl_init(
  x = distantia::eemian_pollen,
  name = "name",
  time = "time"
) |>
  distantia::tsl_subset(
    names = c(1, 2)
  )

distantia::tsl_plot(
  tsl = tsl_counts,
  guide_columns = 4
)
Transformation to percentages with f_percent.
tsl_percent <- distantia::tsl_transform(
  tsl = tsl_counts,
  f = distantia::f_percent
)

distantia::tsl_plot(
  tsl = tsl_percent,
  guide_columns = 4
)
The function f_binary helps transform continuous data into discrete events. The code below converts the percentages above into presence/absence using 5% as the threshold. Notice that only the variable "Carpinus" is shown in the plot to facilitate visualization.
tsl_binary <- distantia::tsl_transform(
  tsl = tsl_percent,
  f = distantia::f_binary,
  threshold = 5
)

distantia::tsl_plot(
  tsl = distantia::tsl_subset(
    tsl = tsl_binary,
    colnames = "Carpinus"
  )
)
Transformation to the square root of proportions with f_proportion_sqrt.
tsl_prop_sqrt <- distantia::tsl_transform(
  tsl = tsl_counts,
  f = distantia::f_proportion_sqrt
)

distantia::tsl_plot(
  tsl = tsl_prop_sqrt,
  guide_columns = 4
)
The function f_hellinger transforms the taxa counts to proportions and then applies the Hellinger transformation.
tsl_hellinger <- distantia::tsl_transform(
  tsl = tsl_counts,
  f = distantia::f_hellinger
)

tsl_plot(
  tsl = tsl_hellinger,
  guide_columns = 4
)
Centered log-ratio computed with f_clr.
tsl_clr <- distantia::tsl_transform(
  tsl = tsl_counts,
  f = distantia::f_clr
)

tsl_plot(
  tsl = tsl_clr,
  guide_columns = 4
)