knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "##",
  fig.width = 6,
  fig.height = 4,
  dpi = 72,
  fig.retina = 1,
  out.width = "90%"
)
library("tidyverse")
library("viridisLite")
theme_set(theme_minimal() + theme(legend.position = "bottom"))
options(
  ggplot2.continuous.colour = "viridis",
  ggplot2.continuous.fill = "viridis"
)
scale_colour_discrete <- scale_colour_viridis_d
scale_fill_discrete <- scale_fill_viridis_d
library("tidyfun")
pal_5 <- viridis(7)[-(1:2)]
set.seed(1221)
This vignette introduces the tf class, as well as the tfd and tfb subclasses, and focuses on vectors of this class. It also illustrates operations for tf vectors.
tf-Class: Definition
tf-class
tf is a new data type for (vectors of) functional data. It is an abstract superclass for functional data in 2 forms: tfd holds "raw" evaluations on argument grids, which may also be irregular or sparse, while tfb represents each observed function as a weighted sum of a fixed dictionary of known "basis functions". Internally, a tf vector is basically a list of numeric vectors (since lists work well as columns of data frames) with additional attributes that define function-like behavior, such as the domain and how to evaluate the functions at new arguments. The implementation is S3 based.
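To make the "list of numeric vectors with attributes" idea concrete, here is a minimal base R sketch. The class name toy_tf and the constructor new_toy_tf are hypothetical and only mimic the general design; they are not tidyfun's actual implementation.

```r
# Hypothetical toy class "toy_tf": NOT tidyfun's actual implementation.
new_toy_tf <- function(values, arg) {
  stopifnot(is.list(values), all(lengths(values) == length(arg)))
  structure(
    values,
    arg = arg,             # common argument grid
    domain = range(arg),   # range of valid arguments
    class = "toy_tf"
  )
}

# S3 print method that reports the number of functions and the domain
print.toy_tf <- function(x, ...) {
  dom <- attr(x, "domain")
  cat("toy_tf vector of", length(x), "functions on [", dom[1], ",", dom[2], "]\n")
  invisible(x)
}

grid <- seq(0, 1, length.out = 5)
f <- new_toy_tf(list(a = grid^2, b = sin(grid)), arg = grid)
print(f)

# Because the object is a list, it can be stored as a data frame column
df <- data.frame(id = c("a", "b"))
df$f <- f
nrow(df)
```

The attributes travel with the list, so methods dispatched on the class can use them to provide function-like behavior.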
First we extract a tf vector from the tidyfun::dti_df dataset containing fractional anisotropy tract profiles for the corpus callosum (cca). When printed, tf vectors show the first few arg and value pairs for each subject.
data("dti_df")
cca <- dti_df$cca
cca
We also extract a simple 5-element vector of functions on a regular grid:
cca_five <- cca[1:5, seq(0, 1, length.out = 93), interpolate = TRUE]
rownames(cca_five) <- LETTERS[1:5]
cca_five <- tfd(cca_five, signif = 2)
cca_five
For illustration, we plot the vector cca_five below.
plot(cca_five, xlim = c(-0.15, 1), col = pal_5)
text(x = -0.1, y = cca_five[, 0.07], labels = names(cca_five), col = pal_5)
tf subclass: tfd
tfd objects contain "raw" functional data: evaluations $f_i(t)|_{t=t'}$ and corresponding argument vector(s) $t'$, together with a domain, the range of valid args.
cca_five |> tf_evaluations() |> str()
cca_five |> tf_arg() |> str()
cca_five |> tf_domain()
Each tfd-vector contains an evaluator function that defines how to inter-/extrapolate evaluations between args:
tf_evaluator(cca_five) |> str()
tf_evaluator(cca_five) <- tf_approx_spline
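The role of an evaluator can be sketched in base R, assuming a single function observed on an irregular grid. The helper names eval_linear and eval_spline are hypothetical; they only illustrate what "interpolating between args" means and are not tidyfun's evaluators.

```r
# Toy data: one function observed on an irregular grid (hypothetical values)
arg <- c(0, 0.1, 0.4, 0.7, 1)
value <- arg^2

# Two hypothetical "evaluators": each maps new arguments to interpolated values
eval_linear <- function(new_arg) approx(arg, value, xout = new_arg)$y
eval_spline <- function(new_arg) {
  spline(arg, value, xout = new_arg, method = "natural")$y
}

new_arg <- c(0.25, 0.55)
eval_linear(new_arg)   # piecewise linear interpolation
eval_spline(new_arg)   # natural cubic spline interpolation

# With approx()'s default rule, arguments outside the observed range give NA
eval_linear(1.5)
```

Swapping the evaluator changes how values between observed arguments are computed while the stored evaluations stay the same.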
tfd has two subclasses: one for regular data with a common grid and one for irregular or sparse data. The cca data are irregular (values are missing for some subjects at some arguments), but the example below illustrates support for sparse and irregular data more clearly, using CD4 cell counts from a longitudinal study included in refund.
cd4_vec <- tfd(refund::cd4)
cd4_vec[1:2]
cd4_vec[1:2] |> tf_arg() |> str()
cd4_vec[1:20] |> plot(pch = "x", col = viridis(20))
tf subclass: tfb
Functional data in basis representation consist of coefficients and a common basis_matrix of basis function evaluations on a vector of arg-values, plus a basis function that defines how to evaluate the basis functions for new args and how to differentiate or integrate it. Two flavors are available: tfb_spline uses mgcv-spline bases, and tfb_fpc uses functional principal components. The following compares the object sizes of the original data, the tfd vector, and its tfb representation:
refund::DTI$cca |> object.size() |> print(units = "Kb")
cca |> object.size() |> print(units = "Kb")
cca |> tfb(verbose = FALSE) |> object.size() |> print(units = "Kb")
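The idea of storing coefficients plus one shared basis matrix can be sketched with base R and the splines package. This is an unpenalized least-squares toy under assumed data, not the mgcv-based fit that tfb performs; all object names are illustrative.

```r
library(splines)

set.seed(1)
arg <- seq(0, 1, length.out = 50)
# Three toy functions stored as columns of a matrix (hypothetical data)
Y <- cbind(sin(2 * pi * arg), cos(2 * pi * arg), arg^2) +
  matrix(rnorm(50 * 3, sd = 0.05), nrow = 50)

# Common basis matrix: 50 argument values x 8 B-spline basis functions
B <- bs(arg, df = 8, intercept = TRUE)

# One coefficient vector per function, from unpenalized least squares
coefs <- qr.solve(B, Y)   # 8 x 3

# Each function is a weighted sum of the same basis functions
Y_hat <- B %*% coefs

# Evaluating the basis at new arguments evaluates the functions there
B_new <- predict(B, newx = c(0.33, 0.66))
B_new %*% coefs

dim(coefs)
max(abs(Y - Y_hat))
```

Only 8 coefficients per function are stored instead of 50 evaluations, which is where the memory savings of a basis representation come from.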
tfb_spline: spline basis
tfb() accepts mgcv's s()-syntax: basis type bs, basis dimension k, penalty order m, etc. The family argument selects the response distribution for the fit.
cca_five_b <- cca_five |> tfb()
cca_five_b[1:2]
cca_five[1:2] |> tfb(bs = "tp", k = 55)
# functions represent ratios in (0,1), so a Beta-distribution is more appropriate:
cca_five[1:2] |> tfb(bs = "ps", m = c(2, 1), family = mgcv::betar(link = "cloglog"))
Penalization can be function-specific (default), absent, prespecified (sp), or global:
layout(t(1:2))
cca_five |> plot()
cca_five_b |> plot(col = "red")
cca_five |> tfb(k = 35, penalized = FALSE) |> lines(col = "blue")
cca_five |> tfb(sp = 0.001) |> lines(col = "orange")
Right plot shows smoothing with function-specific penalization in red, without penalization in blue,
and with manually set strong smoothing (sp $\to 0$) in orange.
"Global" smoothing:
Advantages:
Disadvantages
set.seed(1212)
raw <- c(
  tf_rgp(5, scale = 0.2, nugget = 0.05, arg = 101L) - 5,
  tf_rgp(5, scale = 0.02, nugget = 0.05, arg = 101L),
  tf_rgp(5, scale = 0.002, nugget = 0.05, arg = 101L) + 5
)
Dataset with heterogeneous roughness:
layout(t(1:3))
clrs <- scales::alpha(sample(viridis(15)), 0.5)
plot(raw, main = "raw", col = clrs)
plot(tfb(raw, k = 55), main = "separate", col = clrs)
plot(tfb(raw, k = 55, global = TRUE), main = "global", col = clrs)
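tfb FPC-based
tfb_fpc converts a tfd-object to an FPC basis representation. The pve argument, used below with 0.999 and 0.6, sets the proportion of variance that the retained components should explain.
cca_five_fpc <- cca_five |> tfb_fpc(pve = 0.999)
cca_five_fpc
cca_five_fpc_lowrank <- cca_five |> tfb_fpc(pve = 0.6)
cca_five_fpc_lowrank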
layout(t(1:2))
cca_five |> plot()
cca_five_fpc |> plot(col = "red", ylab = "tfb_fpc(cca_five)")
cca_five_fpc_lowrank |> lines(col = "blue", lty = 2)
tfb_fpc is currently only implemented for data on identical
(but possibly non-equidistant) grids. The {refunder} rfr_fpca-functions
provide FPCA methods appropriate for highly irregular and sparse data and regularized/smoothed FPCA.
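For data on a common grid, the basic FPC idea can be sketched with a plain SVD in base R. The helpers n_components and reconstruct are hypothetical, the data are simulated, and the sketch ignores any weighting or smoothing that an actual implementation may apply.

```r
set.seed(2)
arg <- seq(0, 1, length.out = 40)
# 20 toy functions (rows) built from two smooth components plus small noise
scores <- cbind(rnorm(20, sd = 2), rnorm(20, sd = 0.5))
Y <- scores %*% rbind(sin(2 * pi * arg), cos(2 * pi * arg)) +
  matrix(rnorm(20 * 40, sd = 0.05), nrow = 20)

# Center the data and decompose: right singular vectors act as eigenfunctions
mu <- colMeans(Y)
Yc <- sweep(Y, 2, mu)
sv <- svd(Yc)

# Smallest number of components whose cumulative share of variance
# reaches the requested proportion (hypothetical helper, not tidyfun's code)
n_components <- function(d, pve) which(cumsum(d^2) / sum(d^2) >= pve)[1]
k_high <- n_components(sv$d, pve = 0.999)
k_low <- n_components(sv$d, pve = 0.6)

# Low-rank reconstruction: mean function plus weighted sum of eigenfunctions
reconstruct <- function(k) {
  fpc_scores <- sv$u[, 1:k, drop = FALSE] %*% diag(sv$d[1:k], nrow = k)
  sweep(fpc_scores %*% t(sv$v[, 1:k, drop = FALSE]), 2, mu, "+")
}

c(k_low = k_low, k_high = k_high)
max(abs(Y - reconstruct(k_low)))
max(abs(Y - reconstruct(k_high)))
```

A lower pve keeps fewer components and therefore gives a coarser reconstruction, which mirrors the difference between the two fits plotted above.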
tf-Class: Methods
tidyfun implements almost all types of operations that are available for conventional numerical or logical vectors for tf-vectors as well, so you can subset and subassign, do arithmetic, and compare:
cca_five[1:2]
cca_five[1:2] <- cca_five[2:1]
cca_five
n_cca_five <- names(cca_five)
cca_five <- unname(cca_five)
cca_five[1] + cca_five[1] == 2 * cca_five[1]
log(exp(cca_five[2])) == cca_five[2]
(cca_five - (2:-2)) != cca_five
names(cca_five) <- n_cca_five
Compute functional summaries like mean functions, functional standard deviations or variances or functional data depths over a vector of functional data:
c(mean = mean(cca_five), sd = sd(cca_five))
tf_depth(cca_five) ## Modified Band-2 Depth (a la Sun/Genton/Nychka, 2012), others to come.
median(cca_five) == cca_five[which.max(tf_depth(cca_five))]
summary(cca_five)
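For functions on a common grid, these summaries reduce to simple matrix operations. The sketch below uses base R and a hypothetical helper mbd2 that follows the textbook definition of the modified band depth with bands formed by pairs of curves; tf_depth may normalize or compute it differently.

```r
arg <- seq(0, 1, length.out = 21)
# Four parallel toy curves stored as rows of a matrix (hypothetical data)
Y <- rbind(
  sin(2 * pi * arg) - 1,
  sin(2 * pi * arg),
  sin(2 * pi * arg) + 0.5,
  sin(2 * pi * arg) + 2
)

# Pointwise mean and standard deviation functions
mean_fun <- colMeans(Y)
sd_fun <- apply(Y, 2, sd)

# Modified band depth: for every pair of curves, the share of grid points
# at which a curve lies inside the band spanned by the pair, averaged over pairs
mbd2 <- function(Y) {
  pairs <- combn(nrow(Y), 2)
  depth <- numeric(nrow(Y))
  for (p in seq_len(ncol(pairs))) {
    lower <- pmin(Y[pairs[1, p], ], Y[pairs[2, p], ])
    upper <- pmax(Y[pairs[1, p], ], Y[pairs[2, p], ])
    inside <- sweep(Y, 2, lower, ">=") & sweep(Y, 2, upper, "<=")
    depth <- depth + rowMeans(inside)
  }
  depth / ncol(pairs)
}

depth <- mbd2(Y)
depth
```

Curves in the middle of the sample receive the largest depth values, which is why the deepest curve can serve as a functional median.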
Compute summaries for each function like its mean or extreme values, quantiles, etc.
tf_fmean(cca_five) # mean of each function's evaluations
tf_fmax(cca_five) # max of each function's evaluations
# 25%-tile of each f(t) for t > .5:
tf_fwise(cca_five, \(x) quantile(x$value[x$arg > 0.5], prob = 0.25)) |> unlist()
tf_fwise can be used to define custom statistics for each function that can depend on both its value and its arg.
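The pattern behind such function-wise statistics can be mimicked in base R, assuming each function is available as a data frame with columns arg and value. The helper fwise is hypothetical and only illustrates the idea of applying a statistic to one function at a time.

```r
arg <- seq(0, 1, length.out = 11)
# Two toy functions, each as a data frame of (arg, value) pairs
funs <- list(
  A = data.frame(arg = arg, value = arg),
  B = data.frame(arg = arg, value = 1 - arg)
)

# Hypothetical helper: apply a scalar statistic to every function
fwise <- function(fs, stat) vapply(fs, stat, numeric(1))

# A statistic that uses both arg and value: location of the maximum
argmax <- fwise(funs, \(x) x$arg[which.max(x$value)])
argmax

# A statistic on part of the domain: lower quartile of values for t > 0.5
q25 <- fwise(funs, \(x) unname(quantile(x$value[x$arg > 0.5], probs = 0.25)))
q25
```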
In addition, tidyfun provides methods for operations that are specific for functional data:
tf-objects have a special [-operator: Its second argument specifies
argument values at which to evaluate the functions and has some additional options,
so it's easy to get point values for tf objects, in matrix or data.frame formats:
cca_five[1:2, seq(0, 1, length.out = 3)]
cca_five["B", seq(0, 0.15, length.out = 3), interpolate = FALSE]
cca_five[1:2, seq(0, 1, length.out = 7), matrix = FALSE] |> str()
layout(t(1:3))
cca_five |> plot(alpha = 0.2, ylab = "lowess")
cca_five |> tf_smooth("lowess") |> lines(col = pal_5)
cca_five |> plot(alpha = 0.2, ylab = "rolling median (k=5)")
cca_five |> tf_smooth("rollmedian", k = 5) |> lines(col = pal_5)
cca_five |> plot(alpha = 0.2, ylab = "Savitzky-Golay (quartic, 11 steps)")
cca_five |> tf_smooth("savgol", fl = 11) |> lines(col = pal_5)
layout(t(1:3))
cca_five |> plot(col = pal_5)
cca_five |> tf_smooth() |> tf_derive() |> plot(col = pal_5, ylab = "tf_derive(tf_smooth(cca_five))")
cca_five |> tf_integrate(definite = FALSE) |> plot(col = pal_5)
cca_five |> tf_integrate()
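What differentiation and integration mean for a function stored as evaluations on a grid can be sketched with finite differences and the trapezoid rule in base R. This is a generic numerical illustration on an assumed toy function; the exact schemes used by tf_derive and tf_integrate may differ.

```r
arg <- seq(0, 1, length.out = 101)
value <- arg^2   # toy function f(t) = t^2

# Finite-difference derivative, located at the midpoints between grid points
deriv_arg <- (head(arg, -1) + tail(arg, -1)) / 2
deriv_value <- diff(value) / diff(arg)

# Cumulative trapezoid rule: an antiderivative on the grid, starting at 0
trapezoids <- diff(arg) * (head(value, -1) + tail(value, -1)) / 2
antiderivative <- c(0, cumsum(trapezoids))

# Definite integral over the whole domain: a single number per function
definite <- sum(trapezoids)
definite   # close to 1/3 for f(t) = t^2
```

The indefinite integral is again a function on the grid, whereas the definite integral collapses each function to one number.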
tidyfun makes it easy to find (ranges of) arguments $t$ satisfying a condition on value $f(t)$ (and argument $t$):
cca_five |> tf_anywhere(value > 0.65)
cca_five[1:2] |> tf_where(value > 0.6, "all")
cca_five[2] |> tf_where(value > 0.6, "range")
cca_five |> tf_where(value > 0.6 & arg > 0.5, "first")
cca_five |> plot(xlim = c(-0.15, 1), col = pal_5, lwd = 2)
text(x = -0.1, y = cca_five[, 0.07], labels = names(cca_five), col = pal_5, cex = 1.5)
median(cca_five) |> lines(col = pal_5[3], lwd = 4)
# where are the first maxima of these functions?
cca_five |> tf_where(value == max(value), "first")
# where are the first maxima of the later part (t > .5) of these functions?
cca_five[c("A", "D")] |> tf_zoom(0.5, 1) |> tf_where(value == max(value), "first")
# which f_i(t) are below the functional median anywhere for 0.2 < t < 0.6?
# (t() needed here so we're comparing column vectors to column vectors...)
cca_five |> tf_zoom(0.2, 0.6) |> tf_anywhere(value <= t(median(cca_five)[, arg]))
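The condition-based queries used above can be mimicked for a single function in base R, assuming its arguments and values are stored as two plain vectors. The variable names are illustrative; the sketch only shows which quantity each kind of query returns.

```r
arg <- seq(0, 1, length.out = 11)
value <- sin(pi * arg)   # toy function on a regular grid

# Logical vector: at which grid points does the condition hold?
hit <- value > 0.6

anywhere <- any(hit)             # does it hold anywhere?
all_args <- arg[hit]             # all arguments where it holds
first_arg <- arg[which(hit)[1]]  # first such argument
range_args <- range(arg[hit])    # smallest and largest such argument

anywhere
all_args
first_arg
range_args
```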