README.md

Overview of the fdahotelling package

The package fdahotelling is an R package about inference for functional data. In this setting, statistical units are curves that belong to infinite dimensional spaces such as (L^2). The package handles input functional data stored either in matrix objects or in fd objects from the fda package. The package provides a set of statistics for testing equality in distribution between samples of curves using both exact permutation testing procedures and asympotic ones. The implementation is largely in C++ language using the Armadillo C++ library, which alleviates the computational burden often associated with permutation tests. In details:

See the vignette Available statistics for functional data for the details of each function.

Installation

You can install fdahotelling from github with:

# install.packages("devtools")
devtools::install_github("astamm/fdahotelling")

Usage

library(fda)
#> Loading required package: splines
#> Loading required package: Matrix
#> 
#> Attaching package: 'fda'
#> The following object is masked from 'package:graphics':
#> 
#>     matplot
library(fdahotelling)

Example 1: Using data stored in a matrix object

There is a dataset called aneurisk included in the fdahotelling package that contains reconstructed curvature, torsion, radius and wall-sheer stress of the termination of the internal carotid artery (ICA) of two groups (low and up) of patients. The low group includes healthy subjects as well as patients with an aneurysm outside of the brain (before entering the circle of Willis). The up group includes patients with an aneurysm within the brain. The latter are thought to be clinically more at risk, since rupture of their aneursym(s) would have devastating consequences on their neurological abilities. One of the research questions is to understand whether the geometry of the ICA might play a role in identifying patients that are more at risk. We can thus use for instance Hotelling’s statistic together with a permutation testing procedure to test if the radius of the ICA is different between the low and up groups as follows:

set.seed(1234)
lower_ind <- which(aneurisk$variable == "radius" & aneurisk$group == "low")
upper_ind <- which(aneurisk$variable == "radius" & aneurisk$group == "up")
test_twosample(
  x = aneurisk$data[[lower_ind]], 
  y = aneurisk$data[[upper_ind]], 
  step_size = 0.01, 
  B = 100L
)
#>  - P-value resolution: 0.01
#>  - Computing approximate p-value using 100 random permutations.
#>  - P-value will not drop below 7.91072860244862e-15 in average.
#> # A tibble: 1 x 3
#>   statName  statVal pValue
#>   <chr>       <dbl>  <dbl>
#> 1 Hotelling    385.   0.96

The test tells us that, given a significance level of (5\%), there is no reason to believe that the mean radius is different between the two groups. Let us now test whether the mean radius first derivatives differ:

set.seed(1234)
lower_ind <- which(aneurisk$variable == "radius_der" & aneurisk$group == "low")
upper_ind <- which(aneurisk$variable == "radius_der" & aneurisk$group == "up")
test_twosample(
  x = aneurisk$data[[lower_ind]], 
  y = aneurisk$data[[upper_ind]], 
  step_size = 0.01, 
  B = 100L
)
#>  - P-value resolution: 0.01
#>  - Computing approximate p-value using 100 random permutations.
#>  - P-value will not drop below 7.91072860244862e-15 in average.
#> # A tibble: 1 x 3
#>   statName  statVal pValue
#>   <chr>       <dbl>  <dbl>
#> 1 Hotelling   5350.   0.02

The test tells us that, given a significance level of (5\%), there is enough statistical evidence to conclude that there is a statistically relevant difference between the mean radius first derivatives of the two groups. Note that in this example we used only 100 permutations for reducing the computation time. This is however a rather low number of permutations that limits the p-value resolution and the power of the test. We advice using higher values (at least 1000) in practice.

Example 2: Using data stored in an fd object

We can first take the two datasets extracted above about the radius curvature and transform them into fd objects as follows:

# An fd object is created from the transposed matrix 
# in the sense that it expects observations to be stored
# in columns and grid points in rows.
arc_length_lower <- aneurisk$abscissa[[lower_ind]]
arc_length_upper <- aneurisk$abscissa[[upper_ind]]
# Both datasets must be defined on the same grid
all(arc_length_upper == arc_length_lower)
#> [1] TRUE
arc_length <- arc_length_upper
# Effective transformation to fd objects
data_lower <- aneurisk$data[[lower_ind]]
fd_lower <- fda::Data2fd(arc_length, t(data_lower))
data_upper <- aneurisk$data[[upper_ind]]
fd_upper <- fda::Data2fd(arc_length, t(data_upper))
# Requirement: the differences between the mean assumed 
# in the null hypothesis must be converted in fd format 
# as well.
fd_delta <- fda::Data2fd(arc_length, matrix(0, length(arc_length), dim(data_upper)[1]))

Now we can perform the test as follows:

set.seed(1234)
test_twosample(
  x = fd_lower, 
  y = fd_upper, 
  mu = fd_delta,
  step_size = 0.01, 
  B = 100L
)
#>  - P-value resolution: 0.01
#>  - Computing approximate p-value using 100 random permutations.
#>  - P-value will not drop below 7.91072860244862e-15 in average.
#> # A tibble: 1 x 3
#>   statName  statVal pValue
#>   <chr>       <dbl>  <dbl>
#> 1 Hotelling   7097.   0.02


astamm/fdahotelling documentation built on May 10, 2019, 2:05 p.m.