conformal.fun.split: Functional Split conformal prediction intervals.

View source: R/split.R

conformal.fun.split R Documentation

Functional Split conformal prediction intervals.

Description

Compute prediction intervals using split conformal inference.

Usage

conformal.fun.split(
  x,
  t,
  y,
  x0,
  train.fun,
  predict.fun,
  alpha = 0.1,
  split = NULL,
  seed = FALSE,
  randomized = FALSE,
  seed_tau = FALSE,
  verbose = FALSE,
  training_size = 0.5,
  s_type = "st-dev"
)

Arguments

x

The input variable, a list of n elements. Each element is a list of p vectors (possibly of different lengths, since the evaluation grid may change). If x is NULL, the function samples it from a Gaussian distribution.

t

The grid points at which the response y is evaluated, given as a list of vectors. If y is of type "fData" or "mfData", t must be NULL.

y

The response variable. Like x and t, it is either a list of lists of vectors or a functional data object (of class fd from fda, or fData or mfData from roahd).

x0

The new points to evaluate, a list of n0 elements. Each element is a list of p vectors (possibly of different lengths).
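The nested-list layout described for x, t, y and x0 can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper make_obs, the variable name t_grid and the choice of one grid vector per response component are hypothetical and not taken from the package.

```r
set.seed(1)                          # reproducible toy data
n    <- 5                            # number of observations
p    <- 2                            # number of covariate vectors per observation
n0   <- 3                            # number of new points
grid <- seq(0, 1, length.out = 10)   # common evaluation grid (assumption)

# Hypothetical helper: one observation is a list of p numeric vectors
make_obs <- function(p, grid) {
  lapply(seq_len(p), function(j) rnorm(length(grid)))
}

x  <- lapply(seq_len(n),  function(i) make_obs(p, grid))   # list of n lists of p vectors
x0 <- lapply(seq_len(n0), function(i) make_obs(p, grid))   # list of n0 lists of p vectors

# Assumed layout for t: one grid vector per response component (one component here)
t_grid <- list(grid)

# y: list of n elements, each a list of vectors evaluated on the grids in t_grid
y <- lapply(seq_len(n), function(i) {
  list(sin(2 * pi * grid) + rnorm(length(grid), sd = 0.1))
})
```

Inspecting the result with str(x, max.level = 2) is a quick way to confirm the nesting before passing the lists on.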

train.fun

A function to perform model training, i.e., to produce an estimator of E(Y|X), the conditional expectation of the response variable Y given features X. Its input arguments should be x: list of features, and y: list of responses.

predict.fun

A function to perform prediction for the (mean of the) responses at new feature values. Its input arguments should be out: output produced by train.fun, and newx: feature values at which we want to make predictions.
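As an illustration of the train.fun / predict.fun contract described above, here is a minimal base-R sketch of a pair that ignores the features and predicts the pointwise mean of the training responses. The names my_train and my_predict are hypothetical, and the assumed layout of y (a list of observations, each a list of numeric vectors on a common grid) is a simplification; the pair returned by mean_lists() is the reference for the exact signatures the package expects.

```r
# Assumed layout: y is a list of n observations, each a list of q numeric vectors
my_train <- function(x, y) {
  q <- length(y[[1]])
  # pointwise mean over observations, one response component at a time
  lapply(seq_len(q), function(k) {
    Reduce(`+`, lapply(y, function(obs) obs[[k]])) / length(y)
  })
}

my_predict <- function(out, newx) {
  # the fitted mean is returned for every new point, since the model ignores x
  lapply(seq_along(newx), function(i) out)
}

# Toy usage: two observations, one response component each
grid  <- seq(0, 1, length.out = 4)
y_toy <- list(list(grid), list(grid + 2))
fit   <- my_train(x = NULL, y = y_toy)
pred  <- my_predict(fit, newx = list(1, 2, 3))
```

Here fit holds one mean vector per response component, and pred repeats it once per element of newx.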

alpha

Miscoverage level for the prediction intervals, i.e., intervals with coverage 1-alpha are formed. Default for alpha is 0.1.

split

Indices that define the data-split to be used (i.e., the indices define the first half of the data-split, on which the model is trained). Default is NULL, in which case the split is chosen randomly.

seed

Integer to be passed to set.seed before defining the random data-split to be used. Default is FALSE, which effectively sets no seed. If both split and seed are passed, the former takes priority and the latter is ignored.
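The interaction between split, seed and the training_size argument can be sketched as below. This is a hedged illustration, not the package's code: the function draw_split is hypothetical, and the rounding rule used for the training-set size is an assumption.

```r
draw_split <- function(n, training_size = 0.5, split = NULL, seed = FALSE) {
  # an explicit split takes priority; the seed is then ignored
  if (!is.null(split)) return(sort(unique(split)))
  # seed = FALSE means no seed is set
  if (!identical(seed, FALSE)) set.seed(seed)
  m <- ceiling(n * training_size)    # assumption: rounding rule for the training size
  sort(sample.int(n, m))             # indices of the training part
}

train_idx <- draw_split(20, training_size = 0.5, seed = 123)
calib_idx <- setdiff(seq_len(20), train_idx)   # remaining indices form the calibration set
```

Calling draw_split twice with the same seed returns the same indices, which is the practical reason to pass seed when results need to be reproducible.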

randomized

Should the randomized approach be used? Default is FALSE.

seed_tau

The seed for the randomized version. Default is FALSE.

verbose

Should intermediate progress be printed out? Default is FALSE.

training_size

Split proportion between training and calibration set. Default is 0.5.

s_type

The type of modulation function. Currently there are three options: "identity", "st-dev", "alpha-max". Default is "st-dev".
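To make the role of the modulation function concrete, the following base-R sketch builds a split conformal band for a single functional response observed on a common grid, once with a constant modulation ("identity") and once with the pointwise standard deviation of the training residuals ("st-dev"). It is a simplified illustration rather than the package's implementation: the names Y, mu and conformal_band are hypothetical, the mean curve stands in for a general train.fun, any normalisation of the modulation function is omitted, and "alpha-max" is not shown.

```r
set.seed(42)
grid <- seq(0, 1, length.out = 50)
n    <- 40
# toy curves: rows are observations, columns are grid points
Y <- t(replicate(n, sin(2 * pi * grid) + rnorm(1, sd = 0.3) +
                      rnorm(length(grid), sd = 0.1)))

alpha     <- 0.1
train_idx <- 1:20
calib_idx <- 21:40

mu  <- colMeans(Y[train_idx, ])          # fitted mean curve on the training part
res <- sweep(Y[train_idx, ], 2, mu)      # training residuals

s_identity <- rep(1, length(grid))       # "identity": no modulation
s_stdev    <- apply(res, 2, sd)          # "st-dev": pointwise sd of training residuals

conformal_band <- function(s) {
  # nonconformity score: sup over the grid of the modulated absolute residual
  scores <- apply(abs(sweep(Y[calib_idx, ], 2, mu)), 1, function(r) max(r / s))
  l <- length(calib_idx)
  k <- sort(scores)[ceiling((l + 1) * (1 - alpha))]   # conformal quantile of the scores
  list(k_s = k, lower = mu - k * s, upper = mu + k * s)
}

band_id <- conformal_band(s_identity)
band_sd <- conformal_band(s_stdev)
```

With the constant modulation the band has the same width at every grid point, while the "st-dev" variant widens where the training residuals vary more and narrows elsewhere.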

Value

A list with the following components: t, pred, k_s, s_type, s, alpha, randomized, tau, extremes_are_included, average_width, product_integral. t and s are lists of vectors. pred has the same structure as y, except that the outer list has length n0. k_s, average_width and product_integral are positive floats; alpha and tau are positive floats less than 1. randomized and extremes_are_included are logical values, and s_type is a string.

References

The function structure is taken from "Conformal Prediction Bands for Multivariate Functional Data" by Diquigiovanni, Fontana, Vantini (2021) and from "The Importance of Being a Band: Finite-Sample Exact Distribution-Free Prediction Sets for Functional Data" by Diquigiovanni, Fontana, Vantini (2021).

Examples

### fData ###################################

library(conformalInference.fd)  # provides conformal.fun.split, mean_lists, plot_fun

N = 20
P = 1e2
grid = seq( 0, 1, length.out = P )
C = roahd::exp_cov_function( grid, alpha = 0.3, beta = 0.4 )
values = roahd::generate_gauss_fdata( N,
                               centerline = sin( 2 * pi * grid ),
                               Cov = C )
fD = roahd::fData( grid, values )
x0=list(as.list(grid))
fun=mean_lists()
final.fData = conformal.fun.split(NULL,NULL, fD, x0, fun$train.fun, fun$predict.fun,
                             alpha=0.1,
                             split=NULL, seed=FALSE, randomized=FALSE,seed_tau=FALSE,
                             verbose=TRUE, training_size=0.5,s_type="alpha-max")
plot_fun(final.fData)

###  mfData ###################################

N = 1e2
P = 1e3
t0 = 0
t1 = 1
grid = seq( t0, t1, length.out = P )
C = roahd::exp_cov_function( grid, alpha = 0.3, beta = 0.4 )
Data_1 = roahd::generate_gauss_fdata( N, centerline = sin( 2 * pi * grid ), Cov = C )
Data_2 = roahd::generate_gauss_fdata( N, centerline = log(1+ 2 * pi * grid ), Cov = C )
mfD=roahd::mfData( grid, list( Data_1, Data_2 ) )
x0=list(as.list(grid))
fun=mean_lists()
final.mfData = conformal.fun.split(NULL,NULL, mfD, x0, fun$train.fun, fun$predict.fun,
                             alpha=0.1,
                             split=NULL, seed=FALSE, randomized=FALSE,seed_tau=FALSE,
                             verbose=TRUE, training_size=0.5,s_type="alpha-max")
h=plot_fun(final.mfData)

### fd ###########################################

daybasis <- fda::create.fourier.basis(c(0, 365), nbasis=65)
tempfd <- fda::smooth.basis(fda::day.5, fda::CanadianWeather$dailyAv[,,"Temperature.C"],daybasis)$fd
Lbasis <- fda::create.constant.basis(c(0, 365))
Lcoef <- matrix(c(0,(2*pi/365)^2,0),1,3)
bfdobj <- fda::fd(Lcoef,Lbasis)
bwtlist <- fda::fd2list(bfdobj)
harmaccelLfd <- fda::Lfd(3, bwtlist)
Ltempmat <- fda::eval.fd(fda::day.5, tempfd, harmaccelLfd)
t=1:365
# note: grid is the object defined in the mfData example above
x0=list(as.list(grid))
fun=mean_lists()
final.fd = conformal.fun.split(NULL,fda::day.5, tempfd, x0, fun$train.fun, fun$predict.fun,
                             alpha=0.1,
                             split=NULL, seed=FALSE, randomized=FALSE,seed_tau=FALSE,
                             verbose=TRUE, training_size=0.5,s_type="alpha-max")
plot_fun(final.fd)


paolo-vergo/conformalInference.fd documentation built on Oct. 14, 2023, 12:47 a.m.