Functional canonical correlation analysis between a scalar variable and a list of mixed scalar and functional variables.

Description

This function carries out canonical correlation analysis between a scalar variable and a list of mixed scalar and functional variables. It offers four choices of returned values (argument type) and three methods for representing the functional variables (argument method).

Usage

fccaGen(xL,yVec,type=c('dir','cor','a','all'),method=c('basis',
      'gq','raw'),GCV=TRUE,control=list())

Arguments

xL

The mixed scalar and functional variables. If there is only one functional variable, xL can be a matrix. If there are only scalar variables, xL can be a vector or a matrix. If there is more than one functional variable, or there are mixed functional and scalar variables, xL should be a list. If xL is a list, each item of the list should correspond to one variable.
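The three accepted shapes of xL can be sketched in base R as follows. This is only an illustration: the sizes n and nT are hypothetical, and it assumes that a functional variable is stored with one row per sample and one column per discrete data point, and that a scalar variable inside a list is stored as a one-column matrix. Neither layout is stated explicitly on this page.

```r
set.seed(1)
n  <- 50   # hypothetical number of samples
nT <- 30   # hypothetical number of discrete data points per curve

# One functional variable: a matrix (assumed rows = samples, columns = data points).
x_fun <- matrix(rnorm(n * nT), nrow = n, ncol = nT)

# Scalar variables only: a vector (one variable) or a matrix (one column per variable).
x_sca <- matrix(rnorm(n * 3), nrow = n, ncol = 3)

# Several functional variables, or mixed functional and scalar variables:
# a list with one item per variable.
xL <- list(
  x_fun,
  matrix(rnorm(n * nT), nrow = n, ncol = nT),
  x_sca[, 1, drop = FALSE]   # assumed layout for a scalar variable: n x 1 matrix
)

# Largest number of columns among the items of the list.
max(sapply(xL, ncol), na.rm = TRUE)
```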

yVec

The scalar variable. It should be a matrix.

type

The choice of outcomes. See details for more information.

method

The representation method for the functional coefficients. It must be one of 'basis', 'gq' and 'raw', for basis function expansion, Gaussian quadrature and representative data points, respectively.

GCV

Logical argument. Whether to use generalized cross validation (GCV) to choose the tuning parameter. Currently GCV is the only choice.

control

List of elements that control the details of the functional coefficients. See details for more information.

Details

There are four choices of type in the function. 'dir' means that the function only returns the direction coefficients, like those in traditional canonical correlation analysis. 'cor' means that the function only returns the correlation coefficient. 'a' means that the function only returns the normalized direction coefficients. With this normalization, the direction coefficients are equivalent to the coefficients from a linear regression with response variable yVec and covariates xL. 'all' means that the function returns all three outcomes mentioned above.
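The link between canonical directions and regression coefficients can be checked in the classical, scalar-only case with base R. The sketch below uses stats::cancor and lm, not fccaGen, and the simulated data are hypothetical. It only illustrates the general fact that, with a single response, the canonical direction is proportional to the least-squares coefficients. It does not show how fccaGen normalizes its output.

```r
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 3), ncol = 3)               # three scalar covariates
y <- drop(x %*% c(1, -2, 0.5)) + rnorm(n)         # scalar response

cc  <- cancor(x, y)        # classical canonical correlation analysis
fit <- lm(y ~ x)           # ordinary least squares with an intercept

beta  <- unname(coef(fit)[-1])          # slopes, intercept dropped
ratio <- unname(cc$xcoef[, 1]) / beta   # elementwise ratio to the canonical direction

ratio                                   # the same constant for every covariate
c(cc$cor^2, summary(fit)$r.squared)     # squared canonical correlation equals R^2
```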

The argument control is a list. Its contents depend on the representation method used for the functional coefficients. If method='basis', the list contains the following items:

  • nbasis: Number of B-spline basis functions. Default value is 18.

  • norder: Order of the basis functions. Default value is 6.

  • pen1: The candidate values of the smoothing parameter. Default values are

    10^(seq((-20),5,len=41))

  • pen2: The candidate values of the ridge tuning parameter. Default value is

    0.01

  • t: IMPORTANT! The time points corresponding to the discrete data points of the functional variables. Defaults to seq(0,1,len=max(sapply(xL,ncol),na.rm = T)). Do NOT change the starting and ending points of the sequence.
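As an illustration, a control list for the 'basis' method that spells out the defaults listed above could be built as follows. The number of data points nT is hypothetical. Whether a partial list is merged with the defaults is not stated here, so this sketch sets every item explicitly.

```r
nT <- 50  # hypothetical number of discrete data points per curve

control_basis <- list(
  nbasis = 18,                                # number of B-spline basis functions
  norder = 6,                                 # order of the basis functions
  pen1   = 10^(seq(-20, 5, length.out = 41)), # candidate smoothing parameters
  pen2   = 0.01,                              # candidate ridge tuning parameter
  t      = seq(0, 1, length.out = nT)         # time points; endpoints stay at 0 and 1
)

str(control_basis)
```

The list would then be passed as fccaGen(xL, yVec, method='basis', control=control_basis).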

If method='gq', the list contains the following items:

  • nP: Number of Gaussian quadrature points. Default value is 18.

  • pen1: The candidate values of the smoothing parameter. Default values are

    10^(seq((-20),5,len=21))

  • pen2: The candidate values of the ridge tuning parameter. Default value is

    0.01

  • t: IMPORTANT! The time points corresponding to the discrete data points of the functional variables. Defaults to seq(-1,1,len=max(sapply(xL,ncol),na.rm = T)). Do NOT change the starting and ending points of the sequence.

If method='raw', the list contains the following items:

  • pen1: The candidate values of the smoothing parameter. Default values are

    10^(seq((-20),5,len=21))

  • pen2: The candidate values of the ridge tuning parameter. Default value is

    0.01

  • t: IMPORTANT! The time points corresponding to the discrete data points of the functional variables. Defaults to seq(0,1,len=max(sapply(xL,ncol),na.rm = T)). Do NOT change the starting and ending points of the sequence.
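When the curves were recorded on another scale (hours, wavelengths and so on), the observation times have to be mapped onto the interval the chosen method expects: [0,1] for 'basis' and 'raw', [-1,1] for 'gq'. A minimal sketch of such a mapping is shown below. The helper rescale_grid and the vector hours are hypothetical and are not part of flars. The sketch also reads the warning about the starting and ending points as meaning that t must span exactly those endpoints.

```r
# Hypothetical helper: linearly map observation times onto [lower, upper],
# preserving the relative spacing of the points.
rescale_grid <- function(obs_times, lower = 0, upper = 1) {
  r <- range(obs_times)
  lower + (obs_times - r[1]) / (r[2] - r[1]) * (upper - lower)
}

hours <- c(0, 1, 2, 4, 8, 12, 24)        # hypothetical observation times

t_basis <- rescale_grid(hours)           # spans [0, 1], for 'basis' and 'raw'
t_gq    <- rescale_grid(hours, -1, 1)    # spans [-1, 1], for 'gq'

range(t_basis)
range(t_gq)
```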

The function is designed to handle the situation where different functional variables have different numbers of discrete data points, and where the data points are not evenly spaced. This requires passing a list of t in the control argument. However, this is not fully tested at the moment. For convenience, especially with a large number of functional variables, a universal setting of t is recommended.

Value

corr

Correlation coefficient. It is returned when type='cor' or type='all'.

a

Normalized direction coefficients. It is returned when type='a' or type='all'

dir

Direction coefficients. It is returned when type='dir'

K

Penalized covariance matrix. It is returned when type='all'.

gq

Gaussian quadrature weights. It is returned when type='all'.

phiL

Known part of the functional coefficients, e.g., the basis functions. It is returned when type='all'.

S

Hat matrix. It is returned when type='all'.

lam1

The selected smoothing parameter. It is returned when type='all'.

lam2

The selected ridge parameter. It is returned when type='all'.

GCV_mat

The GCV value. It is returned when type='all'.

TraceHat

Trace of the hat matrix. It is returned when type='all'.

Examples

library(flars)
## Generate some data.
dataL=data_generation(seed = 1,uncorr = TRUE,nVar = 8,nsamples = 120,
      var_type = 'm',cor_type = 1)

## If there is only one functional variable
# out1=fccaGen(dataL$x[1], dataL$y, type='all', method='basis')

## If there are only a few scalar variables
# x=matrix(rnorm(3*length(dataL$y)),ncol=3)
# out2=fccaGen(x, dataL$y, type='all', method='basis')

## If there are mixed scalar and functional variables
# out3=fccaGen(dataL$x, dataL$y, type='all', method='basis')