fRegress: Functional Regression Analysis

fRegress R Documentation

Description

This function carries out a functional regression analysis, where either the dependent variable or one or more independent variables are functional. Non-functional variables may be used on either side of the equation. In a simple problem where there is a single scalar independent covariate with values z_i, i=1,\ldots,N and a single functional covariate with values x_i(t), the two versions of the model fit by fRegress are the scalar dependent variable model

y_i = \beta_1 z_i + \int x_i(t) \beta_2(t) \, dt + e_i

and the concurrent functional dependent variable model

y_i(t) = \beta_1(t) z_i + \beta_2(t) x_i(t) + e_i(t).

In these models, the final term e_i or e_i(t) is a residual, lack of fit or error term.
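
The scalar dependent variable model can be sketched in base R without the fda package. This is only an illustration under stated assumptions, not what fRegress does internally: the curves x_i(t) are simulated on a common grid, the integral is approximated by the trapezoid rule, and \beta_2(t) is expanded in three hypothetical basis functions (1, sin, cos). All object names are made up for the example.

```r
set.seed(1)
N     <- 60                                # number of observations
tgrid <- seq(0, 1, length.out = 101)       # common argument grid
h     <- tgrid[2] - tgrid[1]
w     <- rep(h, length(tgrid))             # trapezoid quadrature weights
w[c(1, length(w))] <- h / 2

z <- rnorm(N)                              # scalar covariate z_i
# simulated functional covariate x_i(t), stored as an N x 101 matrix
X <- outer(rnorm(N), sin(2 * pi * tgrid)) +
     outer(rnorm(N), cos(2 * pi * tgrid)) +
     outer(rnorm(N), rep(1, length(tgrid)))

beta1 <- 2                                 # true scalar coefficient
beta2 <- function(t) 1 + 3 * sin(2 * pi * t)   # true coefficient function

# y_i = beta1 * z_i + integral of x_i(t) * beta2(t) dt + e_i
y <- beta1 * z + as.vector(X %*% (w * beta2(tgrid))) + rnorm(N, sd = 0.05)

# expand beta2(t) in the basis Phi; each design column is the
# quadrature approximation of the integral of x_i(t) * phi_k(t)
Phi  <- cbind(1, sin(2 * pi * tgrid), cos(2 * pi * tgrid))
Zfun <- X %*% (w * Phi)                    # N x 3

fit      <- lm(y ~ 0 + z + Zfun)           # no intercept, as in the model above
beta2hat <- as.vector(Phi %*% coef(fit)[-1])   # estimated beta2 on the grid
```

Here coef(fit)[1] estimates \beta_1 and beta2hat estimates \beta_2(t) at the grid points. fRegress works with basis expansions of fd objects and optional roughness penalties rather than a fixed grid.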

In the concurrent functional linear model for a functional dependent variable, all functional variables are evaluated at a common time or argument value t. That is, the fit is defined in terms of the behavior of all variables at a fixed time, or in terms of "now" behavior.
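
The "now" behavior of the concurrent model can be illustrated with a pointwise regression in base R. This is a rough sketch under assumptions, not the estimation method used by fRegress: data are simulated on a grid, and a separate unpenalized least squares fit is run at each grid point. All names are hypothetical.

```r
set.seed(2)
N     <- 50
tgrid <- seq(0, 1, length.out = 21)
z     <- rnorm(N)                              # scalar covariate z_i
# simulated functional covariate x_i(t), N x 21
X <- outer(rnorm(N), sin(2 * pi * tgrid)) +
     outer(rnorm(N), rep(1, length(tgrid)))

beta1 <- function(t) 1 + t                     # true coefficient functions
beta2 <- function(t) cos(2 * pi * t)

# y_i(t) = beta1(t) * z_i + beta2(t) * x_i(t) + e_i(t)
Y <- outer(z, beta1(tgrid)) +
     sweep(X, 2, beta2(tgrid), `*`) +
     matrix(rnorm(N * length(tgrid), sd = 0.05), N)

# at each t_j, regress Y[, j] on z and X[, j] only:
# values of x at other times play no role in the fit at t_j
betahat <- t(sapply(seq_along(tgrid), function(j) {
  lm.fit(cbind(z = z, x = X[, j]), Y[, j])$coefficients
}))
# column 1 estimates beta1(t_j), column 2 estimates beta2(t_j)
```

In fRegress the coefficient functions are instead represented by basis expansions, so the estimates vary smoothly in t and can be penalized for roughness.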

All regression coefficient functions \beta_j(t) are considered to be functional. In the case of a scalar dependent variable, the regression coefficient for a scalar covariate is converted to a functional variable with a constant basis. All regression coefficient functions can be forced to be smooth through the use of roughness penalties, and consequently are specified in the argument list as functional parameter objects.

Usage

fRegress(y, ...)
## S3 method for class 'fd'
fRegress(y, xfdlist, betalist, wt=NULL,
                     y2cMap=NULL, SigmaE=NULL, returnMatrix=FALSE,
                     method=c('fRegress', 'model'), sep='.', ...)
## S3 method for class 'double'
fRegress(y, xfdlist, betalist, wt=NULL,
                     y2cMap=NULL, SigmaE=NULL, returnMatrix=FALSE, ...)
## S3 method for class 'formula'
fRegress(y, data=NULL, betalist=NULL, wt=NULL,
                 y2cMap=NULL, SigmaE=NULL,
                 method='fRegress', sep='.', ...)
## S3 method for class 'character'
fRegress(y, data=NULL, betalist=NULL, wt=NULL,
                 y2cMap=NULL, SigmaE=NULL,
                 method='fRegress', sep='.', ...)

Arguments

y

the dependent variable object. It may be an object of five possible classes or attributes:

character or formula

a formula object or a character object that can be coerced into a formula providing a symbolic description of the model to be fitted satisfying the following rules:

The left hand side, formula y, must be either a numeric vector or a univariate object of class fd.

All objects named on the right hand side must be either numeric or fd (functional data). The number of replications of fd object(s) must match each other and the number of observations of numeric objects named, as well as the number of replications of the dependent variable object. The right hand side of this formula is translated into xfdlist, then passed to another method for fitting (unless method = 'model'). Multivariate independent variables are allowed in a formula and are split into univariate independent variables in the resulting xfdlist. Similarly, categorical independent variables with k levels are translated into k-1 contrasts in xfdlist. Any smoothing information is passed to the corresponding component of betalist.

numeric

a numeric vector object or a matrix object if the dependent variable is numeric or a matrix.

fd

a functional data object or an fdPar object if the dependent variable is functional.

data

an optional list or data.frame containing names of objects identified in the formula or character y.

xfdlist

a list of length equal to the number of independent variables (including any intercept). Members of this list are the independent variables. They can be objects of either of these two classes:

scalar

a numeric vector if the independent variable is scalar.

fd

a (univariate) functional data object.

In either case, the object must have the same number of replications as the dependent variable object. That is, if it is a scalar, it must be of the same length as the dependent variable, and if it is functional, it must have the same number of replications as the dependent variable. (Only univariate independent variables are currently allowed in xfdlist.)

betalist

For the fd, fdPar, and numeric methods, betalist must be a list of length equal to length(xfdlist). Members of this list are functional parameter objects (class fdPar) defining the regression functions to be estimated. Even if a corresponding independent variable is scalar, its regression coefficient must be functional if the dependent variable is functional. (If the dependent variable is a scalar, the coefficients of scalar independent variables, including the intercept, must be constants, but the coefficients of functional independent variables must be functional.) Each of these functional parameter objects defines a single functional data object, that is, with only one replication.

For the formula and character methods, betalist can be either a list, as for the other methods, or NULL, in which case a list is created. If betalist is created, it will use the bases from the corresponding component of xfdlist if it is functional or from the response variable. Smoothing information (arguments Lfdobj, lambda, estimate, and penmat of function fdPar) will come from the corresponding component of xfdlist if it is of class fdPar (or for scalar independent variables from the response variable if it is of class fdPar) or from optional ... arguments if the reference variable is not of class fdPar.

wt

weights for weighted least squares

y2cMap

the matrix mapping from the vector of observed values to the coefficients for the dependent variable. This is output by function smooth.basis. If this is supplied, confidence limits are computed, otherwise not.

SigmaE

Estimate of the covariances among the residuals. This can only be estimated after a preliminary analysis with fRegress.

method

a character string matching either 'fRegress', to carry out the functional regression estimation, or 'model', to return the argument list for the estimation without running it.

sep

separator for creating names for multiple variables for fRegress.fdPar or fRegress.numeric created from single variables on the right hand side of the formula y. This happens with multidimensional fd objects as well as with categorical variables.

returnMatrix

logical: If TRUE, a two-dimensional matrix is returned using a special class from the Matrix package.

...

optional arguments
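
The rule that a categorical independent variable with k levels becomes k-1 contrasts can be previewed with base R's model.matrix, which the Examples section also uses. The factor below is hypothetical, and the sketch assumes the default treatment contrasts; it is not a statement about how fRegress.formula builds its contrasts internally.

```r
# hypothetical categorical covariate with k = 4 levels
region <- factor(c("Arctic", "Atlantic", "Continental",
                   "Pacific", "Atlantic", "Pacific"))

# intercept column plus k - 1 = 3 indicator (contrast) columns;
# the first level, "Arctic", serves as the baseline
mm <- model.matrix(~ region)
colnames(mm)
```

Each non-intercept column would correspond to one univariate entry of xfdlist, with names formed using the sep argument.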

Details

Alternative forms of functional regression, together with traditional least squares, can be categorized using the following 2 x 2 table:

              |        explanatory variable
response      | scalar          | function
--------------|-----------------|--------------------------
scalar        | lm              | fRegress.numeric
function      | fRegress.fd or  | fRegress.fd or
              | fRegress.fdPar  | fRegress.fdPar or linmod

For fRegress.numeric, the numeric response is assumed to be the sum of integrals of xfd * beta for all functional xfd terms.

fRegress.fd or .fdPar produces a concurrent regression with each beta being also a (univariate) function.

linmod predicts a functional response from a convolution integral, estimating a bivariate regression function.
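
For comparison, the three model forms can be written side by side. The first two lines repeat the equations in the Description; the third is the usual way of writing a model with a bivariate regression function and is given here as an assumption, so consult the linmod documentation for the exact form it fits.

```latex
\begin{align*}
\text{scalar response:} \quad
  & y_i = \beta_1 z_i + \int x_i(t)\,\beta_2(t)\,dt + e_i \\
\text{concurrent:} \quad
  & y_i(t) = \beta_1(t)\,z_i + \beta_2(t)\,x_i(t) + e_i(t) \\
\text{bivariate (linmod):} \quad
  & y_i(t) = \beta_0(t) + \int x_i(s)\,\beta_1(s,t)\,ds + e_i(t)
\end{align*}
```

In the concurrent model only x_i(t) at the same t enters the fit for y_i(t), whereas in the bivariate form the whole curve x_i(s) can influence y_i(t).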

In the computation of regression function estimates in fRegress, all independent variables are treated as if they are functional. If argument xfdlist contains one or more vectors, these are converted to functional data objects having the constant basis with coefficients equal to the elements of the vector.
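
The conversion of a scalar covariate to a functional data object on a constant basis can be mimicked by hand. This is a minimal sketch assuming the fda package is installed; the vector z and the range c(0, 365) are hypothetical, and fRegress performs its own conversion internally.

```r
library(fda)

z <- c(0.5, -1.2, 3.0)                         # hypothetical scalar covariate, N = 3
constbasis <- create.constant.basis(c(0, 365))

# one basis function, N replications: coefficients are the elements of z
zfd <- fd(matrix(z, nrow = 1), constbasis)

# evaluating at any argument values returns z for every replication
vals <- eval.fd(c(0, 100, 365), zfd)           # 3 argument values x 3 replications
```

Every row of vals equals z, so each scalar observation is treated as a function that is constant in t.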

Needless to say, if all the variables in the model are scalar, do NOT use this function. Instead, use either lm or lsfit.

These functions provide a partial implementation of Ramsay and Silverman (2005, chapters 12-20).

Value

These functions return either a standard fRegress fit object or a model specification:

The fRegress fit object case:

A list of class fRegress with the following components:

y:

The first argument in the call to fRegress. This argument is coerced to class fd in fda version 5.1.9. Prior versions of the package converted it to an fdPar, but the extra structures in that class were not used in any of the fRegress codes.

xfdlist:

The second argument in the call to fRegress.

betalist:

The third argument in the call to fRegress.

betaestlist:

A list of length equal to the number of independent variables and with members having the same functional parameter structure as the corresponding members of betalist. These are the estimated regression coefficient functions.

yhatfdobj:

A functional parameter object (class fdPar) if the dependent variable is functional or a vector if the dependent variable is scalar. This is the set of values predicted by the functional regression model for the dependent variable.

Cmatinv:

A matrix containing the inverse of the coefficient matrix for the linear equations that define the solution to the regression problem. This matrix is required for function fRegress.stderr that estimates confidence regions for the regression coefficient function estimates.

wt:

The vector of weights input or inferred.
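
The role of Cmatinv can be illustrated with a generic penalized least squares problem in base R. This is a sketch under assumptions: the design matrix Z, the penalty matrix R, and the smoothing parameter lambda are hypothetical stand-ins, and the names Cmat and Dmat only echo components that appear in the example output. It is not the code fRegress uses to assemble these matrices.

```r
set.seed(3)
n <- 40                                   # observations
K <- 6                                    # number of regression coefficients
Z <- matrix(rnorm(n * K), n, K)           # hypothetical design matrix
y <- Z %*% rnorm(K) + rnorm(n, sd = 0.1)

# second-difference penalty as a stand-in for a roughness penalty matrix
R      <- crossprod(diff(diag(K), differences = 2))
lambda <- 0.5

Cmat    <- crossprod(Z) + lambda * R      # coefficient matrix of the linear equations
Dmat    <- crossprod(Z, y)                # right hand side
Cmatinv <- solve(Cmat)
bhat    <- Cmatinv %*% Dmat               # penalized least squares solution

# hat matrix and equivalent degrees of freedom of the fit
H  <- Z %*% Cmatinv %*% t(Z)
df <- sum(diag(H))
```

Because the penalty shrinks the fit, df is smaller than K whenever lambda is positive, which is one way to read the "equivalent degrees of freedom" reported for a scalar response.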

If class(y) is numeric, the fRegress object also includes:

df:

The equivalent degrees of freedom for the fit.

OCV:

The leave-one-out cross validation score for the model.

GCV:

The generalized cross validation score.
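
The three quantities above can be illustrated for an ordinary least squares fit in base R. This is a sketch using textbook definitions on simulated data; the exact normalizations that fRegress applies to OCV and GCV are not stated in this page, so the formulas below are assumptions.

```r
set.seed(4)
n   <- 30
d   <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)

fit <- lm(y ~ x, data = d)
h   <- hatvalues(fit)                     # diagonal of the hat matrix
e   <- residuals(fit)

# leave-one-out residuals without refitting: e_i / (1 - h_ii)
loo_short <- e / (1 - h)

# the same residuals by explicitly refitting without observation i
loo_refit <- sapply(seq_len(n), function(i) {
  fit_i <- lm(y ~ x, data = d[-i, ])
  d$y[i] - predict(fit_i, newdata = d[i, , drop = FALSE])
})

df_fit <- sum(h)                          # degrees of freedom = trace of hat matrix
ocv    <- sum(loo_short^2)                # leave-one-out cross validation score
gcv    <- (sum(e^2) / n) / (1 - df_fit / n)^2   # one common GCV definition
```

The shortcut and the explicit refits agree, which is why a leave-one-out score can be computed from a single fit of a linear smoother.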

If class(y) is fd or fdPar, the fRegress object returned also includes 5 other components:

y2cMap:

An input y2cMap.

SigmaE:

An input SigmaE.

betastderrlist:

An fd object estimating the standard errors of betaestlist.

bvar:

A covariance matrix for regression coefficient estimates.

c2bMap:

A mapping matrix that maps variation in Cmat to variation in regression coefficients.

The model specification object case:

The fRegress.formula and fRegress.character functions translate the formula into the argument list required by fRegress.fdPar or fRegress.numeric. With the default value 'fRegress' for the argument method, this list is then used to call the appropriate other fRegress function. Alternatively, to see how the formula is translated, use the alternative 'model' value for the argument method. In that case, the function returns a list with the arguments otherwise passed to these other functions plus the following additional components:

xfdlist0:

A list of the objects named on the right hand side of formula. This will differ from xfdlist for any categorical or multivariate right hand side object.

type:

the type component of any fd object on the right hand side of formula.

nbasis:

A vector containing the nbasis components of variables named in formula having such components.

xVars:

A named integer vector with one element for each variable named on the right hand side of formula, giving the corresponding number of variables in xfdlist. This can exceed 1 for any multivariate object on the right hand side of class either numeric or fd as well as any categorical variable.

Author(s)

J. O. Ramsay, Giles Hooker, and Spencer Graves

References

Ramsay, James O., Hooker, Giles, and Graves, Spencer (2009), Functional data analysis with R and Matlab, Springer, New York.

Ramsay, James O., and Silverman, Bernard W. (2005), Functional Data Analysis, 2nd ed., Springer, New York.

Ramsay, James O., and Silverman, Bernard W. (2002), Applied Functional Data Analysis, Springer, New York.

See Also

fRegress.stderr, fRegress.CV, Fperm.fd, Fstat.fd, linmod

Examples


oldpar <- par(no.readonly=TRUE)
###
###
###   vector response with functional explanatory variable  
###
###

#  data are in Canadian Weather object
#  print the names of the data
print(names(CanadianWeather))
#  set up log10 of annual precipitation for 35 weather stations
annualprec <- 
    log10(apply(CanadianWeather$dailyAv[,,"Precipitation.mm"], 2,sum))
# The simplest 'fRegress' call is singular with more bases
# than observations, so we use only 25 basis functions, for this example
smallbasis  <- create.fourier.basis(c(0, 365), 25)
# The covariate is the temperature curve for each station.
tempfd <- 
 smooth.basis(day.5, CanadianWeather$dailyAv[,,"Temperature.C"], smallbasis)$fd
##
## formula interface:  specify the model by a formula, the method
## fRegress.formula automatically sets up the regression coefficient functions,
## a constant function for the intercept, 
## and a higher dimensional function
## for the inner product with temperature
##

precip.Temp1 <- fRegress(annualprec ~ tempfd, method="fRegress")

#  the output is a list with class name fRegress, display names
names(precip.Temp1)
# [1] "yvec"           "xfdlist"        "betalist"       "betaestlist"    "yhatfdobj"
# [6] "Cmat"           "Dmat"           "Cmatinv"        "wt"             "df"            
#[11] "GCV"            "OCV"            "y2cMap"         "SigmaE"         "betastderrlist"
#[16] "bvar"           "c2bMap"       

#  the vector of fits to the data is object  precip.Temp1$yhatfdobj,
#  but since the dependent variable is a vector, so is the fit
annualprec.fit1 <- precip.Temp1$yhatfdobj
#  plot the data and the fit
plot(annualprec.fit1, annualprec, type="p", pch="o")
lines(annualprec.fit1, annualprec.fit1, lty=2)
#  print root mean squared error
RMSE <- round(sqrt(mean((annualprec-annualprec.fit1)^2)),3)
print(paste("RMSE =",RMSE))
#  plot the estimated regression function
plot(precip.Temp1$betaestlist[[2]])
#  This isn't helpful: the coefficient function is too
#  complicated to interpret.
#  display the number of basis functions used:
print(precip.Temp1$betaestlist[[2]]$fd$basis$nbasis)
#  25 basis functions to fit 35 values, no wonder we over-fit the data

##
## Get the default setup and modify it
## the "model" value of the method argument causes the analysis
## to produce a list of arguments for calling the
## fRegress function
##

precip.Temp.mdl1 <- fRegress(annualprec ~ tempfd, method="model")
# First confirm we get the same answer as above by calling
# function fRegress() with these arguments:
precip.Temp.m <- do.call('fRegress', precip.Temp.mdl1)

all.equal(precip.Temp.m, precip.Temp1)


#  set up a smaller basis for beta2 than for temperature so that we
#  get a more parsimonious fit to the data

nbetabasis2 <- 21  #  not much less, but we add some roughness penalization
betabasis2  <- create.fourier.basis(c(0, 365), nbetabasis2)
betafd2     <- fd(rep(0, nbetabasis2), betabasis2)
# add smoothing
betafdPar2  <- fdPar(betafd2, lambda=10)

# replace the regression coefficient function with this fdPar object

precip.Temp.mdl2 <- precip.Temp.mdl1
precip.Temp.mdl2[['betalist']][['tempfd']] <- betafdPar2

# Now re-fit the data

precip.Temp2 <- do.call('fRegress', precip.Temp.mdl2)

# Compare the two fits:
#  degrees of freedom
precip.Temp1[['df']] # 26
precip.Temp2[['df']] # 22
#  root-mean-squared errors:
RMSE1 <- round(sqrt(mean(with(precip.Temp1, (yhatfdobj-yvec)^2))),3)
RMSE2 <- round(sqrt(mean(with(precip.Temp2, (yhatfdobj-yvec)^2))),3)
print(c(RMSE1, RMSE2))
#  display further results for the more parsimonious model
annualprec.fit2 <- precip.Temp2$yhatfdobj
plot(annualprec.fit2, annualprec, type="p", pch="o")
lines(annualprec.fit2, annualprec.fit2, lty=2)
#  plot the estimated regression function
plot(precip.Temp2$betaestlist[[2]])
#  now we see that it is primarily the temperatures in the
#  early winter that provide the fit to log precipitation by temperature

##
## Manual construction of xfdlist and betalist
##

xfdlist <- list(const=rep(1, 35), tempfd=tempfd)

# The intercept must be constant for a scalar response
betabasis1 <- create.constant.basis(c(0, 365))
betafd1    <- fd(0, betabasis1)
betafdPar1 <- fdPar(betafd1)

betafd2     <- fd(matrix(0,7,1), create.bspline.basis(c(0, 365),7))
# convert to an fdPar object
betafdPar2  <- fdPar(betafd2)

betalist <- list(const=betafdPar1, tempfd=betafdPar2)

precip.Temp3   <- fRegress(annualprec, xfdlist, betalist)
annualprec.fit3 <- precip.Temp3$yhatfdobj
#  plot the data and the fit
plot(annualprec.fit3, annualprec, type="p", pch="o")
lines(annualprec.fit3, annualprec.fit3)
plot(precip.Temp3$betaestlist[[2]])

###
###
###  functional response with vector explanatory variables  
###
###

##
## simplest:  formula interface
##

daybasis65 <- create.fourier.basis(rangeval=c(0, 365), nbasis=65,
                  axes=list('axesIntervals'))
Temp.fd <- with(CanadianWeather, smooth.basisPar(day.5,
                dailyAv[,,'Temperature.C'], daybasis65)$fd)
TempRgn.f <- fRegress(Temp.fd ~ region, CanadianWeather)

##
## Get the default setup and possibly modify it
##

TempRgn.mdl <- fRegress(Temp.fd ~ region, CanadianWeather, method='model')

# make desired modifications here
# then run

TempRgn.m <- do.call('fRegress', TempRgn.mdl)

# no change, so match the first run

all.equal(TempRgn.m, TempRgn.f)


##
## More detailed set up
##

region.contrasts <- model.matrix(~factor(CanadianWeather$region))
rgnContr3 <- region.contrasts
dim(rgnContr3) <- c(1, 35, 4)
dimnames(rgnContr3) <- list('', CanadianWeather$place, c('const',
   paste('region', c('Atlantic', 'Continental', 'Pacific'), sep='.')) )

const365 <- create.constant.basis(c(0, 365))
region.fd.Atlantic <- fd(matrix(rgnContr3[,,2], 1), const365)
# str(region.fd.Atlantic)
region.fd.Continental <- fd(matrix(rgnContr3[,,3], 1), const365)
region.fd.Pacific <- fd(matrix(rgnContr3[,,4], 1), const365)
region.fdlist <- list(const=rep(1, 35),
     region.Atlantic=region.fd.Atlantic,
     region.Continental=region.fd.Continental,
     region.Pacific=region.fd.Pacific)
# str(TempRgn.mdl$betalist)

###
###
###  functional response with functional explanatory variable  
###
###

##
##  predict knee angle from hip angle;  
##     from demo('gait', package='fda')

##
## formula interface
##
gaittime   <- as.matrix((1:20)/21)
gaitrange  <- c(0,20)
gaitbasis  <- create.fourier.basis(gaitrange, nbasis=21)
gaitnbasis <- gaitbasis$nbasis
gaitcoef   <- matrix(0,gaitnbasis,dim(gait)[2])
harmaccelLfd <- vec2Lfd(c(0, (2*pi/20)^2, 0), rangeval=gaitrange)
gaitfd     <- smooth.basisPar(gaittime, gait, gaitbasis, 
                          Lfdobj=harmaccelLfd, lambda=1e-2)$fd
hipfd  <- gaitfd[,1]
kneefd <- gaitfd[,2]

knee.hip.f <- fRegress(kneefd ~ hipfd)

##
## manual set-up
##

#  set up the list of covariate objects
const  <- rep(1, dim(kneefd$coef)[2])
xfdlist  <- list(const=const, hipfd=hipfd)

beta0 <- with(kneefd, fd(gaitcoef, gaitbasis, fdnames))
beta1 <- with(hipfd,  fd(gaitcoef, gaitbasis, fdnames))

betalist  <- list(const=fdPar(beta0), hipfd=fdPar(beta1))

fRegressout <- fRegress(kneefd, xfdlist, betalist)
par(oldpar)

fda documentation built on Sept. 30, 2024, 9:19 a.m.