smooth.basisPar: Smooth Data Using a Directly Specified Roughness Penalty



Description

Smooth (argvals, y) data with a roughness penalty defined by the remaining arguments. This function is a wrapper for users who want to skip setting up a functional parameter object before calling smooth.basis; it performs that setup itself. See the help files for smooth.basis and fdPar for further details and more complete descriptions of the arguments.

Usage

smooth.basisPar(argvals, y, fdobj=NULL, Lfdobj=NULL,
                lambda=0, estimate=TRUE, penmat=NULL,
                wtvec=NULL, fdnames=NULL, covariates=NULL,
                method="chol", dfscale=1)

Arguments

argvals

a set of argument values corresponding to the observations in array y. In most applications these values will be common to all curves and all variables, and will therefore be supplied as a vector or as a matrix with a single column. However, argument values may vary from one curve to another, in which case argvals is supplied as a matrix with rows corresponding to observation points and columns corresponding to curves. Argument values can even vary from one variable to another, in which case they are supplied as an array with dimensions corresponding to observation points, curves and variables, respectively. Note, however, that the number of observation points per curve and per variable may NOT vary. If it does, then curves and variables must be smoothed individually rather than by a single call to this function. The default for argvals is the integers 1 to n, where n is the size of the first dimension of argument y.

y

a set of values of curves at discrete sampling points or argument values. If the set is supplied as a matrix object, the rows must correspond to argument values and the columns to replications, and it will be assumed that there is only one variable per observation. If y is a three-dimensional array, the first dimension corresponds to argument values, the second to replications, and the third to variables within replications. If y is a vector, only one replicate and variable are assumed. If the data come from a single replication but multiple variables, such as data on coordinates for a single space curve, then be sure to coerce the data into an array object by using the as.array function with one as the central dimension length.
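The coercion for a single space curve can be sketched in base R as follows. This is only an illustration: the matrix xy and its contents are hypothetical, and array() is used here (rather than the as.array function named above) because it lets the three dimensions be set explicitly.

```r
# Hypothetical data: one space curve observed at n points, with its two
# coordinates stored in the columns of an n x 2 matrix.
n  <- 5
xy <- cbind(x = cos(seq_len(n)), y = sin(seq_len(n)))

# Reshape to (argument values) x (replications) x (variables) = n x 1 x 2,
# so that the central dimension has length one.
y3 <- array(xy, dim = c(n, 1, 2),
            dimnames = list(NULL, "curve1", colnames(xy)))

dim(y3)
```

Because R fills arrays in column-major order, the first coordinate ends up in y3[, 1, 1] and the second in y3[, 1, 2].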

fdobj

One of the following:

  • fd a functional data object (class fd)

  • basisfd a functional basis object (class basisfd), which is converted to a functional data object with the identity matrix as the coefficient matrix.

  • fdPar a functional parameter object (class fdPar)

  • integer a positive integer giving the order of a B-spline basis, which is further converted to a functional data object with the identity matrix as the coefficient matrix.

  • matrix or array replaced by fd(fdobj)

  • NULL Defaults to fdobj = create.bspline.basis(argvals).

Lfdobj

either a nonnegative integer or a linear differential operator object.

If NULL, Lfdobj depends on fdobj[['basis']][['type']]:

  • bspline Lfdobj <- int2Lfd(max(0, norder-2)), where norder = norder(fdobj).

  • fourier Lfdobj = a harmonic acceleration operator:

    Lfdobj <- vec2Lfd(c(0,(2*pi/diff(rng))^2,0), rng)

    where rng = fdobj[['basis']][['rangeval']].

  • anything else Lfdobj <- int2Lfd(0)
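To see why the Fourier default is called a harmonic acceleration operator, the following base R sketch applies the operator with coefficients c(0, (2*pi/diff(rng))^2, 0) to a sinusoid of period diff(rng), using analytic derivatives. It assumes the usual reading of those coefficients as the weights on x, Dx and D^2 x in L x = b0*x + b1*Dx + b2*D^2 x + D^3 x; the range c(0, 365) and all variable names are illustrative only.

```r
# Illustrative range (one year of daily data) and the matching frequency.
rng   <- c(0, 365)
omega <- 2 * pi / diff(rng)

# Coefficients of x, Dx and D^2 x, as passed to vec2Lfd in the text above.
bwtvec <- c(0, omega^2, 0)

# A sinusoid with period diff(rng) and its first three derivatives.
tt  <- seq(rng[1], rng[2], length.out = 11)
x   <- sin(omega * tt)
Dx  <- omega * cos(omega * tt)
D2x <- -omega^2 * sin(omega * tt)
D3x <- -omega^3 * cos(omega * tt)

# L x = omega^2 * Dx + D^3 x, which vanishes for this sinusoid.
Lx <- bwtvec[1] * x + bwtvec[2] * Dx + bwtvec[3] * D2x + D3x
max(abs(Lx))
```

The result is zero up to rounding error, so a penalty on L x leaves a constant plus a sinusoid of period diff(rng) unpenalized and only shrinks departures from that shape.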

lambda

a nonnegative real number specifying the amount of smoothing to be applied to the estimated functional parameter.
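The effect of lambda can be sketched without the fda package. The example below is a stand-in, not the package's implementation: it uses a cubic B-spline basis from the splines package and, as a labeled assumption, a second-difference penalty on the coefficients in place of the integrated squared derivative penalty. It reports the trace of the smoothing ("hat") matrix, a common degrees of freedom measure, for several values of lambda.

```r
library(splines)  # ships with R; used here instead of the fda basis functions

x   <- seq(-1, 1, 0.02)
Phi <- unclass(bs(x, df = 20, intercept = TRUE))  # 101 x 20 basis matrix
K   <- ncol(Phi)

# Assumption: a second-difference penalty matrix R = D'D stands in for the
# roughness penalty matrix that fda would compute by integration.
D <- diff(diag(K), differences = 2)
R <- crossprod(D)

# Degrees of freedom = trace of Phi (Phi'Phi + lambda R)^{-1} Phi'.
df_at <- function(lambda) {
  H <- Phi %*% solve(crossprod(Phi) + lambda * R, t(Phi))
  sum(diag(H))
}

df_values <- sapply(c(0, 1e-4, 1, 100), df_at)
df_values
```

With lambda = 0 the degrees of freedom equal the number of basis functions; they decrease steadily as lambda grows and the fit is pulled toward the functions the penalty leaves untouched.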

estimate

a logical value: if TRUE, the functional parameter is estimated, otherwise, it is held fixed.

penmat

a roughness penalty matrix. Including this can eliminate the need to compute this matrix over and over again in some types of calculations.

wtvec

typically a vector of length n, the length of argvals, containing weights for the values to be smoothed. However, it may also be a symmetric matrix of order n. If wtvec is a vector, all values must be positive; if it is a symmetric matrix, it must be positive definite. Defaults to all weights equal to 1.
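The two accepted forms of wtvec can be checked before smoothing with a small helper such as the one below. The function is_valid_wt is hypothetical and not part of fda; it only restates the conditions above in base R.

```r
# Hypothetical helper: TRUE if wt is a positive vector of length n or a
# symmetric positive definite n x n matrix, FALSE otherwise.
is_valid_wt <- function(wt, n) {
  if (is.matrix(wt) && nrow(wt) == n && ncol(wt) == n && n > 1) {
    if (!isSymmetric(unname(wt))) return(FALSE)
    ev <- eigen(wt, symmetric = TRUE, only.values = TRUE)$values
    return(all(ev > 0))
  }
  length(wt) == n && all(wt > 0)
}

is_valid_wt(rep(1, 5), 5)                 # equal weights
is_valid_wt(c(1, 0, 1), 3)                # a zero weight is not allowed
is_valid_wt(diag(3), 3)                   # identity matrix
is_valid_wt(matrix(c(1, 2, 2, 1), 2), 2)  # symmetric but indefinite
```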

fdnames

a list of length 3 containing character vectors of names for the following:

  • args name for each observation or point in time at which data are collected for each 'rep', unit or subject.

  • reps name for each 'rep', unit or subject.

  • fun name for each 'fun' or (response) variable measured repeatedly (per 'args') for each 'rep'.

covariates

the observed values in y are assumed to be primarily determined by the height of the curve being estimated, but from time to time certain values can also be influenced by other known variables. For example, multi-year sets of climate variables may also be determined by the presence or absence of an El Nino event, or a volcanic eruption. One or more of these covariates can be supplied as an n by p matrix, where p is the number of such covariates. When such covariates are available, the smoothing is called "semi-parametric." Matrices or arrays of regression coefficients are then estimated that define the impact of each of these covariates for each curve and each variable.
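The idea of a semi-parametric fit can be sketched in base R by appending the covariate matrix to the basis matrix and solving one least squares problem. This is a simplified illustration under stated assumptions, not the fda implementation: the basis comes from the splines package, no roughness penalty is applied, and the covariate Z (an indicator for a hypothetical event affecting observations 40 to 45) and the noise-free data are invented for the example.

```r
library(splines)  # ships with R; used here instead of the fda basis functions

n   <- 101
x   <- seq(-1, 1, length.out = n)
Phi <- unclass(bs(x, df = 8, intercept = TRUE))  # n x 8 basis matrix

# Hypothetical n x p covariate matrix with p = 1: an event indicator.
Z <- matrix(as.numeric(seq_len(n) %in% 40:45), ncol = 1)

# Noise-free data: a curve in the span of the basis plus a covariate
# effect of size 2 at the affected observations.
coef_true <- sin(seq_len(ncol(Phi)))
beta_true <- 2
y <- drop(Phi %*% coef_true + Z * beta_true)

# Estimate curve coefficients and the covariate effect jointly.
fit      <- lm.fit(cbind(Phi, Z), y)
beta_hat <- unname(fit$coefficients[ncol(Phi) + 1])
beta_hat
```

Because the data contain no noise and the indicator is not in the span of the basis, the covariate effect is recovered up to rounding error.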

method

by default the function uses the usual textbook equations for computing the coefficients of the basis function expansions. But, as in regression analysis, a price is paid in terms of rounding error for such computations since they involve cross-products of basis function values. Optionally, if method is set equal to the string "qr", the computation uses an algorithm based on the QR decomposition, which is more accurate but will require substantially more computing time when n is large, meaning more than 500 or so. The default is "chol", referring to the Choleski decomposition of a symmetric positive definite matrix.
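The difference between the two routes can be sketched in base R. The code below is an illustration under stated assumptions rather than the algorithm fda uses: the basis comes from the splines package, the penalty is a second-difference matrix D'D, and the QR route is written as an augmented least squares problem, which is one standard way to avoid forming cross-products.

```r
library(splines)  # ships with R; used here instead of the fda basis functions

x   <- seq(-1, 1, 0.02)
y   <- x + 3 * exp(-6 * x^2) + sin(1:101) / 2
Phi <- unclass(bs(x, df = 20, intercept = TRUE))
D   <- diff(diag(ncol(Phi)), differences = 2)  # assumed stand-in penalty
lambda <- 1

# Textbook route: form the cross-products and solve the normal equations
# (Phi'Phi + lambda D'D) c = Phi'y with a Choleski factor U, where A = U'U.
A <- crossprod(Phi) + lambda * crossprod(D)
b <- crossprod(Phi, y)
U <- chol(A)
coef_chol <- backsolve(U, forwardsolve(t(U), b))

# QR route: least squares on the augmented system
# rbind(Phi, sqrt(lambda) D) c ~ c(y, 0), with no cross-products formed.
X_aug   <- rbind(Phi, sqrt(lambda) * D)
y_aug   <- c(y, rep(0, nrow(D)))
coef_qr <- qr.coef(qr(X_aug), y_aug)

all.equal(as.numeric(coef_chol), as.numeric(coef_qr))
```

On a well-conditioned problem like this one the two sets of coefficients agree closely; the QR route matters when the cross-product matrix is nearly singular.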

dfscale

the generalized cross-validation or "gcv" criterion that is often used to determine the size of the smoothing parameter involves the subtraction of a measure of degrees of freedom from n. Chong Gu has argued that multiplying this degrees of freedom measure by a constant slightly greater than 1, such as 1.2, can produce better decisions about the level of smoothing to be used. The default value is, however, 1.0.
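How dfscale enters the criterion can be sketched with a small function. The form below, (SSE/n) / ((n - dfscale*df)/n)^2, is the usual GCV expression with the degrees of freedom inflated by dfscale; treat it as an assumption about the formula rather than a copy of the package code. The function name gcv_value and the numbers are illustrative.

```r
# Hypothetical helper: GCV with the degrees of freedom multiplied by dfscale.
gcv_value <- function(SSE, n, df, dfscale = 1) {
  stopifnot(dfscale * df < n)  # the criterion is undefined otherwise
  (SSE / n) / ((n - dfscale * df) / n)^2
}

gcv_value(SSE = 10, n = 100, df = 20)                 # dfscale = 1
gcv_value(SSE = 10, n = 100, df = 20, dfscale = 1.2)  # heavier df charge
```

A dfscale above 1 raises the criterion more for fits that use many degrees of freedom, which pushes the chosen lambda toward smoother results.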

Details

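1. If fdobj is NULL, fdobj <- create.bspline.basis(argvals). Otherwise, if fdobj is an integer, fdobj <- create.bspline.basis(argvals, norder = fdobj).

2. Call fdPar to build the functional parameter object from fdobj, Lfdobj, lambda, estimate and penmat.

3. Call smooth.basis with that functional parameter object.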

Value

The output of a call to smooth.basis, which is an object of class fdSmooth, being a list of length 8 with the following components:

fd

a functional data object that smooths the data.

df

a degrees of freedom measure of the smooth

gcv

the value of the generalized cross-validation or GCV criterion. If there are multiple curves, this is a vector of values, one per curve. If the smooth is multivariate, the result is a matrix of gcv values, with columns corresponding to variables.

SSE

the error sums of squares. SSE is a vector or a matrix of the same size as 'gcv'.

penmat

the penalty matrix.

y2cMap

the matrix mapping the data to the coefficients.

argvals, y

input arguments

References

Ramsay, James O., and Silverman, Bernard W. (2006), Functional Data Analysis, 2nd ed., Springer, New York.

Ramsay, James O., and Silverman, Bernard W. (2002), Applied Functional Data Analysis, Springer, New York.

See Also

Data2fd, df2lambda, fdPar, lambda2df, lambda2gcv, plot.fd, project.basis, smooth.basis, smooth.fd, smooth.monotone, smooth.pos

Examples

library(fda)

##
## simplest call
##
girlGrowthSm <- with(growth, smooth.basisPar(argvals=age, y=hgtf, lambda=0.1))
plot(girlGrowthSm$fd, xlab="age", ylab="height (cm)",
         main="Girls in Berkeley Growth Study" )
plot(deriv(girlGrowthSm$fd), xlab="age", ylab="growth rate (cm / year)",
         main="Girls in Berkeley Growth Study" )
plot(deriv(girlGrowthSm$fd, 2), xlab="age",
        ylab="growth acceleration (cm / year^2)",
        main="Girls in Berkeley Growth Study" )
#  Undersmoothed with lambda = 0

##
## Another simple call
##
lipSm <- smooth.basisPar(liptime, lip, lambda=1e-9)$fd
plot(lipSm)

##
## A third example
##

x <- seq(-1,1,0.02)
y <- x + 3*exp(-6*x^2) + sin(1:101)/2
# sin not rnorm to make it easier to compare
# results across platforms

#  set up a saturated B-spline basis
basisobj101 <- create.bspline.basis(x)
fdParobj101 <- fdPar(basisobj101, 2, lambda=1)
result101   <- smooth.basis(x, y, fdParobj101)

resultP <- smooth.basisPar(argvals=x, y=y, fdobj=basisobj101, lambda=1)

all.equal(result101, resultP)

# TRUE

result4 <- smooth.basisPar(argvals=x, y=y, fdobj=4, lambda=1)

all.equal(resultP, result4)

# TRUE

result4. <- smooth.basisPar(argvals=x, y=y, lambda=1)

all.equal(resultP, result4.)

# TRUE

with(result4, c(df, gcv)) #  display df and gcv measures

result4.4 <- smooth.basisPar(argvals=x, y=y, lambda=1e-4)
with(result4.4, c(df, gcv)) #  display df and gcv measures
# less smoothing, more degrees of freedom, better fit

plot(result4.4)
lines(result4, col='green')
lines(result4$fd, col='green') # same as lines(result4, ...)

##
## fdnames?
##
girlGrow12 <- with(growth, smooth.basisPar(argvals=age, y=hgtf[, 1:2], 
              fdnames=c('age', 'girl', 'height'), lambda=0.1) )
girlGrow12. <- with(growth, smooth.basisPar(argvals=age, y=hgtf[, 1:2],
    fdnames=list(age=age, girl=c('Carol', 'Sally'), value='height'),
    lambda = 0.1) )

##
## Fourier basis with harmonic acceleration operator
##
daybasis65 <- create.fourier.basis(rangeval=c(0, 365), nbasis=65)
daytemp.fdSmooth <- with(CanadianWeather, smooth.basisPar(day.5,
       dailyAv[,,"Temperature.C"],
       daybasis65, fdnames=list("Day", "Station", "Deg C")) )

fda documentation built on May 2, 2019, 5:12 p.m.