smooth.construct.pco.smooth.spec | R Documentation
Smooth constructor function for principal coordinate ridge regression fitted by gam in the mgcv package. When the principal coordinates are defined by a relevant distance among functional predictors, this is a form of nonparametric scalar-on-function regression. Reiss et al. (2016) describe the approach and apply it to dynamic time warping distances among functional predictors.
## S3 method for class 'pco.smooth.spec'
smooth.construct(object, data, knots)
object: a smooth specification object, usually generated by a term of the form s(dummy, bs="pco", k, xt); see Details.

data: a list containing just the data.

knots: IGNORED!
An object of class pco.smooth. The resulting object has an xt element which contains details of the multidimensional scaling, which may be interesting.

The constructor is not normally called directly, but is rather used internally by gam.
In a gam term of the above form, s(dummy, bs="pco", k, xt), dummy is an arbitrary vector (or name of a column in data) whose length is the number of observations. This is not actually used, but is required as part of the input to s. Note that if multiple pco terms are used in the model, there must be multiple unique term names (e.g., "dummy1", "dummy2", etc.).

k is the number of principal coordinates (e.g., k=9 will give a 9-dimensional projection of the data).
xt is a list supplying the distance information, in one of two ways. (i) A matrix Dmat of distances can be supplied directly via xt=list(D=Dmat, ...). (ii) Alternatively, one can use xt=list(realdata=..., dist_fn=..., ...) to specify a data matrix realdata and a distance function dist_fn, whereupon the distance matrix dist_fn(realdata) is created.
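As an illustrative sketch (not run), the two ways of supplying distance information correspond to calls like the following, assuming a response vector y, a functional data matrix X with one row per observation, and a precomputed distance matrix Dmat (all hypothetical names):

```r
library(mgcv)
library(refund)  # provides the "pco" basis

## Not run:
dummy <- rep(1, length(y))  # placeholder covariate; its values are never used

# (i) a precomputed distance matrix, supplied directly
fit_D <- gam(y ~ s(dummy, bs = "pco", k = 5, xt = list(D = Dmat)),
             method = "REML")

# (ii) a data matrix plus a distance function; the constructor computes
# dist_fn(realdata) itself
fit_fn <- gam(y ~ s(dummy, bs = "pco", k = 5,
                    xt = list(realdata = X, dist_fn = dist)),
              method = "REML")
## End(Not run)
```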
The list xt also has the following optional elements:

add: passed to cmdscale when performing multidimensional scaling; for details, see the help for that function. (Default FALSE.)

fastcmd: if TRUE, multidimensional scaling is performed by cmdscale_lanczos, which uses Lanczos iteration to eigendecompose the distance matrix; if FALSE, MDS is carried out by cmdscale. (Default FALSE, to use cmdscale.)
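For intuition about the add option, here is a small base-R illustration using stats::cmdscale directly (independent of the pco basis): with add=TRUE, a constant is added to the squared distances so that all eigenvalues of the doubly centred matrix are non-negative, which can matter for non-Euclidean distances such as DTW.

```r
set.seed(1)
D <- dist(matrix(rnorm(20), 10, 2))  # toy Euclidean distances

# add = FALSE: cmdscale returns just the coordinate matrix
mds_plain <- cmdscale(D, k = 2, add = FALSE)

# add = TRUE: cmdscale returns a list; $points holds the coordinates
# and $ac the additive constant applied to the distances
mds_add <- cmdscale(D, k = 2, add = TRUE)
head(mds_add$points)
```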
David L Miller, based on code from Lan Huo and Phil Reiss
Reiss, P. T., Miller, D. L., Wu, P.-S., and Hua, W.-Y. (2016). Penalized nonparametric scalar-on-function regression via principal coordinates.
## Not run:
# a simulated example
library(refund)
library(mgcv)
require(dtw)
## First generate the data
Xnl <- matrix(0, 30, 101)
set.seed(813)
tt <- sort(sample(1:90, 30))
for(i in 1:30){
Xnl[i, tt[i]:(tt[i]+4)] <- -1
Xnl[i, (tt[i]+5):(tt[i]+9)] <- 1
}
X.toy <- Xnl + matrix(rnorm(30*101, sd=0.05), 30)
y.toy <- tt + rnorm(30, 0.05)
y.rainbow <- rainbow(30, end=0.9)[(y.toy-min(y.toy))/
diff(range(y.toy))*29+1]
## Display the toy data
par(mfrow=c(2, 2))
matplot((0:100)/100, t(Xnl[c(4, 25), ]), type="l", xlab="t", ylab="",
ylim=range(X.toy), main="Noiseless functions")
matplot((0:100)/100, t(X.toy[c(4, 25), ]), type="l", xlab="t", ylab="",
ylim=range(X.toy), main="Observed functions")
matplot((0:100)/100, t(X.toy), type="l", lty=1, col=y.rainbow, xlab="t",
ylab="", main="Rainbow plot")
## Obtain DTW distances
D.dtw <- dist(X.toy, method="dtw", window.type="sakoechiba", window.size=5)
## Compare PC vs. PCo ridge regression
# matrix to store results
GCVmat <- matrix(NA, 15, 2)
# dummy response variable
dummy <- rep(1,30)
# loop over possible projection dimensions
for (k. in 1:15){
# fit PC (m1) and PCo (m2) ridge regression
m1 <- gam(y.toy ~ s(dummy, bs="pco", k=k.,
xt=list(realdata=X.toy, dist_fn=dist)), method="REML")
m2 <- gam(y.toy ~ s(dummy, bs="pco", k=k., xt=list(D=D.dtw)), method="REML")
# calculate and store GCV scores
GCVmat[k., ] <- length(y.toy) * c(sum(m1$residuals^2)/m1$df.residual^2,
sum(m2$residuals^2)/m2$df.residual^2)
}
## plot the GCV scores per dimension for each model
matplot(GCVmat, lty=1:2, col=1, pch=16:17, type="o", ylab="GCV",
xlab="Number of principal components / coordinates",
main="GCV score")
legend("right", c("PC ridge regression", "DTW-based PCoRR"), lty=1:2, pch=16:17)
## example of making a prediction
# fit a model to the toy data
m <- gam(y.toy ~ s(dummy, bs="pco", k=2, xt=list(D=D.dtw)), method="REML")
# first build the distance matrix
# in this case we just subsample the original matrix
# see ?pco_predict_preprocess for more information on formatting this data
dist_list <- list(dummy = as.matrix(D.dtw)[, c(1:5,10:15)])
# preprocess the prediction data
pred_data <- pco_predict_preprocess(m, newdata=NULL, dist_list)
# make the prediction
p <- predict(m, pred_data)
# check that these are the same as the corresponding fitted values
print(cbind(fitted(m)[c(1:5, 10:15)], p))
## End(Not run)