fdaLm: Linear mixed-effects model for functional data
In fdaMixed: Functional Data Analysis in a Mixed Model Framework

View source: R/fdaLm.R

fdaLm

R Documentation

Linear mixed-effects model for functional data

Description

Fits variance and smoothing parameters, and possibly also Box-Cox transformation, by maximum restricted likelihood. Estimate fixed parameters, predict random effects, and predict serial correlated effect at point of maximum restricted likelihood. Linear models for fixed and random effects may be global or marginal over sample times.

Usage

fdaLm(formula, data, design, boxcox = NULL, G = 1, lambda = 1, nlSearch
= TRUE, K.order = 1, D.order = NULL, Fleft = "tied", Fright = "tied",
left = NULL, right = NULL)

Arguments

`formula`	A multiple formula of the type `Y\|id ~ fixed\|random`. Here `Y` is the response variable, `id` is a factor separating the samples, `fixed` is a linear model for the fixed effect, and `random` is a linear model for the random effect.
`data`	An optional data frame containing the variables. See details below.
`design`	An optional data frame containing the design variables in the specification of the fixed and the random effects. See details below.
`boxcox`	The power parameter in the scale invariant Box-Cox transformation. If `NULL` (default), then no transformation is performed. If a numeric value is provided, then a scale invariant Box-Cox transformation of the response variable is performed. The numeric value is either used as it is (`nlSearch=FALSE`) or as the starting point for a non-linear optimization (`nlSearch=TRUE`.)
`G`	Variance of the random effects. Present implementation only allows for independent random effects, i.e. `G` is scalar. Used depending on `nlSearch` as described above.
`lambda`	Start value for the `lambda` parameter describing the L-operator. Presently the following forms are implemented: If `K.order` is odd, then `lambda` may have length=1 corresponding to `L=-lambda[1]D^(2K.order)`, or length=2 corresponding to `L=-lambda[1]D^(2K.order)+lambda[2]`. If `K.order` is even, then `lambda` may have length=1 corresponding to `L=-lambda[1]D^(2K.order)`, length=2 corresponding to `L=-lambda[1]D^(2K.order)+lambda[2]D^K.order`, or length=3 corresponding to `L=-lambda[1]D^(2K.order)+lambda[2]D^K.order+lambda[3]`. Used depending on `nlSearch` as described above. All coefficients must be non-negative, and the leading coefficient `lambda[1]` must be strictly positive. Coefficients equal to zero are kept fixed at zero in the non-linear optimization.
`nlSearch`	If `TRUE` (default), then a non-linear optimization of the parameters `boxcox`, `G`, `lambda` is performed (present implementation uses `nlminb`). If `FALSE`, then the initial values of the non-linear parameters are used.
`K.order`	The order of the K-operator.
`D.order`	The requested order of derivatives of the prediction of the serial correlated effect `xBLUP`. If `NULL` (default), then `D.order` is set to the maximal recommended order `K.order`.
`Fleft`	Specification of the `K.order` boundary conditions at the left limit of the sampling interval. Value `"tied"` (default) gives bridge-type conditions. Value `"open"` shifts up the bridge-type conditions one differential order, hence removing the restriction on the level (corresponding to the open end of a Brownian motion). Otherwise arbitrary linear boundary conditions may be specified as a matrix with dimension (`K.order`,2*`K.order`).
`Fright`	Similarly for the `K.order` boundary conditions at the right limit of the sampling interval.
`left`	Left limit of the sampling interval. If `NULL` (default), then `left` is set to 0.
`right`	Right limit of the sampling interval. If `NULL` (default), then `right` is set to the number of sampling points. Thus, the default values of `left` and `right` give sampling distance equal to 1.

Details

The response variable Y is taken from the data frame data (subsidiary the parent environment). If there is more than one sample, then the responses must be stacked sample-wise on top of each other. The sample identifier id is sought for in both data frames data and design (subsidiary the parent environment). The primarily function of the identifier is to decide the number of samples. But if id is present in both data frames, and if there is more that one sample, then this variable is also used to match the reponse vector to the design variables (i.e. these need not appear in the same order).

The design variables fixed and random for the fixed and the random effects are taken from the data frame design (subsidiary the parent environment), subsidiary from the data frame data (subsidiary the parent environment).

If the number of observations in the design variables equal the total number of response observations, then a global ANOVA is performed. If the number of observations in the design variables equal the number of sample points, then a marginal ANOVA is performed.

Value

A list with components

`logLik`	Minus twice the log restricted likelihood taken at the estimates.
`ANOVA`	Specifies whether fixed and random effects were estimated globally (`global`) or marginally (`marginal`).
`nlSearch`	Specifies whether non-linear optimization was performed (`TRUE` / `FALSE`).
`counts`	Number of computations of the negative log likelihood.
`boxcoxHat`	Maximum restricted likelihood estimate for the power parameter in the scale invariant Box-Cox transformation. Equal to `not done` if the Box-Cox transformation is not used.
`Ghat`	Maximum restricted likelihood estimate for the variance matrix of the random effects.
`lambdaHat`	Maximum restricted likelihood estimate for the lambda parameter describing the L-operator.
`sigma2hat`	Maximum restricted likelihood estimate for the noise variance.
`betaHat`	For global ANOVA a vector with estimate for the fixed effect. For marginal ANOVA a matrix with estimate for the fixed effects.
`uBLUP`	For global ANOVA a vector with prediction of the random effect. For marginal ANOVA a matrix with predictions of the random effects.
`xBLUP`	Array with predictions of serial correlated effects. The dimension is (sample length,sample numbers,1+`D.order`).
`condRes`	Matrix of conditional residuals. The dimension is (sample length,sample numbers).
`betaVar`	Variance matrix of fixed effect estimate.

Note

If the real value of the left most eigenvalues are non-positive, and if the real value of the right most eigenvalues are non-negative, then the underlying algorithm is numerical stable. This will always be the situation for the present restriction of the L-operator.

If lambda has length=1, then it may also be interpreted as the smoothing parameter in the penalized likelihood framework.

If D.order is chosen larger than K.order, this number of derivaties are also computed during the non-linear optimization. This might slow down the computation speed a little bit.

Author(s)

Bo Markussen <bomar@math.ku.dk>

Examples

# ---------------------
# Using a fixed effect
# ---------------------
x <- seq(0,2*pi,length.out=200)
y.true <- sin(x)+x
y.obs <- y.true + rnorm(200)
est0 <- fdaLm(y.obs~0,Fright="open",right=2*pi)
est1 <- fdaLm(y.obs~0+x,Fright="open",right=2*pi)
plot(x,y.obs,main="Estimating the sum of a line and a curve")
lines(x,y.true,lty=2)
lines(x,est0$xBLUP[,1,1],col=2)
lines(x,est1$betaHat*x+est1$xBLUP[,1,1],col=3)
legend("topleft",c("True curve","Smooth","Line + smooth"),col=1:3,lty=c(2,1,1))

# --------------------------
# Including a random effect
# --------------------------
# Build data frame
test.frame <- data.frame(y=rnorm(50),sample=factor(rep(1:5,each=10)),
                         x=rep(0:9,times=5),
                         f=factor(rnorm(50) < 0,labels=c("a","b")),
                         j=factor(rnorm(50) < 0,labels=c("A","B")))
test.frame$y <- test.frame$y + 2 +
    3*(test.frame$f=="a")*test.frame$x + 5*(test.frame$f=="b")*test.frame$x +
(-10)*(test.frame$j=="A") + 10*(test.frame$j=="B")
# This is the model 'y|sample ~ f:x|j' with intercept=2, slopes (3,5),
# and random effects (-10,10)
est <- fdaLm(y|sample ~ f:x|0+j,data=test.frame)
print(est)

fdaMixed documentation built on Sept. 14, 2023, 1:09 a.m.