parafac2: Parallel Factor Analysis-2
In multiway: Component Models for Multi-Way Data

parafac2

R Documentation

Parallel Factor Analysis-2

Description

Fits Richard A. Harshman's Parallel Factors-2 (Parafac2) model to 3-way or 4-way ragged data arrays. Parameters are estimated via alternating least squares with optional constraints.

Usage

parafac2(X, nfac, nstart = 10, const = NULL, control = NULL,
         Gfixed = NULL, Bfixed = NULL, Cfixed = NULL, Dfixed = NULL,
         Gstart = NULL, Bstart = NULL, Cstart = NULL, Dstart = NULL,
         Gstruc = NULL, Bstruc = NULL, Cstruc = NULL, Dstruc = NULL,
         Gmodes = NULL, Bmodes = NULL, Cmodes = NULL, Dmodes = NULL,
         maxit = 500, ctol = 1e-4, parallel = FALSE, cl = NULL,
         output = c("best", "all"), verbose = TRUE, backfit = FALSE)

Arguments

`X`	For 3-way Parafac2: list of length `K` where `k`-th element is `I[k]`-by-`J` matrix or three-way data array with `dim=c(I,J,K)`. For 4-way Parafac2: list of length `L` where `l`-th element is `I[l]`-by-`J`-by-`K` array or four-way data array with `dim=c(I,J,K,L)`. Missing data are allowed (see Note).
`nfac`	Number of factors.
`nstart`	Number of random starts.
`const`	Character vector of length 3 or 4 giving the constraints for each mode (defaults to unconstrained). See `const` for the 24 available options.
`control`	List of parameters controlling options for smoothness constraints. This is passed to `const.control`, which describes the available options.
`Gfixed`	Used to fit model with fixed Phi matrix: `crossprod(Gfixed) = Phi`.
`Bfixed`	Used to fit model with fixed Mode B weights.
`Cfixed`	Used to fit model with fixed Mode C weights.
`Dfixed`	Used to fit model with fixed Mode D weights.
`Gstart`	Starting Mode A crossproduct matrix: `crossprod(Gstart) = Phi`. Default uses random weights.
`Bstart`	Starting Mode B weights. Default uses random weights.
`Cstart`	Starting Mode C weights. Default uses random weights.
`Dstart`	Starting Mode D weights. Default uses random weights.
`Gstruc`	Structure constraints for Mode A crossproduct matrix: `crossprod(Gstruc) = Phistruc`. See Note.
`Bstruc`	Structure constraints for Mode B weights. See Note.
`Cstruc`	Structure constraints for Mode C weights. See Note.
`Dstruc`	Structure constraints for Mode D weights. See Note.
`Gmodes`	Mode ranges for Mode A weights (for unimodality constraints). Ignored.
`Bmodes`	Mode ranges for Mode B weights (for unimodality constraints). See Note.
`Cmodes`	Mode ranges for Mode C weights (for unimodality constraints). See Note.
`Dmodes`	Mode ranges for Mode D weights (for unimodality constraints). See Note.
`maxit`	Maximum number of iterations.
`ctol`	Convergence tolerance (R^2 change).
`parallel`	Logical indicating if `parLapply` should be used. See Examples.
`cl`	Cluster created by `makeCluster`. Only used when `parallel=TRUE`.
`output`	Output the best solution (default) or output all `nstart` solutions.
`verbose`	If `TRUE`, fitting progress is printed via `txtProgressBar`. Ignored if `parallel=TRUE`.
`backfit`	Should backfitting algorithm be used for `cmls`?

Details

Given a list of matrices X[[k]] = matrix(xk,I[k],J) for k = seq(1,K), the 3-way Parafac2 model (with Mode A nested in Mode C) can be written as

X[[k]] = tcrossprod(A[[k]] %*% diag(C[k,]), B) + E[[k]]

subject to crossprod(A[[k]]) = Phi

where A[[k]] = matrix(ak,I[k],R) are the Mode A (first mode) weights for the k-th level of Mode C (third mode), Phi is the common crossproduct matrix shared by all K levels of Mode C, B = matrix(b,J,R) are the Mode B (second mode) weights, C = matrix(c,K,R) are the Mode C (third mode) weights, and E[[k]] = matrix(ek,I[k],J) is the residual matrix corresponding to k-th level of Mode C.

Given a list of arrays X[[l]] = array(xl, dim = c(I[l],J,K)) for l = seq(1,L), the 4-way Parafac2 model (with Mode A nested in Mode D) can be written as

X[[l]][,,k] = tcrossprod(A[[l]] %*% diag(D[l,]*C[k,]), B) + E[[l]][,,k]

subject to crossprod(A[[l]]) = Phi

where A[[l]] = matrix(al,I[l],R) are the Mode A (first mode) weights for the l-th level of Mode D (fourth mode), Phi is the common crossproduct matrix shared by all L levels of Mode D, D = matrix(d,L,R) are the Mode D (fourth mode) weights, and E[[l]] = array(el, dim = c(I[l],J,K)) is the residual array corresponding to l-th level of Mode D.

Weight matrices are estimated using an alternating least squares algorithm with optional constraints.

Value

If output = "best", returns an object of class "parafac2" with the following elements:

`A`	List of Mode A weight matrices.
`B`	Mode B weight matrix.
`C`	Mode C weight matrix.
`D`	Mode D weight matrix.
`Phi`	Mode A crossproduct matrix.
`SSE`	Sum of Squared Errors.
`Rsq`	R-squared value.
`GCV`	Generalized Cross-Validation.
`edf`	Effective degrees of freedom.
`iter`	Number of iterations.
`cflag`	Convergence flag. See Note.
`const`	See argument `const`.
`control`	See argument `control`.
`fixed`	Logical vector indicating whether 'fixed' weights were used for each mode.
`struc`	Logical vector indicating whether 'struc' constraints were used for each mode.

Otherwise returns a list of length nstart where each element is an object of class "parafac2".

Warnings

The algorithm can perform poorly if the number of factors nfac is set too large.

Note

Missing data should be specified as NA values in the input X. The missing data are randomly initialized and then iteratively imputed as a part of the algorithm.

Structure constraints should be specified with a matrix of logicals (TRUE/FALSE), such that FALSE elements indicate a weight should be constrained to be zero. Default uses unstructured weights, i.e., a matrix of all TRUE values.

When using unimodal constraints, the *modes inputs can be used to specify the mode search range for each factor. These inputs should be matrices with dimension c(2,nfac) where the first row gives the minimum mode value and the second row gives the maximum mode value (with respect to the indicies of the corresponding weight matrix).

Output cflag gives convergence information: cflag = 0 if algorithm converged normally, cflag = 1 if maximum iteration limit was reached before convergence, and cflag = 2 if algorithm terminated abnormally due to a problem with the constraints.

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Harshman, R. A. (1972). PARAFAC2: Mathematical and technical notes. UCLA Working Papers in Phonetics, 22, 30-44.

Helwig, N. E. (2013). The special sign indeterminacy of the direct-fitting Parafac2 model: Some implications, cautions, and recommendations, for Simultaneous Component Analysis. Psychometrika, 78, 725-739. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1007/S11336-013-9331-7")}

Helwig, N. E. (2017). Estimating latent trends in multivariate longitudinal data via Parafac2 with functional and structural constraints. Biometrical Journal, 59(4), 783-803. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1002/bimj.201600045")}

Kiers, H. A. L., ten Berge, J. M. F., & Bro, R. (1999). PARAFAC2-part I: A direct-fitting algorithm for the PARAFAC2 model. Journal of Chemometrics, 13, 275-294. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1002/(SICI)1099-128X(199905/08)13:3/4<275::AID-CEM543>3.0.CO;2-B")}

Examples

##########   3-way example   ##########

# create random data list with Parafac2 structure
set.seed(3)
mydim <- c(NA, 10, 20)
nf <- 2
nk <- rep(c(50, 100, 200), length.out = mydim[3])
Gmat <- matrix(rnorm(nf^2), nrow = nf, ncol = nf)
Bmat <- matrix(runif(mydim[2]*nf), nrow = mydim[2], ncol = nf)
Cmat <- matrix(runif(mydim[3]*nf), nrow = mydim[3], ncol = nf)
Xmat <- Emat <- Amat <- vector("list", mydim[3])
for(k in 1:mydim[3]){
  Amat[[k]] <- matrix(rnorm(nk[k]*nf), nrow = nk[k], ncol = nf)
  Amat[[k]] <- svd(Amat[[k]], nv = 0)$u %*% Gmat
  Xmat[[k]] <- tcrossprod(Amat[[k]] %*% diag(Cmat[k,]), Bmat)
  Emat[[k]] <- matrix(rnorm(nk[k]*mydim[2]), nrow = nk[k], ncol = mydim[2])
}
Emat <- nscale(Emat, 0, ssnew = sumsq(Xmat))   # SNR = 1
X <- mapply("+", Xmat, Emat)

# fit Parafac2 model (unconstrained)
pfac <- parafac2(X, nfac = nf, nstart = 1)
pfac

# check solution
Xhat <- fitted(pfac)
sse <- sumsq(mapply("-", Xmat, Xhat))
sse / (sum(nk) * mydim[2])
crossprod(pfac$A[[1]])
crossprod(pfac$A[[2]])
pfac$Phi

# reorder and resign factors
pfac$B[1:4,]
pfac <- reorder(pfac, 2:1)
pfac$B[1:4,]
pfac <- resign(pfac, mode="B")
pfac$B[1:4,]
Xhat <- fitted(pfac)
sse <- sumsq(mapply("-", Xmat, Xhat))
sse / (sum(nk) * mydim[2])

# rescale factors
colSums(pfac$B^2)
colSums(pfac$C^2)
pfac <- rescale(pfac, mode = "C", absorb = "B")
colSums(pfac$B^2)
colSums(pfac$C^2)
Xhat <- fitted(pfac)
sse <- sumsq(mapply("-", Xmat, Xhat))
sse / (sum(nk) * mydim[2])


##########   4-way example   ##########

# create random data list with Parafac2 structure
set.seed(4)
mydim <- c(NA, 10, 20, 5)
nf <- 3
nk <- rep(c(50,100,200), length.out = mydim[4])
Gmat <- matrix(rnorm(nf^2), nrow = nf, ncol = nf)
Bmat <- scale(matrix(rnorm(mydim[2]*nf), nrow = mydim[2], ncol = nf), center = FALSE)
cseq <- seq(-3, 3, length=mydim[3])
Cmat <- cbind(pnorm(cseq), pgamma(cseq+3.1, shape=1, rate=1)*(3/4), pt(cseq-2, df=4)*2)
Dmat <- scale(matrix(runif(mydim[4]*nf)*2, nrow = mydim[4], ncol = nf), center = FALSE)
Xmat <- Emat <- Amat <- vector("list",mydim[4])
for(k in 1:mydim[4]){
  aseq <- seq(-3, 3, length.out = nk[k])
  Amat[[k]] <- cbind(sin(aseq), sin(abs(aseq)), exp(-aseq^2))
  Amat[[k]] <- svd(Amat[[k]], nv = 0)$u %*% Gmat
  Xmat[[k]] <- array(tcrossprod(Amat[[k]] %*% diag(Dmat[k,]),
                                krprod(Cmat, Bmat)), dim = c(nk[k], mydim[2], mydim[3]))
  Emat[[k]] <- array(rnorm(nk[k] * mydim[2] * mydim[3]), dim = c(nk[k], mydim[2], mydim[3]))
}
Emat <- nscale(Emat, 0, ssnew = sumsq(Xmat))   # SNR = 1
X <- mapply("+", Xmat, Emat)

# fit Parafac model (smooth A, unconstrained B, monotonic C, non-negative D)
pfac <- parafac2(X, nfac = nf, nstart = 1, 
                 const = c("smooth", "uncons", "moninc", "nonneg"))
pfac

# check solution
Xhat <- fitted(pfac)
sse <- sumsq(mapply("-", Xmat, Xhat))
sse / (sum(nk) * mydim[2] * mydim[3])
crossprod(pfac$A[[1]])
crossprod(pfac$A[[2]])
pfac$Phi


## Not run: 

##########   parallel computation   ##########

# create random data list with Parafac2 structure
set.seed(3)
mydim <- c(NA, 10, 20)
nf <- 2
nk <- rep(c(50, 100, 200), length.out = mydim[3])
Gmat <- matrix(rnorm(nf^2), nrow = nf, ncol = nf)
Bmat <- matrix(runif(mydim[2]*nf), nrow = mydim[2], ncol = nf)
Cmat <- matrix(runif(mydim[3]*nf), nrow = mydim[3], ncol = nf)
Xmat <- Emat <- Hmat <- vector("list", mydim[3])
for(k in 1:mydim[3]){
  Hmat[[k]] <- svd(matrix(rnorm(nk[k] * nf), nrow = nk[k], ncol = nf), nv = 0)$u
  Xmat[[k]] <- tcrossprod(Hmat[[k]] %*% Gmat %*% diag(Cmat[k,]), Bmat)
  Emat[[k]] <- matrix(rnorm(nk[k] * mydim[2]), nrow = nk[k], mydim[2])
}
Emat <- nscale(Emat, 0, ssnew = sumsq(Xmat))   # SNR = 1
X <- mapply("+", Xmat, Emat)

# fit Parafac2 model (10 random starts -- sequential computation)
set.seed(1)
system.time({pfac <- parafac2(X, nfac = nf)})
pfac

# fit Parafac2 model (10 random starts -- parallel computation)
cl <- makeCluster(detectCores())
ce <- clusterEvalQ(cl, library(multiway))
clusterSetRNGStream(cl, 1)
system.time({pfac <- parafac2(X, nfac = nf, parallel = TRUE, cl = cl)})
pfac
stopCluster(cl)

## End(Not run)

multiway documentation built on April 15, 2025, 5:08 p.m.