Individual Differences Scaling

Description

Given a 3-way array X = array(x,dim=c(J,J,K)) with X[,,k] denoting the k-th subject's dissimilarity matrix rating J objects, the INDSCAL model can be written as

Z[i,j,k] = sum B[i,r]*B[j,r]*C[k,r] + E[i,j,k]

where Z is the array of scalar products obtained from X, B = matrix(b,J,R) are the object weights, C = matrix(c,K,R) are the non-negative subject weights, and E = array(e,dim=c(J,J,K)) is the 3-way residual array. The summation is for r = seq(1,R).

Weight matrices are estimated using an alternating least squares algorithm with optional constraints.

Usage

1
2
3
4
indscal(X,nfac,nstart=10,const=NULL,maxit=500,
        type=c("dissimilarity","similarity"),
        ctol=10^-4,parallel=FALSE,cl=NULL,
        output=c("best","all"))

Arguments

X

Three-way data array with dim=c(J,J,K) where X[,,k] is dissimilarity matrix. Can also input a list of (dis)similarity matrices or objects output by dist.

nfac

Number of factors.

nstart

Number of random starts.

const

Constraints for Modes B and C. See Note.

maxit

Maximum number of iterations.

type

Character indicating if X contains dissimilarity data (default) or similarity data.

ctol

Convergence tolerance.

parallel

Logical indicating if parLapply should be used. See Examples.

cl

Cluster created by makeCluster. Only used when parallel=TRUE.

output

Output the best solution (default) or output all nstart solutions.

Value

If output="best", returns an object of class "indscal" with the following elements:

B

Mode B weight matrix.

C

Mode C weight matrix.

Rsq

R-squared value.

iter

Number of iterations.

cflag

Convergence flag.

const

Same as input.

strain

MDS loss function.

Otherwise returns a list of length nstart where each element is an object of class "indscal".

Warnings

The ALS algorithm can perform poorly if the number of factors nfac is set too large.

Default is unconstrained ALS update, which may produce negative (invalid) Mode C weights. Use const=c(0,2) to force non-negativity via fnnls.

Note

Default use is 10 random strarts (nstart=10) with 500 maximum iterations of the ALS algorithm for each start (maxit=500) using a convergence tolerance of 10^-4 (ctol=10^-4). The algorithm is determined to have converged once the change in R^2 is less than or equal to ctol.

Input const should be a two element integer vector giving constraints for Modes B and C. There are four possible options:

const=c(0,0) Unconstrained update for Modes B and C
const=c(0,2) Unconstrained Mode B with non-negative Mode C
const=c(1,0) Orthogonal Mode B with unconstrained Mode C
const=c(1,2) Orthogonal Mode B with non-negative Mode C

Default is unconstrained update for all modes, i.e., const=c(0,0).

Output cflag gives convergence information: cflag=0 if ALS algorithm converged normally, cflag=1 if maximum iteration limit was reached before convergence, and cflag=2 if ALS algorithm terminated abnormally due to problem with non-negativity constraints.

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Bro, R., & De Jong, S. (1997). A fast non-negativity-constrained least squares algorithm. Journal of Chemometrics, 11, 393-401.

Carroll, J. D., & Chang, J-J. (1970). Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decmoposition. Psychometrika, 35, 283-319.

Harshman, R. A. (1970). Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis. UCLA Working Papers in Phonetics, 16, 1-84.

Harshman, R. A., & Lundy, M. E. (1994). PARAFAC: Parallel factor analysis. Computational Statistics and Data Analysis, 18, 39-72.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
##########   array example   ##########

# create random data array with INDSCAL structure
set.seed(3)
mydim <- c(50,5,10)
nf <- 2
X <- array(0,dim=c(rep(mydim[2],2),mydim[3]))
for(k in 1:mydim[3]) {
  X[,,k] <- as.matrix(dist(t(matrix(rnorm(prod(mydim[1:2])),mydim[1],mydim[2]))))
}

# fit INDSCAL model (unconstrained)
imod <- indscal(X,nfac=nf,nstart=1)
imod$Rsq

# check solution
Xhat <- fitted(imod)
sum((array(apply(X,3,ed2sp),dim=dim(X))-Xhat)^2)
imod$strain

# reorder and resign factors
imod$B[1:4,]
imod <- reorder(imod, 2:1)
imod$B[1:4,]
imod <- resign(imod, newsign=c(1,-1))
imod$B[1:4,]
sum((array(apply(X,3,ed2sp),dim=dim(X))-Xhat)^2)
imod$strain

# rescale factors
colSums(imod$B^2)
colSums(imod$C^2)
imod <- rescale(imod, mode="C")
colSums(imod$B^2)
colSums(imod$C^2)
sum((array(apply(X,3,ed2sp),dim=dim(X))-Xhat)^2)
imod$strain


##########   list example   ##########

# create random data array with INDSCAL structure
set.seed(4)
mydim <- c(100,8,20)
nf <- 3
X <- vector("list",mydim[3])
for(k in 1:mydim[3]) {
  X[[k]] <- dist(t(matrix(rnorm(prod(mydim[1:2])),mydim[1],mydim[2])))
}

# fit INDSCAL model (orthogonal B, non-negative C)
imod <- indscal(X,nfac=nf,nstart=1,const=c(1,2))
imod$Rsq

# check solution
Xhat <- fitted(imod)
sum((array(unlist(lapply(X,ed2sp)),dim=mydim[c(2,2,3)])-Xhat)^2)
imod$strain
crossprod(imod$B)


## Not run: 

##########   parallel computation   ##########

# create random data array with INDSCAL structure
set.seed(3)
mydim <- c(50,5,10)
nf <- 2
X <- array(0,dim=c(rep(mydim[2],2),mydim[3]))
for(k in 1:mydim[3]) {
  X[,,k] <- as.matrix(dist(t(matrix(rnorm(prod(mydim[1:2])),mydim[1],mydim[2]))))
}

# fit INDSCAL model (10 random starts -- sequential computation)
set.seed(1)
system.time({imod <- indscal(X,nfac=nf)})
imod$Rsq

# fit INDSCAL model (10 random starts -- parallel computation)
set.seed(1)
cl <- makeCluster(detectCores())
ce <- clusterEvalQ(cl,library(multiway))
system.time({imod <- indscal(X,nfac=nf,parallel=TRUE,cl=cl)})
imod$Rsq
stopCluster(cl)

## End(Not run)