# Matrix-based Partial Least Squares estimation

### Description

matrixpls calculates composite variable models using the partial least squares (PLS) algorithm and related methods. In contrast to most other PLS software, which implements the raw data version of the algorithm, matrixpls works with data covariance matrices. The algorithms are designed to be computationally efficient, modular in programming, and well documented. matrixpls integrates with simsem to enable Monte Carlo simulations with as little custom programming as possible.

### Details

matrixpls calculates models where sets of indicator variables are combined as weighted composites. These composites are then used to estimate a statistical model describing the relationships among the composites and between the composites and indicators. While a number of such methods exist, the partial least squares (PLS) technique is perhaps the most widely used.

The matrixpls package implements a collection of PLS techniques as well as the more recent GSCA and PLSc techniques and older methods based on analysis with composite variables, such as regression with unit weighted composites or factor scores. The package provides a unified framework that enables the comparison and analysis of these algorithms. Previous R packages for PLS, such as plspm and semPLS, and all currently available commercial PLS software work with raw data. In contrast, matrixpls calculates the indicator weights and model estimates from data covariance matrices. Working with covariance data allows for reanalyzing covariance matrices that are sometimes published as appendices of articles, is computationally more efficient, and lends itself more easily to formal analysis than implementations based on raw data.

matrixpls has a modular design that is easily expanded and contains more calculation options than the two other PLS packages for R. To allow validation of the algorithms by end users and to help porting existing analysis files from the two other R packages to matrixpls, the package contains compatibility functions for both plspm and semPLS.

The design principles and functionality of the package are best explained by first explaining the main function `matrixpls`. The function performs two tasks. It first calculates a set of indicator weights to form composites based on the data covariance matrix and then estimates a statistical model with the indicators and composites using the weights. The main function takes the following arguments:

```
matrixpls(S, model, W.model = NULL,
          weightFun = weightFun.pls,
          parameterEstim = parameterEstim.separate,
          weightSign = NULL, ...,
          validateInput = TRUE, standardize = TRUE)
```

The first five arguments of `matrixpls` are most relevant for understanding how the package works. `S` is the data covariance or correlation matrix. `model` defines the model that is estimated in the second stage, and `W.model` defines how the indicators are to be aggregated as composites. If `W.model` is left undefined, it will be constructed based on `model` following rules that are explained elsewhere in the documentation. `weightFun` and `parameterEstim` are functions that implement the first and second task of the function, respectively. All other arguments are passed down to these two functions, which in turn can pass arguments to other functions that they call.
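As a minimal sketch of how `S` and `model` fit together, the following builds a covariance matrix from simulated raw data and a small model in the native list format described later in this section. The variable names (`x1` to `x4`, `A`, `B`) and the simulated data are placeholders, and the sketch assumes that a row of `inner` is the dependent composite and a column is its predictor. The call to `matrixpls` runs only if the package is installed.

```r
# Simulated raw data; all names and values are placeholders for illustration.
set.seed(1)
n <- 200
scoreA <- rnorm(n)
scoreB <- 0.5 * scoreA + rnorm(n)
dat <- data.frame(x1 = scoreA + rnorm(n), x2 = scoreA + rnorm(n),
                  x3 = scoreB + rnorm(n), x4 = scoreB + rnorm(n))

# S: the data covariance matrix (cor(dat) would give a correlation matrix)
S <- cov(dat)

# model: composite B regressed on composite A (assumed row = dependent),
# with x1, x2 as indicators of A and x3, x4 as indicators of B
composites <- c("A", "B")
inner <- matrix(c(0, 0,
                  1, 0), nrow = 2, byrow = TRUE,
                dimnames = list(composites, composites))
reflective <- matrix(c(1, 0,
                       1, 0,
                       0, 1,
                       0, 1), nrow = 4, byrow = TRUE,
                     dimnames = list(colnames(S), composites))
formative <- matrix(0, nrow = 2, ncol = 4,
                    dimnames = list(composites, colnames(S)))
model <- list(inner = inner, reflective = reflective, formative = formative)

# Estimate with the default weightFun and parameterEstim, if available
if (requireNamespace("matrixpls", quietly = TRUE)) {
  fit <- matrixpls::matrixpls(S, model)
  print(fit)
}
```

Because only `S` enters the estimation, a covariance or correlation matrix typed in from a published appendix could be used in place of `cov(dat)`.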

Many of the commonly used arguments of the `matrixpls` function are functions themselves. For example, executing a PLS analysis with Mode B outer estimation for all indicator blocks and centroid inner estimation could be specified as follows:

```
matrixpls(S, model,
          outerEstim = outerEstim.modeB,
          innerEstim = innerEstim.centroid)
```

The arguments `outerEstim` and `innerEstim` are not defined by the `matrixpls` function, but are passed down to `weightFun.pls`, which is used as the default `weightFun`. `outerEstim.modeB` and `innerEstim.centroid` are themselves functions provided by the matrixpls package, which perform the actual outer and inner estimation stages of the PLS algorithm. Essentially, all parts of the estimation algorithm can be provided as arguments for the main function. This allows for adjusting the inner workings of the algorithm in a way that is currently not possible with any other PLS software.

It is also possible to define custom functions. For example, we could define a new Mode B outer estimator that only produces positive weights by creating a custom function:

```
# Wrap the Mode B outer estimator so that all weights are non-negative
myModeB <- function(...){
  abs(outerEstim.modeB(...))
}
matrixpls(S, model,
          outerEstim = myModeB,
          innerEstim = innerEstim.centroid)
```

The model can be specified in the lavaan format or the native matrixpls format. The native model format is a list of three binary matrices, `inner`, `reflective`, and `formative`, specifying the free parameters of a model: `inner` (`l x l`) specifies the regressions between composites, `reflective` (`k x l`) specifies the regressions of observed data on composites, and `formative` (`l x k`) specifies the regressions of composites on the observed data. Here `k` is the number of observed variables and `l` is the number of composites.
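The dimensions above can be made concrete with a small base R sketch. The indicator and composite names are hypothetical, and the sketch assumes that a row of `inner` is the dependent composite. One composite is measured reflectively and the other is formed from its indicators, so both `reflective` and `formative` contain non-zero entries.

```r
# Hypothetical names: k = 5 observed variables, l = 2 composites
indicators <- c("q1", "q2", "s1", "s2", "s3")
composites <- c("Quality", "Satisfaction")
k <- length(indicators)
l <- length(composites)

# inner (l x l): Satisfaction is regressed on Quality (assumed row = dependent)
inner <- matrix(0, l, l, dimnames = list(composites, composites))
inner["Satisfaction", "Quality"] <- 1

# reflective (k x l): s1..s3 are regressed on Satisfaction
reflective <- matrix(0, k, l, dimnames = list(indicators, composites))
reflective[c("s1", "s2", "s3"), "Satisfaction"] <- 1

# formative (l x k): Quality is regressed on q1 and q2
formative <- matrix(0, l, k, dimnames = list(composites, indicators))
formative["Quality", c("q1", "q2")] <- 1

# The native model is simply a list of the three binary matrices
model <- list(inner = inner, reflective = reflective, formative = formative)
str(model)
```

Naming the rows and columns keeps the three matrices aligned with each other and with the row and column names of `S`.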

If the model is specified in lavaan format, the native format model is derived from this model by assigning all regressions between latent variables to `inner`, all factor loadings to `reflective`, and all regressions of latent variables on observed variables to `formative`. Regressions between observed variables and all free covariances are ignored. All parameters that are specified in the model will be treated as free parameters.
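For illustration, a model in lavaan syntax is just a character string. The sketch below uses hypothetical variable names and only the `=~` (factor loading) and `~` (regression) operators, with comments noting where each line would land under the mapping described above.

```r
# Hypothetical variable names; only the lavaan operators matter here
model <- "
  # factor loadings: assigned to reflective
  A =~ x1 + x2
  B =~ x3 + x4

  # regression between latent variables: assigned to inner
  B ~ A
"
cat(model)
```

Such a string would be passed as the `model` argument in place of the list of three matrices.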

The original papers about Partial Least Squares, as well as many of the current PLS implementations, impose restrictions on the matrices `inner`, `reflective`, and `formative`: `inner` must be a lower triangular matrix, `reflective` must have exactly one non-zero value on each row and at least one non-zero value on each column, and `formative` must only contain zeros. Some PLS implementations allow `formative` to contain non-zero values, but impose a restriction that the sum of `reflective` and `t(formative)` must satisfy the original restrictions of `reflective`. The only restrictions that matrixpls imposes on `inner`, `reflective`, and `formative` are that these must be binary matrices and that the diagonal of `inner` must be zeros.
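The two sets of restrictions can be written as simple checks in base R. The helper functions below (`is_binary`, `meets_matrixpls_rules`, `meets_classical_rules`) are hypothetical and not part of matrixpls; they only restate the rules from the paragraph above.

```r
# Hypothetical helpers; not part of the matrixpls package
is_binary <- function(m) is.matrix(m) && all(m %in% c(0, 1))

# Restrictions imposed by matrixpls: binary matrices, zero diagonal in inner
meets_matrixpls_rules <- function(model) {
  parts <- model[c("inner", "reflective", "formative")]
  all(vapply(parts, is_binary, logical(1))) && all(diag(model$inner) == 0)
}

# Restrictions of the original PLS papers
meets_classical_rules <- function(model) {
  inner <- model$inner
  reflective <- model$reflective
  all(inner[upper.tri(inner)] == 0) &&        # inner is lower triangular
    all(rowSums(reflective != 0) == 1) &&     # one composite per indicator
    all(colSums(reflective != 0) >= 1) &&     # no composite without indicators
    all(model$formative == 0)                 # no formative paths
}

# A two-composite model that satisfies both sets of rules
comps <- c("A", "B")
inds <- c("x1", "x2", "x3", "x4")
classic <- list(
  inner = matrix(c(0, 0, 1, 0), 2, byrow = TRUE, dimnames = list(comps, comps)),
  reflective = matrix(c(1, 0, 1, 0, 0, 1, 0, 1), 4, byrow = TRUE,
                      dimnames = list(inds, comps)),
  formative = matrix(0, 2, 4, dimnames = list(comps, inds))
)

# Adding a reciprocal path breaks lower triangularity only
reciprocal <- classic
reciprocal$inner["A", "B"] <- 1

meets_matrixpls_rules(classic)
meets_classical_rules(classic)
meets_matrixpls_rules(reciprocal)
meets_classical_rules(reciprocal)
```

The reciprocal model passes the matrixpls check but fails the classical one, which is the extra flexibility the paragraph above describes.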

### References

Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. *Journal of Statistical Software*, 48(2), 1–36. Retrieved from http://www.jstatsoft.org/v48/i02

Lohmöller, J.-B. (1989). *Latent variable path modeling with partial least squares.* Heidelberg: Physica-Verlag.

Rönkkö, M., McIntosh, C. N., & Antonakis, J. (2015). On the adoption of partial least squares in psychological research: Caveat emptor. *Personality and Individual Differences*, 87, 76–84. doi:10.1016/j.paid.2015.07.019

Wold, H. (1982). Soft modeling: The basic design and some extensions. In K. G. Jöreskog & S. Wold (Eds.), *Systems under indirect observation: causality, structure, prediction* (pp. 1–54). Amsterdam: North-Holland.