Matrix-based Partial Least Squares estimation

Description

matrixpls calculates composite variable models using the partial least squares (PLS) algorithm and related methods. In contrast to most other PLS software, which implements the raw data version of the algorithm, matrixpls works with data covariance matrices. The algorithms are designed to be computationally efficient, modular in programming, and well documented. matrixpls integrates with simsem to enable Monte Carlo simulations with as little custom programming as possible.

Details

matrixpls calculates models where sets of indicator variables are combined as weighted composites. These composites are then used to estimate a statistical model describing the relationships among the composites and between the composites and the indicators. While a number of such methods exist, the partial least squares (PLS) technique is perhaps the most widely used.

The matrixpls package implements a collection of PLS techniques as well as the more recent GSCA and PLSc techniques and older methods based on analysis with composite variables, such as regression with unit weighted composites or factor scores. The package provides a unified framework that enables the comparison and analysis of these algorithms. In contrast to previous R packages for PLS, such as plspm and semPLS, and all currently available commercial PLS software, which work with raw data, matrixpls calculates the indicator weights and model estimates from data covariance matrices. Working with covariance data allows for reanalyzing covariance matrices that are sometimes published as appendices of articles, is computationally more efficient, and lends itself more easily to formal analysis than implementations based on raw data.
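The reason covariance data is sufficient can be checked in base R. The following is a minimal sketch and not matrixpls code: the simulated data, the weight matrix W, and all variable names are assumptions made for illustration. It shows that the covariance matrix of weighted composites equals W S W', so composite statistics can be obtained from S alone.

```r
# Illustrative sketch in base R; data and names are hypothetical.
set.seed(1)

# Simulated raw data: 200 observations on 4 indicators
X <- matrix(rnorm(200 * 4), ncol = 4,
            dimnames = list(NULL, c("x1", "x2", "x3", "x4")))
S <- cov(X)

# Weight matrix (composites x indicators): A uses x1, x2; B uses x3, x4
W <- matrix(c(0.6, 0.4, 0,   0,
              0,   0,   0.5, 0.5),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), colnames(X)))

# Raw-data route: form composite scores, then take their covariance
C_raw <- cov(X %*% t(W))

# Covariance route: only S and W are needed
C_cov <- W %*% S %*% t(W)

# The two routes agree up to floating point error
agree <- isTRUE(all.equal(C_raw, C_cov, check.attributes = FALSE))
```

Because every later quantity (regressions among composites, loadings) is a function of such covariances, the raw observations are not needed once S is available.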

matrixpls has a modular design that is easily expanded and contains more calculation options than the two other PLS packages for R. To allow validation of the algorithms by end users and to help porting existing analysis files from the two other R packages to matrixpls, the package contains compatibility functions for both plspm and semPLS.

The design principles and functionality of the package are best explained by first describing the main function, matrixpls. The function performs two tasks. It first calculates a set of indicator weights to form composites based on the data covariance matrix and then estimates a statistical model with the indicators and composites using the weights. The main function takes the following arguments:

matrixpls(S, model, W.model = NULL, 
          weightFun = weightFun.pls, 
          parameterEstim = parameterEstim.separate,
          weightSign = NULL, ..., 
          validateInput = TRUE, standardize = TRUE)

The first five arguments of matrixpls are most relevant for understanding how the package works. S is the data covariance or correlation matrix. model defines the model that is estimated in the second stage, and W.model defines how the indicators are to be aggregated as composites. If W.model is left undefined, it will be constructed based on model following rules that are explained elsewhere in the documentation. weightFun and parameterEstim are functions that implement the first and second tasks of the function, respectively. All other arguments are passed down to these two functions, which in turn can pass arguments to other functions that they call.

Many of the commonly used arguments of the matrixpls function are functions themselves. For example, executing a PLS analysis with Mode B outer estimation for all indicator blocks and centroid inner estimation could be specified as follows:

matrixpls(S, model,
          outerEstim = outerEstim.modeB,
          innerEstim = innerEstim.centroid)

The arguments outerEstim and innerEstim are not defined by the matrixpls function, but are passed down to weightFun.pls which is used as the default weightFun. outerEstim.modeB and innerEstim.centroid are themselves functions provided by the matrixpls package, which perform the actual inner and outer estimation stages of the PLS algorithm. Essentially, all parts of the estimation algorithm can be provided as arguments for the main function. This allows for adjusting the inner workings of the algorithm in a way that is currently not possible with any other PLS software.
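The pass-down mechanism relies on R's `...` argument. The following is a minimal sketch of the pattern in base R. The functions toyMain, toyWeightFun, toyOuterUnit, and toyOuterStd are hypothetical stand-ins invented for this illustration and do not reproduce the internals of matrixpls or weightFun.pls.

```r
# Hypothetical toy functions that mimic the design; not matrixpls code.

# Outer estimator 1: keep the unit weights given in W.model
toyOuterUnit <- function(S, W.model, ...) W.model

# Outer estimator 2: rescale each row so the composite has unit variance
toyOuterStd <- function(S, W.model, ...) {
  v <- diag(W.model %*% S %*% t(W.model))  # composite variances
  W.model / sqrt(v)                        # row i is divided by sqrt(v[i])
}

# The weight function declares outerEstim as one of its own arguments
toyWeightFun <- function(S, W.model, outerEstim = toyOuterUnit, ...) {
  outerEstim(S, W.model, ...)
}

# The main function does not know about outerEstim; it only forwards `...`
toyMain <- function(S, W.model, weightFun = toyWeightFun, ...) {
  weightFun(S, W.model, ...)
}

# Example correlation matrix and weight pattern (2 composites, 4 indicators)
S <- matrix(0.3, 4, 4)
diag(S) <- 1
W.model <- rbind(A = c(1, 1, 0, 0), B = c(0, 0, 1, 1))

W_unit <- toyMain(S, W.model)                            # default estimator
W_std  <- toyMain(S, W.model, outerEstim = toyOuterStd)  # forwarded via ...
```

With this layout a new estimator can be swapped in from the top-level call without changing the main function, which is the property the paragraph above describes.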

It is also possible to define custom functions. For example, we could define a new Mode B outer estimator that only produces positive weights by creating a custom function:

# Wrap the Mode B outer estimator and return the absolute values of its weights
myModeB <- function(...){
  abs(outerEstim.modeB(...))
}

matrixpls(S, model,
          outerEstim = myModeB,
          innerEstim = innerEstim.centroid)

The model can be specified in the lavaan format or the native matrixpls format. The native model format is a list of three binary matrices, inner, reflective, and formative, specifying the free parameters of a model: inner (l x l) specifies the regressions between composites, reflective (k x l) specifies the regressions of observed data on composites, and formative (l x k) specifies the regressions of composites on the observed data. Here k is the number of observed variables and l is the number of composites.
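As an illustration of the native format, the following base R sketch builds the three matrices for a hypothetical model with l = 3 composites and k = 6 indicators. The variable names are invented, and the orientation of inner (rows as dependent composites, columns as predictors) is an assumption that follows the row/column pattern stated for reflective and formative.

```r
# Hypothetical example model: 3 composites, 6 indicators
composites <- c("A", "B", "C")
indicators <- paste0("x", 1:6)

# inner (l x l): assumed orientation, row = dependent, column = predictor
inner <- matrix(0, 3, 3, dimnames = list(composites, composites))
inner["B", "A"] <- 1          # B regressed on A
inner["C", c("A", "B")] <- 1  # C regressed on A and B

# reflective (k x l): regressions of indicators on composites
reflective <- matrix(0, 6, 3, dimnames = list(indicators, composites))
reflective[cbind(1:6, rep(1:3, each = 2))] <- 1  # two indicators per composite

# formative (l x k): regressions of composites on indicators (none here)
formative <- matrix(0, 3, 6, dimnames = list(composites, indicators))

model <- list(inner = inner, reflective = reflective, formative = formative)
```

A list built this way would be passed as the model argument, for example `matrixpls(S, model)`, where S is a covariance matrix whose row and column names match the indicator names.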

If the model is specified in lavaan format, the native format model is derived from this model by assigning all regressions between latent variables to inner, all factor loadings to reflective, and all regressions of latent variables on observed variables to formative. Regressions between observed variables and all free covariances are ignored. All parameters that are specified in the model will be treated as free parameters.
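The mapping rules can be sketched in base R on a hand-written parameter table with the columns lhs, op, and rhs, which is how lavaan tabulates model parameters. The helper toNative below is hypothetical and written only for this illustration. It is not the conversion code used by matrixpls, it takes the latent and observed variable names as explicit arguments, and it assumes that rows of inner are the dependent composites.

```r
# Hypothetical helper applying the rules described above; not matrixpls code.
toNative <- function(partable, latent, observed) {
  l <- length(latent)
  k <- length(observed)
  inner      <- matrix(0, l, l, dimnames = list(latent, latent))
  reflective <- matrix(0, k, l, dimnames = list(observed, latent))
  formative  <- matrix(0, l, k, dimnames = list(latent, observed))
  for (i in seq_len(nrow(partable))) {
    lhs <- partable$lhs[i]
    op  <- partable$op[i]
    rhs <- partable$rhs[i]
    if (op == "=~") {
      reflective[rhs, lhs] <- 1   # factor loading
    } else if (op == "~" && lhs %in% latent && rhs %in% latent) {
      inner[lhs, rhs] <- 1        # latent regressed on latent
    } else if (op == "~" && lhs %in% latent && rhs %in% observed) {
      formative[lhs, rhs] <- 1    # latent regressed on observed
    }
    # Regressions between observed variables and covariances ("~~") are ignored
  }
  list(inner = inner, reflective = reflective, formative = formative)
}

# Hand-written table for: A =~ x1 + x2; B =~ x3 + x4; B ~ A + x5;
# x1 ~ x5; x1 ~~ x2
partable <- data.frame(
  lhs = c("A",  "A",  "B",  "B",  "B", "B",  "x1", "x1"),
  op  = c("=~", "=~", "=~", "=~", "~", "~",  "~",  "~~"),
  rhs = c("x1", "x2", "x3", "x4", "A", "x5", "x5", "x2"),
  stringsAsFactors = FALSE
)

native <- toNative(partable, latent = c("A", "B"),
                   observed = paste0("x", 1:5))
```

In this example the last two rows of the table leave no trace in the result, which matches the statement that regressions between observed variables and free covariances are ignored.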

The original papers about Partial Least Squares, as well as many of the current PLS implementations, impose restrictions on the matrices inner, reflective, and formative: inner must be a lower triangular matrix, reflective must have exactly one non-zero value on each row and at least one non-zero value on each column, and formative must only contain zeros. Some PLS implementations allow formative to contain non-zero values, but impose a restriction that the sum of reflective and t(formative) must satisfy the original restrictions of reflective. The only restrictions that matrixpls imposes on inner, reflective, and formative are that these must be binary matrices and that the diagonal of inner must be zeros.
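The difference between the two sets of restrictions can be made concrete with a small base R sketch. Both checker functions are hypothetical helpers written for this illustration and are not part of matrixpls. The example model is also invented: composite B is measured formatively, which the classical restrictions rule out.

```r
# Hypothetical checkers; not part of matrixpls.

# Classical restrictions described above, reported one by one
checkClassic <- function(model) {
  c(
    # upper triangle and diagonal of inner are all zero
    innerLowerTriangular = all(model$inner[upper.tri(model$inner, diag = TRUE)] == 0),
    # each indicator loads on exactly one composite
    oneLoadingPerIndicator = all(rowSums(model$reflective != 0) == 1),
    # each composite has at least one reflective indicator
    everyCompositeHasIndicator = all(colSums(model$reflective != 0) >= 1),
    # no formative paths
    noFormative = all(model$formative == 0)
  )
}

# Minimal restrictions: binary matrices and a zero diagonal in inner
checkMinimal <- function(model) {
  binary <- all(vapply(model, function(m) all(m %in% c(0, 1)), logical(1)))
  binary && all(diag(model$inner) == 0)
}

composites <- c("A", "B")
indicators <- paste0("x", 1:4)

inner <- matrix(0, 2, 2, dimnames = list(composites, composites))
inner["B", "A"] <- 1

reflective <- matrix(0, 4, 2, dimnames = list(indicators, composites))
reflective[c("x1", "x2"), "A"] <- 1            # A is reflective

formative <- matrix(0, 2, 4, dimnames = list(composites, indicators))
formative["B", c("x3", "x4")] <- 1             # B is formative

model <- list(inner = inner, reflective = reflective, formative = formative)

classic <- checkClassic(model)
minimal <- checkMinimal(model)
```

For this model only the lower triangular check passes among the classical restrictions, while the minimal check passes.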

References

Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1–36. Retrieved from http://www.jstatsoft.org/v48/i02

Lohmöller, J.-B. (1989). Latent variable path modeling with partial least squares. Heidelberg: Physica-Verlag.

Rönkkö, M., McIntosh, C. N., & Antonakis, J. (2015). On the adoption of partial least squares in psychological research: Caveat emptor. Personality and Individual Differences, 87, 76–84. DOI:10.1016/j.paid.2015.07.019

Wold, H. (1982). Soft modeling: The basic design and some extensions. In K. G. Jöreskog & S. Wold (Eds.), Systems under indirect observation: Causality, structure, prediction (pp. 1–54). Amsterdam: North-Holland.
