
Integration of multiple data sets measured on the same samples or observations to classify a discrete outcome, i.e. N-integration with Discriminant Analysis. The method is partly based on Generalised Canonical Correlation Analysis.


`X` |
A list of data sets (called 'blocks') measured on the same samples. Data in the list should be arranged in matrices, samples x variables, with samples order matching in all data sets. |

`Y` |
A factor or a class vector indicating the discrete outcome of each sample. |

`indY` |
To be supplied if Y is missing, indicates the position of the factor / class vector outcome in the list |

`ncomp` |
The number of components to include in the model. Defaults to 2. Applies to all blocks. |

`design` |
numeric matrix of size (number of blocks in X) x (number of blocks in X) with values between 0 and 1. Each value indicates the strength of the relationship to be modelled between two blocks; a value of 0 indicates no relationship, 1 is the maximum value. If |

`scheme` |
Either "horst", "factorial" or "centroid". Default = |

`mode` |
character string. What type of algorithm to use, (partially) matching
one of |

`scale` |
boolean. If scale = TRUE, each block is standardized to zero mean and unit variance. Default = |

`init` |
Mode of initialization used in the algorithm, either by Singular Value Decomposition of the product of each block of X with Y ("svd") or of each block independently ("svd.single"). Default = |

`tol` |
Convergence stopping value. |

`max.iter` |
integer, the maximum number of iterations. |

`near.zero.var` |
boolean, see the internal |

`all.outputs` |
boolean. Computation can be faster when some specific (and non-essential) outputs are not calculated. Default = |
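A `design` matrix as described in the arguments can be assembled in base R. The sketch below is only an illustration: the block names are hypothetical, and the choice of a weak (0.1) link between the two predictor blocks is an assumption made for the example, not a recommended setting.

```r
# Hypothetical block names, chosen for illustration only.
block_names <- c("gene", "lipid", "Y")
n_blocks <- length(block_names)

# Start from a fully connected design (1 everywhere), then remove self-links.
design <- matrix(1, nrow = n_blocks, ncol = n_blocks,
                 dimnames = list(block_names, block_names))
diag(design) <- 0

# Weaken the link between the two predictor blocks (assumed value 0.1)
# while keeping each of them fully connected to the outcome block.
design["gene", "lipid"] <- 0.1
design["lipid", "gene"] <- 0.1

# Checks implied by the argument description: square, values in [0, 1].
stopifnot(nrow(design) == n_blocks, ncol(design) == n_blocks)
stopifnot(all(design >= 0 & design <= 1))

print(design)
```

Naming the rows and columns after the blocks makes it easier to see which pair of blocks each entry refers to.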

The `block.plsda` function fits a horizontal integration PLS-DA model with a specified number of components per block. A factor indicating the discrete outcome needs to be provided, either by `Y` or by its position `indY` in the list of blocks `X`.
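Discriminant variants of PLS generally operate on a dummy (indicator) coding of the outcome factor. The base R sketch below illustrates that coding with a made-up factor. It is not the package's internal code, and `diet` and `Y_dummy` are illustrative names.

```r
# Made-up outcome for six samples (illustrative values only).
diet <- factor(c("coc", "fish", "lin", "coc", "fish", "lin"))

# One indicator column per class: 1 if the sample belongs to it, else 0.
Y_dummy <- model.matrix(~ diet - 1)
colnames(Y_dummy) <- levels(diet)

# Each sample belongs to exactly one class.
stopifnot(all(rowSums(Y_dummy) == 1))

print(Y_dummy)
```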

`X` can contain missing values. Missing values are handled by being disregarded during the cross product computations in the algorithm `block.pls`, without having to delete rows with missing data. Alternatively, missing data can be imputed beforehand using the `nipals` function.
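To make the idea of disregarding missing values in a cross product concrete, here is a minimal base R sketch. The helper `crossprod_skip_na` is hypothetical and is not a mixOmics function; it only shows one way such a computation can skip missing entries, assuming `y` itself has no missing values.

```r
# Cross product t(X) %*% y that skips missing entries of X
# (illustrative helper, not part of mixOmics; assumes y has no NA).
crossprod_skip_na <- function(X, y) {
  X0 <- X
  X0[is.na(X0)] <- 0      # a zero term contributes nothing to the sum
  drop(crossprod(X0, y))  # per column: sum of X[i, j] * y[i] over observed i
}

# Toy data, filled column by column: column 1 = (1, 2, NA), column 2 = (4, NA, 6).
X <- matrix(c(1, 2, NA,
              4, NA, 6),
            nrow = 3, ncol = 2)
y <- c(1, 1, 1)

# No row of X is deleted; only the missing cells are left out of each sum.
res_cp <- crossprod_skip_na(X, y)
print(res_cp)
```

Dropping incomplete rows instead would discard every row of this toy matrix except the first, which is the loss of information the approach described above avoids.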

The type of algorithm to use is specified with the `mode` argument. Four PLS algorithms are available: PLS regression `("regression")`, PLS canonical analysis `("canonical")`, redundancy analysis `("invariant")` and the classical PLS algorithm `("classic")` (see References and `?pls` for more details).

Note that our method is partly based on Generalised Canonical Correlation Analysis and differs from the MB-PLS approaches proposed by Kowalski et al., 1989, J Chemom 3(1) and Westerhuis et al., 1998, J Chemom, 12(5).

`block.plsda` returns an object of class `"block.plsda","block.pls"`, a list that contains the following components:

`X` |
the centered and standardized original predictor matrix. |

`indY` |
the position of the outcome Y in the output list X. |

`ncomp` |
the number of components included in the model for each block. |

`mode` |
the algorithm used to fit the model. |

`variates` |
list containing the variates of each block of X. |

`loadings` |
list containing the estimated loadings for the variates. |

`names` |
list containing the names to be used for individuals and variables. |

`nzv` |
list containing the zero- or near-zero predictors information. |

`iter` |
Number of iterations of the algorithm for each component |

`explained_variance` |
Percentage of explained variance for each component and each block |

Florian Rohart, Benoit Gautier, Kim-Anh Lê Cao

On PLSDA:

Barker M and Rayens W (2003). Partial least squares for discrimination. *Journal of Chemometrics* **17**(3), 166-173.
Perez-Enciso, M. and Tenenhaus, M. (2003). Prediction of clinical outcome with microarray data:
a partial least squares discriminant analysis (PLS-DA) approach. *Human Genetics*
**112**, 581-592.
Nguyen, D. V. and Rocke, D. M. (2002). Tumor classification by partial
least squares using microarray gene expression data. *Bioinformatics*
**18**, 39-50.

On multiple integration with PLS-DA: Gunther O., Shin H., Ng R. T., McMaster W. R., McManus B. M., Keown P. A., Tebbutt S. J., Lê Cao K-A. (2014). Novel multivariate methods for integration of genomics and proteomics data: Applications in a kidney transplant rejection study. *OMICS: A Journal of Integrative Biology* **18**(11), 682-95.

On multiple integration with sPLS-DA and 4 data blocks:

Singh A., Gautier B., Shannon C., Vacher M., Rohart F., Tebbutt S. and Lê Cao K.A. (2016). DIABLO: multi omics integration for biomarker discovery. BioRxiv available here: http://biorxiv.org/content/early/2016/08/03/067611

mixOmics article:

Rohart F, Gautier B, Singh A, Lê Cao K-A. mixOmics: an R package for 'omics feature selection and multiple data integration. PLoS Comput Biol 13(11): e1005752

`plotIndiv`, `plotArrow`, `plotLoadings`, `plotVar`, `predict`, `perf`, `selectVar`, `block.pls`, `block.splsda` and http://www.mixOmics.org for more details.

```
library(mixOmics)  # provides block.plsda and the nutrimouse data

data(nutrimouse)
data = list(gene = nutrimouse$gene, lipid = nutrimouse$lipid, Y = nutrimouse$diet)

# with this design, all blocks are connected
design = matrix(c(0,1,1,1,0,1,1,1,0), ncol = 3, nrow = 3,
                byrow = TRUE, dimnames = list(names(data), names(data)))

# indY indicates where the outcome Y is in the list X
res = block.plsda(X = data, indY = 3)

plotIndiv(res, ind.names = FALSE, legend = TRUE)
plotVar(res)

## Not run:
# when Y is provided
res2 = block.plsda(list(gene = nutrimouse$gene, lipid = nutrimouse$lipid),
                   Y = nutrimouse$diet, ncomp = 2)
plotIndiv(res2)
plotVar(res2)
## End(Not run)
```
