knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

plsdof

Degrees of Freedom and Statistical Inference for Partial Least Squares Regression

Maintainer: Frédéric Bertrand

Lifecycle: stable Project Status: Active – The project has reached a stable, usable state and is being actively developed. R-CMD-check Codecov test coverage CRAN status CRAN RStudio mirror downloads GitHub Repo stars DOI

The plsdof package provides Degrees of Freedom estimates for Partial Least Squares (PLS) Regression. Model selection for PLS is based on various information criteria (aic, bic, gmdl) or on cross-validation. Estimates for the mean and covariance of the PLS regression coefficients are available. They allow the construction of approximate confidence intervals and the application of test procedures. Further, cross-validation procedures for Ridge Regression and Principal Components Regression are available.

The plsdof package was fully coded and developped by Nicole Kraemer and Mikio L. Braun. It is mainly based on the article by N. Kraemer, M. Sugiyama (2012): "The Degrees of Freedom of Partial Least Squares Regression", Journal of the American Statistical Association, 106(494):697-705, doi:10.1198/jasa.2011.tm10107.

Yet due to the regular updates in CRAN policies, it was removed from the CRAN and orphaned since the former maintainer had stopped updating the package. The plsdof package is required by several packages of Frédéric Bertrand who was then selected as the new maintainer since late 2018.

This website and these examples were created by F. Bertrand.

Installation

You can install the released version of plsdof from CRAN with:

install.packages("plsdof")

You can install the development version of plsdof from github with:

devtools::install_github("fbertran/plsdof")

Example

PLS model example.

The pls.model function computes the Partial Least Squares fit.

n<-50 # number of observations
p<-15 # number of variables
X<-matrix(rnorm(n*p),ncol=p)
y<-rnorm(n)

ntest<-200 #
Xtest<-matrix(rnorm(ntest*p),ncol=p) # test data
ytest<-rnorm(ntest) # test data

library(plsdof)
# compute PLS + degrees of freedom + prediction on Xtest
first.object<-pls.model(X,y,compute.DoF=TRUE,Xtest=Xtest,ytest=NULL)

# compute PLS + test error
second.object=pls.model(X,y,m=10,Xtest=Xtest,ytest=ytest)

Model selection for Partial Least Squares based on information criteria

The pls.ic function computes the optimal model parameters using one of three different model selection criteria (aic, bic, gmdl) and based on two different Degrees of Freedom estimates for PLS.

n<-50 # number of observations
p<-5 # number of variables
X<-matrix(rnorm(n*p),ncol=p)
y<-rnorm(n)

# compute linear PLS
pls.object<-pls.ic(X,y,m=ncol(X))

Boston Housing data

Creating response vector and predictors' matrix

data(Boston)
X<-as.matrix(Boston[,-14])
y<-as.vector(Boston[,14])

Compute PLS coefficients for the first 5 components.

my.pls1<-pls.model(X,y,m=5,compute.DoF=TRUE)
my.pls1

Plot Degrees of Freedom and add naive estimate.

plot(0:5,my.pls1$DoF,pch="*",cex=3,xlab="components",ylab="DoF",ylim=c(0,14))
lines(0:5,1:6,lwd=3)

Model selection with the Bayesian Information criterion

my.pls2<-pls.ic(X,y,criterion="bic")
my.pls2

Model selection based on cross-validation.

my.pls3<-pls.cv(X,y,compute.covariance=TRUE)
my.pls3

Returns the estimated covariance matrix of the regression coefficients

my.vcov<-vcov(my.pls3)
my.vcov

Standard deviation of the regression coefficients

my.sd<-sqrt(diag(my.vcov)) 
my.sd


fbertran/plsdof documentation built on Dec. 2, 2022, 11:35 p.m.