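Here we focus on the specification of the GSPCR model. Three arguments of cv_gspcr() should be specified carefully: the threshold type (thrs), the cross-validation fit measure (fit_measure), and the number of principal components (npcs_range).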
In this vignette, we consider a simple scenario with a continuous dependent variable and a set of continuous predictors. First, we load the required packages and store the continuous outcome and predictors from the example dataset GSPCRexdata (see ?GSPCRexdata for details) in two separate objects:
# Load R packages
library(gspcr)     # this package!
library(superpc)   # alternative comparison package
library(patchwork) # combining ggplots

# Store the continuous predictors and the continuous outcome
X <- GSPCRexdata$X$cont
y <- GSPCRexdata$y$cont
As described in the introduction, gspcr allows for the specification of different bivariate association measures.
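We can run gspcr using "LLS", "PR2", or "normalized" as the threshold type (the thrs argument); the "normalized" option is based on the normalized correlation used by the superpc R package.
Another important aspect to consider is the number of threshold values to evaluate, which is specified with the nthrs argument.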
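To make the roles of thrs and nthrs more concrete, the base R sketch below scores each predictor by a bivariate association with y, builds a grid of nthrs threshold values, and counts how many predictors would pass each threshold. It is a hypothetical illustration on simulated data, not gspcr's internal code. Several choices are assumptions: the simple-regression log-likelihood stands in for "LLS", the Cox and Snell pseudo R-squared stands in for "PR2", the thresholds are equally spaced between the smallest and largest score, and the "normalized" measure is not reproduced.

```r
# Hypothetical sketch (NOT gspcr's internal code): score predictors, build a
# threshold grid, and count the predictors that pass each threshold.
set.seed(2023)
n <- 200
p <- 30
X <- matrix(rnorm(n * p), nrow = n, dimnames = list(NULL, paste0("x", 1:p)))
y <- drop(X[, 1:5] %*% rep(0.6, 5)) + rnorm(n) # only x1 to x5 matter

# Log-likelihood of each simple regression y ~ x_j (assumed analogue of "LLS")
ll_null <- as.numeric(logLik(lm(y ~ 1)))
lls <- apply(X, 2, function(x) as.numeric(logLik(lm(y ~ x))))

# Cox and Snell pseudo R-squared (assumed analogue of "PR2")
pr2 <- 1 - exp(-2 / n * (lls - ll_null))

# Grid of nthrs equally spaced thresholds (assumption) over the observed scores
nthrs <- 20
thr_grid <- seq(min(pr2), max(pr2), length.out = nthrs)

# Number of predictors with a score at or above each threshold
active_size <- sapply(thr_grid, function(t) sum(pr2 >= t))
print(active_size)
```

Larger nthrs values give a finer grid, so more candidate subsets of predictors are evaluated during cross-validation. For a Gaussian outcome, the Cox and Snell pseudo R-squared of a simple linear regression equals its ordinary R-squared, so in this sketch both scores rank the predictors identically.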
Using the following code, we can compare the solution paths obtained with the different association measures, using 20 threshold values and a single PC.
# Define a vector of threshold types
threshold_types <- c("LLS", "normalized", "PR2")

# Train the GSPCR model with each threshold type
out_thrs <- lapply(
  X = threshold_types,
  FUN = function(i) {
    cv_gspcr(
      dv = y,
      ivs = X,
      thrs = i,       # threshold type
      nthrs = 20,     # number of threshold values
      npcs_range = 1, # a single PC
      K = 10          # number of cross-validation folds
    )
  }
)

# Plot the solution paths
plots <- lapply(out_thrs, function(i) {
  plot(
    x = i,
    y = "F",
    labels = FALSE,     # we are using a single nPC, so labels are not needed
    discretize = FALSE, # makes the x-axis more readable
    print = FALSE
  )
})

# Combine the ggplots with patchwork
plots[[1]] + plots[[2]] + plots[[3]]
Figure 1: Solution paths for different association measures.
As you can see, the solution paths are similar, although LLS tended to favor lower threshold values.
We can also use different cross-validation fit measures. See the help file (?cv_gspcr) for the list of available options.
# Fit measures to compare
fit_measure_vec <- c("LRT", "PR2", "MSE", "F", "AIC", "BIC")

# Train the GSPCR model with each fit measure
out_fit_meas <- lapply(fit_measure_vec, function(i) {
  cv_gspcr(
    dv = y,
    ivs = X,
    fit_measure = i,
    thrs = "normalized",
    nthrs = 20,
    npcs_range = 1,
    K = 10
  )
})

# Plot them
plots <- lapply(seq_along(fit_measure_vec), function(i) {
  # Reverse the y-axis for measures where lower values indicate better fit
  reverse_y <- grepl("MSE|AIC|BIC", fit_measure_vec[i])

  # Make plots
  plot(
    x = out_fit_meas[[i]],
    y = fit_measure_vec[[i]],
    labels = FALSE,
    y_reverse = reverse_y,
    errorBars = FALSE,
    discretize = FALSE,
    print = FALSE
  )
})

# Combine the ggplots with patchwork
(plots[[1]] + plots[[2]] + plots[[3]]) /
  (plots[[4]] + plots[[5]] + plots[[6]])
Figure 2: Solution paths for different fit measures.
As you can see, the different fit measures return equivalent solution paths. The same holds when using 5 PCs:
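One way to see why the fit measures can agree: for Gaussian linear models with the same number of parameters fit to the same data, the MSE, AIC, BIC, the F statistic against the intercept-only model, and the likelihood-ratio statistic are all monotone functions of the residual sum of squares, so they rank candidate models identically. The base R sketch below checks this on simulated data with single-predictor models. It is an illustration under these assumptions, not how cv_gspcr() computes its cross-validated measures.

```r
# Sketch: with equal numbers of parameters, RSS-based fit measures rank
# candidate models in the same order (simulated data, base R only).
set.seed(123)
n <- 100
p <- 8
X <- matrix(rnorm(n * p), nrow = n)
y <- X[, 1] + 0.5 * X[, 2] + rnorm(n)

# Intercept-only model used as the reference for F and LRT
fit0 <- lm(y ~ 1)
rss0 <- sum(residuals(fit0)^2)

# One single-predictor model per column, so all models have the same size
measures <- t(sapply(seq_len(p), function(j) {
  fit <- lm(y ~ X[, j])
  rss <- sum(residuals(fit)^2)
  k <- 1 # number of predictors in each candidate model
  c(
    MSE = rss / n,
    AIC = AIC(fit),
    BIC = BIC(fit),
    F   = ((rss0 - rss) / k) / (rss / (n - k - 1)),
    LRT = 2 * (as.numeric(logLik(fit)) - as.numeric(logLik(fit0)))
  )
}))

# Lower is better for MSE, AIC, and BIC; higher is better for F and LRT
rank_mse <- order(measures[, "MSE"])
rank_aic <- order(measures[, "AIC"])
rank_bic <- order(measures[, "BIC"])
rank_f   <- order(-measures[, "F"])
rank_lrt <- order(-measures[, "LRT"])
print(rank_mse)
```

The "lower is better" group matches the measures whose y-axis is reversed in the plotting code. When the number of PCs differs between models, the number of parameters differs too, so AIC, BIC, and F penalize model size in different ways and need not agree across different numbers of PCs.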
# Train the GSPCR model with each fit measure, using 5 PCs
out_fit_meas <- lapply(fit_measure_vec, function(i) {
  cv_gspcr(
    dv = y,
    ivs = X,
    fit_measure = i,
    thrs = "normalized",
    nthrs = 20,
    npcs_range = 5,
    K = 10
  )
})

# Plot them
plots <- lapply(seq_along(fit_measure_vec), function(i) {
  # Reverse the y-axis for measures where lower values indicate better fit
  reverse_y <- grepl("MSE|AIC|BIC", fit_measure_vec[i])

  # Make plots
  plot(
    x = out_fit_meas[[i]],
    y = fit_measure_vec[[i]],
    labels = FALSE,
    y_reverse = reverse_y,
    errorBars = FALSE,
    discretize = FALSE,
    print = FALSE
  )
})

# Combine the ggplots with patchwork
(plots[[1]] + plots[[2]] + plots[[3]]) /
  (plots[[4]] + plots[[5]] + plots[[6]])
Figure 3: Solution paths for different fit measures when using 5 PCs.
We can use cross-validation to select the number of PCs as well. The npcs_range argument specifies the candidate numbers of PCs to consider.
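The base R sketch below illustrates what cross-validating the number of PCs involves: for each candidate value in npcs_range, it fits a principal component regression on the training folds, predicts the validation fold, and averages the validation MSE over K folds. It is a hypothetical simplification on simulated data: it uses all predictors instead of a thresholded active set, uses MSE as the only fit measure, and does not reproduce cv_gspcr()'s internals.

```r
# Hypothetical sketch (NOT gspcr's implementation): K-fold cross-validation
# of principal component regression over candidate numbers of PCs.
set.seed(42)
n <- 150
p <- 20
X <- matrix(rnorm(n * p), nrow = n)
y <- drop(X[, 1:3] %*% c(1, -1, 0.5)) + rnorm(n)

npcs_range <- c(2, 5, 10)
K <- 10
folds <- sample(rep(seq_len(K), length.out = n)) # random fold assignment

cv_mse <- sapply(npcs_range, function(q) {
  fold_mse <- sapply(seq_len(K), function(k) {
    train <- folds != k
    # PCA estimated on the training predictors only
    pca <- prcomp(X[train, ], center = TRUE, scale. = TRUE)
    scores_train <- pca$x[, seq_len(q), drop = FALSE]
    fit <- lm(y[train] ~ scores_train)
    # Project the validation predictors onto the training components
    scores_valid <- predict(pca, newdata = X[!train, , drop = FALSE])
    scores_valid <- scores_valid[, seq_len(q), drop = FALSE]
    y_hat <- drop(cbind(1, scores_valid) %*% coef(fit))
    mean((y[!train] - y_hat)^2)
  })
  mean(fold_mse) # average validation MSE for q PCs
})
names(cv_mse) <- npcs_range

# Candidate number of PCs with the lowest cross-validated MSE
best_npcs <- npcs_range[which.min(cv_mse)]
print(cv_mse)
print(best_npcs)
```

Estimating the components on the training folds only keeps the validation data out of the PCA, so the validation MSE is not optimistically biased by the dimension reduction step.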
# Train the model
out_npcs <- cv_gspcr(
  dv = y,
  ivs = X,
  npcs_range = c(2, 5, 10)
)

# Plot solution paths
plot(out_npcs)
Figure 4: Solution paths for different fit measures when cross-validating the number of PCs.
Given the choice of 2, 5, or 10 PCs, we would use 2 PCs with the second threshold value.