Vignette 2: GSPCR specification options
In gspcr: Generalized Supervised Principal Component Regression

This vignette focuses on how to specify a GSPCR model. Three aspects of a cv_gspcr() call should be chosen carefully, each controlled by its own argument:

Association measure (thrs)
Fit measure (fit_measure)
Number of components (npcs_range)

In this vignette we consider a simple scenario with a continuous dependent variable and a set of continuous predictors. First, we load the required packages and store the predictors and the dependent variable from the example dataset GSPCRexdata (see the help file ?GSPCRexdata for details) in two separate objects:

# Load R packages
library(gspcr) # this package!
library(superpc) # alternative comparison package
library(patchwork) # combining ggplots

# Store the continuous predictors (X) and the continuous dependent variable (y)
X <- GSPCRexdata$X$cont
y <- GSPCRexdata$y$cont

Association measures

As described in the introduction, gspcr allows for the specification of different bivariate association measures. The threshold type passed to the thrs argument can be based on:

the log-likelihoods of simple GLMs ("LLS");
the generalized $R^2$ ("PR2");
the normalized association measure used in the superpc R package ("normalized").
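To make the idea of a bivariate association measure concrete, here is a minimal base R sketch. It is not the gspcr implementation: the simulated data, the object names (X_sim, y_sim, thr_grid, active_sets), the Cox-Snell form of the generalized $R^2$, and the evenly spaced threshold grid are all assumptions made for illustration.

```r
# Hypothetical simulated data (not GSPCRexdata)
set.seed(2024)
n <- 200
p <- 10
X_sim <- matrix(
    rnorm(n * p),
    nrow = n,
    ncol = p,
    dimnames = list(NULL, paste0("x", 1:p))
)

# Only the first three predictors are related to the outcome
y_sim <- as.vector(X_sim[, 1:3] %*% c(1, 0.8, 0.5) + rnorm(n))

# Log-likelihood of the intercept-only (null) model
ll_null <- as.numeric(logLik(glm(y_sim ~ 1, family = gaussian())))

# Log-likelihood of each simple GLM (one predictor at a time)
ll_simple <- apply(X_sim, 2, function(x) {
    as.numeric(logLik(glm(y_sim ~ x, family = gaussian())))
})

# Generalized R-squared, assuming the Cox-Snell form
pr2 <- 1 - exp(-2 / n * (ll_simple - ll_null))

# Hypothetical grid of threshold values spanning the observed associations
nthrs <- 5
thr_grid <- seq(min(pr2), max(pr2), length.out = nthrs)

# Predictors retained at each threshold value
active_sets <- lapply(thr_grid, function(thr) names(pr2)[pr2 >= thr])

# Number of predictors retained per threshold
lengths(active_sets)
```

Each threshold value defines a subset of predictors on which principal components would then be computed. Higher thresholds keep fewer predictors, which is why the solution paths are plotted against the threshold values.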

Another important choice is how many threshold values to evaluate, which is set with the nthrs argument. The following code compares the solution paths obtained with the different association measures, using 20 threshold values and a fixed number of PCs.

# Define a vector of threshold types
threshold_types <- c("LLS", "normalized", "PR2")

# Train the GSPCR model once per threshold type
out_trhs <- lapply(
    X = threshold_types,
    FUN = function(i) {
        cv_gspcr(
            dv = y,
            ivs = X,
            thrs = i,       # threshold type
            nthrs = 20,     # number of threshold values
            npcs_range = 1, # number of PCs (fixed)
            K = 10          # number of cross-validation folds
        )
    }
)

# Plot them
plots <- lapply(out_trhs, function(i) {
    plot(
        x = i,
        y = "F",
        labels = FALSE,     # We are using a single nPC, do not need the label
        discretize = FALSE, # Makes X-axis more readable
        print = FALSE
    )
})

# Patchwork ggplots
plots[[1]] + plots[[2]] + plots[[3]]

Figure 1: Solution paths for different association measures.

As you can see, the solution paths are similar, although LLS tends to favor lower threshold values.

Fit measures

We can use different cross-validation fit measures. See the help file (?cv_gspcr) for the list of options.

# Define a vector of fit measures
fit_measure_vec <- c("LRT", "PR2", "MSE", "F", "AIC", "BIC")

# Train the GSPCR model once per fit measure
out_fit_meas <- lapply(fit_measure_vec, function(i) {
    cv_gspcr(
        dv = y,
        ivs = X,
        fit_measure = i,
        thrs = "normalized",
        nthrs = 20,
        npcs_range = 1,
        K = 10
    )
})

# Plot them
plots <- lapply(seq_along(fit_measure_vec), function(i) {
    # Reverse the y-axis for measures where lower values indicate better fit
    rev <- grepl("MSE|AIC|BIC", fit_measure_vec[i])

    # Make plots
    plot(
        x = out_fit_meas[[i]],
        y = fit_measure_vec[[i]],
        labels = FALSE,
        y_reverse = rev,
        errorBars = FALSE,
        discretize = FALSE,
        print = FALSE
    )
})

# Patchwork ggplots
(plots[[1]] + plots[[2]] + plots[[3]]) / (plots[[4]] + plots[[5]] + plots[[6]])

Figure 2: Solution paths for different fit measures.

As you can see, the different fit measures return equivalent solution paths. The same pattern appears when using a different number of PCs, for example 5:

# Train the GSPCR model once per fit measure, now with 5 PCs
out_fit_meas <- lapply(fit_measure_vec, function(i) {
    cv_gspcr(
        dv = y,
        ivs = X,
        fit_measure = i,
        thrs = "normalized",
        nthrs = 20,
        npcs_range = 5,
        K = 10
    )
})

# Plot them
plots <- lapply(seq_along(fit_measure_vec), function(i) {
    # Reverse the y-axis for measures where lower values indicate better fit
    rev <- grepl("MSE|AIC|BIC", fit_measure_vec[i])

    # Make plots
    plot(
        x = out_fit_meas[[i]],
        y = fit_measure_vec[[i]],
        labels = FALSE,
        y_reverse = rev,
        errorBars = FALSE,
        discretize = FALSE,
        print = FALSE
    )
})

# Patchwork ggplots
(plots[[1]] + plots[[2]] + plots[[3]]) / (plots[[4]] + plots[[5]] + plots[[6]])

Figure 3: Solution paths for different fit measures when using 5 PCs.
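One way to see why the fit measures tend to agree is that, for a Gaussian model with a fixed number of components, they are all monotone functions of the residual sum of squares. The base R sketch below illustrates this with in-sample measures on simulated data. It is only an illustration under those assumptions: cv_gspcr() computes its measures through cross-validation, and the names used here (Z, y_sim, fits, measures, ranking) are hypothetical.

```r
# Hypothetical simulated data: five candidate single-component models
set.seed(1)
n <- 150
Z <- matrix(rnorm(n * 5), nrow = n)
y_sim <- 0.9 * Z[, 1] + 0.5 * Z[, 2] + 0.2 * Z[, 3] + rnorm(n)

# Fit one simple linear regression per candidate component
fits <- lapply(seq_len(ncol(Z)), function(j) lm(y_sim ~ Z[, j]))

# Compute several fit measures for every candidate model
measures <- data.frame(
    MSE = sapply(fits, function(f) mean(residuals(f)^2)),
    AIC = sapply(fits, AIC),
    BIC = sapply(fits, BIC),
    PR2 = sapply(fits, function(f) summary(f)$r.squared),
    F_stat = sapply(fits, function(f) unname(summary(f)$fstatistic["value"]))
)

# Rank the candidates from best to worst under each measure
# (lower is better for MSE, AIC, and BIC; higher is better for PR2 and F)
ranking <- data.frame(
    MSE = order(measures$MSE),
    AIC = order(measures$AIC),
    BIC = order(measures$BIC),
    PR2 = order(-measures$PR2),
    F_stat = order(-measures$F_stat)
)

# Every column lists the candidates in the same order
ranking
```

Because all candidate models have the same number of parameters, the AIC and BIC penalties are constant across candidates and cannot change the ranking. Differences between fit measures can only arise when the compared models differ in complexity.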

Number of components

Cross-validation can also select the number of PCs. The npcs_range argument takes the candidate numbers of PCs to compare.

# Train the model, cross-validating over 2, 5, and 10 PCs
out_npcs <- cv_gspcr(
    dv = y,
    ivs = X,
    npcs_range = c(2, 5, 10)
)

# Plot solution paths
plot(out_npcs)

Figure 4: Solution paths for different numbers of PCs when cross-validating the number of PCs.

Given the choice of 2, 5, or 10 PCs, we would use 2 PCs with the second threshold value.
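The logic of cross-validating the number of PCs can be sketched in base R. This is a simplified illustration and not the gspcr implementation: it leaves out the predictor-screening step, uses simulated data, and all object names (X_sim, y_sim, folds, cv_mse, best_npcs) are hypothetical.

```r
# Hypothetical simulated data: six predictors share one latent factor
set.seed(42)
n <- 200
p <- 12
X_sim <- matrix(
    rnorm(n * p),
    nrow = n,
    dimnames = list(NULL, paste0("x", 1:p))
)
latent <- rnorm(n)
X_sim[, 1:6] <- X_sim[, 1:6] + latent
y_sim <- latent + rnorm(n, sd = 0.5)

# Assign every observation to one of K folds
K <- 5
folds <- sample(rep(seq_len(K), length.out = n))

# Candidate numbers of PCs
npcs_range <- c(1, 2, 5)

# Cross-validated MSE for every candidate number of PCs
cv_mse <- sapply(npcs_range, function(q) {
    fold_mse <- sapply(seq_len(K), function(k) {
        train <- folds != k

        # Estimate the PCA on the training data only
        pca <- prcomp(X_sim[train, ], center = TRUE, scale. = TRUE)
        pcs_train <- pca$x[, seq_len(q), drop = FALSE]

        # Project the validation data onto the training PCs
        pcs_valid <- predict(pca, newdata = X_sim[!train, ])
        pcs_valid <- pcs_valid[, seq_len(q), drop = FALSE]

        # Regress the outcome on the first q PCs
        fit <- lm.fit(x = cbind(1, pcs_train), y = y_sim[train])

        # Validation-fold prediction error
        pred <- cbind(1, pcs_valid) %*% fit$coefficients
        mean((y_sim[!train] - pred)^2)
    })
    mean(fold_mse)
})
names(cv_mse) <- paste0("npcs_", npcs_range)

# Number of PCs with the lowest cross-validated MSE
best_npcs <- npcs_range[which.min(cv_mse)]
```

Estimating the PCA inside each training fold keeps the validation data out of the component estimation, so the cross-validated error is not optimistically biased.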

gspcr documentation built on May 29, 2024, 2:44 a.m.
