ScanComponentsSubset: Iteratively runs GLM fitting on a set number of components to...

View source: R/puddlr.R

ScanComponentsSubset    R Documentation

Description

Iteratively fits a GLM for each candidate number of components to identify how many dimensions the model should be restricted to. Performs k-fold cross-validation, using the root mean squared error averaged across folds to assess overfitting and McFadden's pseudo-R-squared to assess goodness of fit. Data from the scan are saved in the puddlr object for later plotting and visual inspection.
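The scan described above can be sketched in base R. This is only an illustration under stated assumptions, not the puddlr implementation: the data are simulated, the PCA is computed once on all rows for brevity, and `scan.results` is a hypothetical name for the collected output.

```r
# Illustrative sketch in base R; not the puddlr implementation.
set.seed(42)
n <- 140
X <- matrix(rnorm(n * 10), nrow = n)              # 10 simulated features
y <- 2 * X[, 1] - X[, 2] + rnorm(n, sd = 0.5)      # simulated response

# PCA scores (computed once on all rows, for brevity)
pcs <- prcomp(X, center = TRUE, scale. = TRUE)$x

k.cross <- 7
folds <- sample(rep(seq_len(k.cross), length.out = n))  # random fold labels

n.to.scan.vec <- c(1, 2, 5, 10)
cv.rmse <- sapply(n.to.scan.vec, function(n.comp) {
  # Keep only the first n.comp principal components as predictors
  dat <- data.frame(y = y, pcs[, seq_len(n.comp), drop = FALSE])
  fold.rmse <- sapply(seq_len(k.cross), function(i) {
    train <- dat[folds != i, , drop = FALSE]
    test  <- dat[folds == i, , drop = FALSE]
    fit   <- glm(y ~ ., data = train, family = gaussian())
    pred  <- predict(fit, newdata = test, type = "response")
    sqrt(mean((test$y - pred)^2))                  # held-out RMSE for fold i
  })
  mean(fold.rmse)                                  # average across folds
})

scan.results <- data.frame(n.components = n.to.scan.vec, cv.rmse = cv.rmse)
print(scan.results)
```

Plotting `cv.rmse` against `n.components` gives the kind of curve one would inspect to choose where to truncate the model.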

Usage

ScanComponentsSubset(
  puddlr,
  n.to.scan.vec,
  k.cross = 7,
  rand.seed = 42,
  formula,
  family,
  reduction,
  adj.rsq
)

Arguments

puddlr

A puddlr object

n.to.scan.vec

Vector of possible values for n.components to scan for GLM fitting.

k.cross

Number of train-test splits (the k in k-fold cross-validation). Default = 7.

rand.seed

Random seed for the cross-validation split. Default = 42.
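As a rough illustration of why a seed is exposed, the sketch below assigns rows to folds reproducibly. `make.folds` is a hypothetical helper written for this example, not a puddlr function, and the package may split the data differently.

```r
# Hypothetical helper (not part of puddlr): seeded assignment of rows to folds.
make.folds <- function(n, k.cross = 7, rand.seed = 42) {
  set.seed(rand.seed)
  # Repeat the labels 1..k.cross to length n, then shuffle them
  sample(rep(seq_len(k.cross), length.out = n))
}

folds.a <- make.folds(100)
folds.b <- make.folds(100)

identical(folds.a, folds.b)   # same seed gives the same split
table(folds.a)                # fold sizes differ by at most one row
```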

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification can be found under 'glm'. The name of the response variable in the formula will be used to name the response variable.

family

a description of the error distribution and link function to be used in the model. Passed to the argument of the same name under 'glm'.

reduction

A string specifying the linear dimensionality reduction to use. The only valid option listed is 'pca'. Default = 'pca'.

adj.rsq

Boolean flag specifying whether to adjust the calculated pseudo-R^2 goodness-of-fit value.
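For orientation, McFadden's pseudo-R^2 compares the log-likelihood of the fitted model with that of an intercept-only model. The sketch below also shows one common adjustment, which penalizes the number of predictors. Whether adj.rsq applies exactly this adjustment is an assumption; the data are simulated and the variable names are illustrative.

```r
# Illustrative sketch; the adjustment used by puddlr may differ.
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)                                      # pure noise predictor
y  <- rbinom(n, size = 1, prob = plogis(1.5 * x1))

fit  <- glm(y ~ x1 + x2, family = binomial())       # model of interest
null <- glm(y ~ 1, family = binomial())             # intercept-only model

ll.fit  <- as.numeric(logLik(fit))
ll.null <- as.numeric(logLik(null))
n.pred  <- length(coef(fit)) - 1                    # predictors, excluding intercept

# McFadden: 1 - logLik(model) / logLik(null)
mcfadden <- 1 - ll.fit / ll.null

# Adjusted variant (assumed form): subtract the predictor count from logLik(model)
mcfadden.adj <- 1 - (ll.fit - n.pred) / ll.null

c(mcfadden = mcfadden, mcfadden.adj = mcfadden.adj)
```

Because the null log-likelihood is negative for a binomial model, the adjusted value is always lower than the unadjusted one, so adding uninformative predictors is penalized.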

Value

A puddlr object containing the scan results, for plotting pseudo-R-squared against the number of components.


rahuldhodapkar/puddlr documentation built on May 28, 2022, 12:53 p.m.