classif.gsam.vs: Variable Selection in Functional Data Classification
In fda.usc: Functional Data Analysis and Utilities for Statistical Computing

classif.gsam.vs

R Documentation

Variable Selection in Functional Data Classification

Description

Fits a classification model while selecting, among the candidate covariates, the functional and non-functional (scalar) explanatory variables to include.

Usage

classif.gsam.vs(
  data = list(),
  y,
  x,
  family = binomial(),
  weights = "equal",
  basis.x = NULL,
  basis.b = NULL,
  type = "1vsall",
  prob = 0.5,
  alpha = 0.05,
  dcor.min = 0.01,
  smooth = TRUE,
  measure = "accuracy",
  xydist,
  ...
)

Arguments

`data`	List containing the variables in the model. The "df" element is a `data.frame` with the response and the scalar covariates (numeric and factor variables are allowed). Functional covariates of class `fdata` or `fd` are supplied as further elements of the `data` list.
`y`	`character` string with the name of the scalar response variable.
`x`	`character` vector with the names of the scalar and functional potential covariates.
`family`	a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See `family` for details of family functions.)
`weights`	Weights: if a `character` string, `'equal'` (the default) gives every observation the same weight and `'inverse'` uses inverse-probability weighting. If a `numeric` vector of length `n`, the weight of each observation.
`basis.x`	List of basis for functional explanatory data estimation.
`basis.b`	List of basis for functional beta parameter estimation.
`type`	`character`, type of classification scheme. The `'1vsall'` strategy (the default) trains a single classifier per class, with the samples of that class as positives and all other samples as negatives. The other possibility for a K-way multiclass problem is the `'majority'` voting scheme (also called one vs one), which trains `K (K - 1) / 2` binary classifiers and predicts, as the final class label, the label that has been predicted most frequently.
`prob`	probability threshold used for binary discrimination.
`alpha`	significance level for the test of independence between the covariate X and the residual e. The default is `0.05`.
`dcor.min`	lower threshold for the variable X to be considered. X is discarded if the distance correlation `R(X,e) < dcor.min` (e is the residual).
`smooth`	if `TRUE`, a smooth estimate is made for all covariates included in the model (except factors). The model is fitted with the candidate variable entering both linearly and smoothly. If the two models are equivalent, the model with the linear term is kept.
`measure`	measure of correct classification (by default, accuracy).
`xydist`	list with the matrices of distances of each variable (all potential covariates and the response) with itself.
`...`	Further arguments passed to or from other methods.
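
The two schemes described under `type` can be illustrated with a minimal base-R sketch. It is not the fda.usc implementation: the simulated data, the plain `glm()` binary classifier, and the names `p_ova`, `class_pairs` and `votes` are assumptions made for illustration only, and the 0.5 cut-off simply mirrors the default of `prob`.

```r
# Base-R sketch of the two multiclass schemes on simulated data (not fda.usc code).
set.seed(1)
n <- 60
lev <- c("a", "b", "c")
y <- factor(rep(lev, each = n), levels = lev)
centers <- rbind(a = c(0, 0), b = c(2, 0), c = c(1, 2))
X <- data.frame(
  x1 = rnorm(3 * n, mean = centers[as.character(y), 1]),
  x2 = rnorm(3 * n, mean = centers[as.character(y), 2])
)

# '1vsall': one binary classifier per class (class k against the rest);
# predict the class whose classifier gives the largest fitted probability.
p_ova <- sapply(lev, function(k) {
  d <- data.frame(z = as.integer(y == k), X)
  fitted(glm(z ~ x1 + x2, data = d, family = binomial()))
})
pred_ova <- factor(lev[max.col(p_ova, ties.method = "first")], levels = lev)

# 'majority': one classifier per pair of classes, K (K - 1) / 2 in total;
# each classifier casts one vote per observation.
class_pairs <- combn(lev, 2)
votes <- matrix(0L, nrow = nrow(X), ncol = length(lev),
                dimnames = list(NULL, lev))
for (j in seq_len(ncol(class_pairs))) {
  a <- class_pairs[1, j]
  b <- class_pairs[2, j]
  keep <- y %in% c(a, b)                       # train only on the two classes
  d <- data.frame(z = as.integer(y[keep] == b), X[keep, ])
  fit <- glm(z ~ x1 + x2, data = d, family = binomial())
  p_b <- predict(fit, newdata = X, type = "response")
  winner <- ifelse(p_b > 0.5, b, a)            # 0.5 plays the role of `prob`
  idx <- cbind(seq_len(nrow(X)), match(winner, lev))
  votes[idx] <- votes[idx] + 1L
}
pred_maj <- factor(lev[max.col(votes, ties.method = "first")], levels = lev)

# In-sample accuracy of each scheme
mean(pred_ova == y)
mean(pred_maj == y)
```

With K = 3 classes both schemes fit three binary models; for larger K the `'majority'` scheme fits more models, each on a smaller subset of the data.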

Value

Returns the final fitted model (the same object returned by the classification method) plus:

dcor, matrix with the values of the distance correlation between each potential covariate (by column) and the residual of the model in each step (by row).
i.predictor, vector with 1 if the variable is selected, 0 otherwise.
ipredictor, vector with the names of the selected variables (in order of selection).
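
To make the `dcor` component more concrete, the following base-R sketch computes one row of such a matrix: the sample distance correlation (in the sense of Székely et al.) between each candidate covariate and a residual. The function `dcor_sketch` and the simulated `cand` and `e` objects are hypothetical names introduced here; the estimator fda.usc uses internally may differ in detail (for example, a bias-corrected version), and the selection step additionally relies on the independence test controlled by `alpha`.

```r
# Sample distance correlation between a covariate x and a residual e
# (illustrative base-R sketch, not the fda.usc routine).
dcor_sketch <- function(x, e) {
  # Double-centre a distance matrix: subtract row and column means,
  # then add back the grand mean.
  centre <- function(D) sweep(sweep(D, 1, rowMeans(D)), 2, colMeans(D)) + mean(D)
  A <- centre(as.matrix(dist(x)))
  B <- centre(as.matrix(dist(e)))
  dcov2 <- mean(A * B)                      # squared distance covariance
  denom <- sqrt(mean(A * A) * mean(B * B))  # sqrt of product of squared distance variances
  if (denom > 0) sqrt(max(dcov2, 0) / denom) else 0
}

set.seed(2)
n <- 100
cand <- data.frame(signal = rnorm(n), noise = rnorm(n))
e <- cand$signal^2 + rnorm(n, sd = 0.1)     # residual depends nonlinearly on 'signal'

# One row of a dcor-like matrix: one value per candidate covariate
row1 <- sapply(cand, dcor_sketch, e = e)
row1

dcor.min <- 0.01
names(row1)[row1 >= dcor.min]               # candidates that pass the lower threshold
names(which.max(row1))                      # candidate most related to the residual
```

Because the dependence here is quadratic, the Pearson correlation between `signal` and `e` is close to zero while the distance correlation is not, which is the reason for screening candidates with a distance-based measure.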

Note

Adapted from the original method for regression: fregre.gsam.vs.

Author(s)

Febrero-Bande, M. and Oviedo de la Fuente, M.

References

Febrero-Bande, M., González-Manteiga, W. and Oviedo de la Fuente, M. (2018). Variable selection in functional additive regression models. Computational Statistics, 1-19. DOI: 10.1007/s00180-018-0844-5

Examples

## Not run: 
data(tecator)
x <- tecator$absorp.fdata
x1 <- fdata.deriv(x)
x2 <- fdata.deriv(x,nderiv=2)
y <- factor(ifelse(tecator$y$Fat<12,0,1))
xcat0 <- cut(rnorm(length(y)),4)
xcat1 <- cut(tecator$y$Protein,4)
xcat2 <- cut(tecator$y$Water,4)
ind <- 1:129
dat    <- data.frame("Fat"=y, x1$data, xcat1, xcat2)
ldat <- ldata("df"=dat[ind,],"x"=x[ind,],"x1"=x1[ind,],"x2"=x2[ind,])
# 3 functionals (x, x1, x2), 2 factors (xcat1, xcat2; xcat0 is created
# above but is not added to dat) and 100 scalars (impact points of x1)

res.gam <- classif.gsam(Fat~s(x),data=ldat)
summary(res.gam)

# Time consuming
res.gam.vs <- classif.gsam.vs("Fat",data=ldat)
summary(res.gam.vs)
res.gam.vs$i.predictor
res.gam.vs$ipredictor

# Prediction 
newldat <- ldata("df"=dat[-ind,],"x"=x[-ind,],
                "x1"=x1[-ind,],"x2"=x2[-ind,])
pred.gam <- predict(res.gam,newldat)                
pred.gam.vs <- predict(res.gam.vs,newldat)
cat2meas(newldat$df$Fat, pred.gam)
cat2meas(newldat$df$Fat, pred.gam.vs)

## End(Not run)

fda.usc documentation built on April 4, 2025, 4:35 a.m.