STEPR: Stepwise selection of pairwise logratios for generalized...


View source: R/STEPR.r

Description

Three different algorithms for selecting pairwise logratios that best explain/predict a response variable, which can be continuous, binary or a count.

Usage

STEPR(data, y, method = NA, family = "gaussian", nsteps = ncol(data)-1, 
      top = 1, previous = NA, criterion = "Bonferroni", alpha = 0.05, 
      previousparts=NA, denom=NA)

Arguments

data

A data frame or matrix of compositional data on which the pairwise logratios will be constructed and selected

y

The response variable: a numeric variable for regression (default), a binary factor for logistic regression or a numeric count for Poisson regression

method

The selection method: 1 (unrestricted selection of logratios), 2 (restricted to non-overlapping parts), 3 (additive logratios)

family

The distribution used in the generalized linear model family: "gaussian" (default, for multiple regression), "binomial" (for logistic regression of binary response), or "poisson" (for Poisson regression)

nsteps

The maximum number of steps taken, by default one less than the number of parts

top

When one step is taken (nsteps=1), the number of top logratios to list, ordered by their improvement in the likelihood function, so that a selection can be made based on domain knowledge

previous

For specifying variable(s) to be included before stepwise selection takes place; these can be non-compositional variables and/or specific pairwise logratios computed in previous runs of STEPR or by hand; the matrix (or vector for a single variable) of values must be supplied

criterion

Criterion for stopping the stepwise selection: "Bonferroni" (default), "AIC", "BIC", or NA for no stopping until maximum specified or permissible logratios entered

alpha

Overall significance level (default is 0.05)

previousparts

(For method 2) The sequence numbers of the logratios, if any, forced in using the previous option

denom

(For method 3) The sequence number of the part used in denominator; for use when additive logratios are forced in using previous option or to select a set of additive logratios with specific reference from the start

Details

The function STEPR performs stepwise selection of pairwise logratios with the objective of explaining/predicting a response variable, in the framework of generalized linear modelling, where the response can be numeric continuous (regression analysis), a binary factor (logistic regression), or a numeric count (Poisson regression). The corresponding family option has to be indicated if the regression is logistic or Poisson.

The method options for the stepwise selection are: method = 1 (unrestricted selection of logratios: any logratio can be selected irrespective of the previous ones); method = 2 (restricted to non-overlapping parts: each part participates in at most one logratio, so that parts in previously selected logratios are excluded in subsequent steps; logratio effects can be interpreted as under orthogonality); method = 3 (additive logratios: only logratios with the same denominator as the first selected logratio are considered; the result is an additive logratio transformation on a subcomposition).

Three alternative stopping criteria can be specified; otherwise (criterion = NA) the procedure executes as many steps as the value of nsteps. The criteria are, in increasing strictness, "AIC", "BIC" and "Bonferroni" (the default).
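As a rough illustration of the search described above, the following base R sketch performs a single selection step by hand on simulated data and then lists which candidates would remain under the restrictions of methods 2 and 3. It is not the STEPR implementation: the toy data, the object names (X, pairs, dev, sel) and the numerator/denominator orientation of each pair are assumptions made for the example.

```r
# Hypothetical toy data: a 4-part composition whose response depends on log(A/B)
set.seed(1)
n <- 100
X <- matrix(rexp(n * 4), n, 4, dimnames = list(NULL, c("A", "B", "C", "D")))
X <- X / rowSums(X)                      # close each row to sum to 1
y <- 2 * log(X[, "A"] / X[, "B"]) + rnorm(n, sd = 0.5)

# All pairs (i < j) of parts: 4*3/2 = 6 candidate logratios
pairs <- t(combn(ncol(X), 2))

# Fit one GLM per candidate logratio and record its deviance
dev <- apply(pairs, 1, function(p) {
  lr <- log(X[, p[1]] / X[, p[2]])
  deviance(glm(y ~ lr, family = gaussian))
})
names(dev) <- paste(colnames(X)[pairs[, 1]], colnames(X)[pairs[, 2]], sep = "/")

# Unrestricted step (in the spirit of method = 1): smallest deviance enters
best <- names(dev)[which.min(dev)]
sel  <- pairs[which.min(dev), ]

# Candidates left for the next step if parts may not overlap (method = 2 idea)
overlap  <- pairs[, 1] %in% sel | pairs[, 2] %in% sel
next.nonoverlap <- names(dev)[!overlap]

# Pairs involving the denominator part of the selected ratio (method = 3 idea)
same.denom <- names(dev)[pairs[, 1] == sel[2] | pairs[, 2] == sel[2]]
```

With the strong simulated signal, the A/B logratio should be chosen; the non-overlapping restriction then leaves only C/D, while the additive-logratio restriction keeps the pairs that share part B.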

Value

rationames

Names of the selected logratios

ratios

The sequence numbers of the selected parts in each ratio

logratios

Matrix of selected logratios

logLik

The -2*log-likelihood sequence for the steps

deviance

The deviance sequence for the steps

AIC

The AIC sequence for the steps

BIC

The BIC sequence for the steps

Bonferroni

The Bonferroni sequence for the steps

null.deviance

The null deviance for the regression

(Notice that for logLik, AIC, BIC and Bonferroni, the values for one more step are given, so that the stopping point can be confirmed.)
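To see how a -2*log-likelihood sequence relates to AIC and BIC sequences, and why one extra step helps confirm the stopping point, here is a minimal base R sketch on simulated nested models. The data and object names (fits, m2ll, aic, bic) are hypothetical, and it is an assumption that STEPR counts parameters in the same way as base R's logLik().

```r
# Hypothetical nested models: x1 carries signal, x2 is pure noise
set.seed(2)
n  <- 80
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 1.5 * x1 + rnorm(n)

fits <- list(glm(y ~ 1), glm(y ~ x1), glm(y ~ x1 + x2))

# -2*log-likelihood and number of estimated parameters at each step
m2ll <- sapply(fits, function(f) -2 * as.numeric(logLik(f)))
k    <- sapply(fits, function(f) attr(logLik(f), "df"))

# Standard definitions: penalty 2 per parameter (AIC), log(n) per parameter (BIC)
aic <- m2ll + 2 * k
bic <- m2ll + log(n) * k
cbind(m2ll, aic, bic)
```

The -2*log-likelihood can only decrease as terms are added, whereas a penalized criterion can increase; a stepwise procedure stops at the step before the chosen criterion rises.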

And the following if top > 1:

ratios.top

The top ratios and the sequence numbers of their parts

logratios.top

The matrix of top logratios

logLik.top

The set of top -2*log-likelihoods

deviance.top

The set of top deviances

AIC.top

The set of top AICs

BIC.top

The set of top BICs

Bonferroni.top

The set of top Bonferronis

Author(s)

Michael Greenacre

References

Coenders, G. and Greenacre, M. (2021), Three approaches to supervised learning for compositional data with pairwise logratios. arXiv preprint. URL: https://arxiv.org/abs/2111.08953
Coenders, G. and Pawlowsky-Glahn, V. (2020), On interpretations of tests and effect sizes in regression models with a compositional predictor. SORT, 44:201-220
Greenacre, M. (2021), Compositional data analysis, Annual Review of Statistics and its Application, 8: 271-299

See Also

ALR, STEP, glm

Examples

# For the fish morphometric data, first close (normalize, although not necessary) 
# then loop over the 26*25/2 = 325 possible logratios stepwise
data(fish)
habitat <- fish[,2]
morph <- CLOSE(fish[,4:29])
# predict habitat binary classification from morphometric ratios, BIC criterion
fish.step1 <- STEPR(morph, as.factor(habitat), method=1, family="binomial", criterion="BIC")
# [1] "Criterion increases when 5-th ratio enters"
fish.step1$rationames
# [1] "Bac/Hg" "Hw/Jl"  "Fc/Fdl" "Fal/Ed"
# perform logistic regression with selected logratios
fish.glm   <- glm(as.factor(habitat) ~ fish.step1$logratios, family="binomial")
summary(fish.glm)
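# note: predict() returns the linear predictor (log-odds) by default;
# use predict(fish.glm, type="response") to threshold predicted probabilities
fish.pred1  <- predict(fish.glm)
table(fish.pred1>0.5, habitat)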
#     habitat
#        1  2
# FALSE 57  4
# TRUE   2 12
# (Thus 69/75 correct predictions)
#
# force the sex variable in at the first step before selecting logratios
# and using more strict Bonferroni default
sex <- as.factor(fish[,1]) 
fish.step2 <- STEPR(morph, as.factor(habitat), method=1, previous=sex, family="binomial")
# [1] "Criterion increases when 3-th ratio enters"
fish.step2$rationames
# [1] "Bac/Hg" "Hw/Jl"
# perform logistic regression with sex and selected logratios
fish.glm   <- glm(as.factor(habitat) ~ sex + fish.step2$logratios, family="binomial")
summary(fish.glm)
# (sex not significant, Bonferroni only admits first two logratios)
#
# check the top 10 ratios at Step 1 to allow domain knowledge to operate
fish.step3 <- STEPR(morph, as.factor(habitat), method=1, nsteps=1, top=10, family="binomial")
cbind(fish.step3$ratios.top, fish.step3$BIC.top)
#        row col         
# Bac/Hg   8  19 67.93744
# Bp/Hg    7  19 69.87134
# Jl/Hg    6  19 70.31554
# Jw/Bp    5   7 71.53671
# Jw/Jl    5   6 71.57122
# Jw/Bac   5   8 71.69294
# Fc/Hg   10  19 72.38560
# Hw/Bac   1   8 73.25325
# Jw/Fc    5  10 73.48882
# Hw/Bp    1   7 73.55621
# Suppose 5th in list, Jw/Jl (Jaw width/Jaw length), preferred at the first step, so forced in
fish.step4 <- STEPR(morph, as.factor(habitat), method=1, previous=fish.step3$logratios.top[,5], family="binomial")
# [1] "Criterion increases when 2-th ratio enters"
fish.step4$rationames
# [1] "Bac/Hg"
# So after Jw/Jl forced in only Bac/Hg enters, the best one originally

easyCODA documentation built on Jan. 15, 2022, 3 a.m.