knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(LRCoDa) library(robCompositions) library(dplyr) library(sjmisc) library(robustbase) library(forcats)
Load the dataset "gemas" from the package robCompositions
and recode one level from "" to "Not Specified", because the function LRCoDa
does not allow empty names for the levels.
data("gemas") gemas <- gemas %>% mutate(soilclass = fct_recode(soilclass, "Not specified" = ""))
With the newly released package LRCoDa
we are able to use factor variables for linear regression on raw compositional data. The factor variables however are not allowed as Internal variables of a compositions. Further, as with the function lmCoDaX
from the package robCompositions
we have the choice between the "robust" and "classical" method.
First, we show an example where we apply the robust method to the dataset "gemas" and use the "soilclass" as the factor variable, the trace elements as composition and "sand" as target variable.
X <- dplyr::select(gemas, c(soilclass, Al:Zr)) y <- gemas$sand LR_robust_1 <- LRCoDa(y = y, X = X, external = 'soilclass', method = 'robust', max_refinement_steps = 500) LR_robust_1$ilr
We can access the summary output of the Linear Regression as known from the function lmCoDaX
with \$ilr (for the ilr output) and \$lm (for the lm output). In addition, we can clearly see which is the external factor variable and which are the variables from the composition by the prefix of the variables.
The output of the robust method is further differentiated from the output using lmCoDaX
because we use lmrob
for the robust calculation of the linear regression in LRCoDa
instead of ltsReg
in lmCoDaX
.
We list some scenarios where we show the limitations of using factor variables with the LRCoDa
function.
Only one factor variable can be used for the model. The function automatically detect when two variables with datatype "Factor" and/or "Character" are. We return an error message if more than one factor variable is used.
X <- dplyr::select(gemas, c(soilclass, COUNTRY, Al:Zr)) LRCoDa(y = gemas$sand, X = X, external = c('COUNTRY', 'soilclass'))
When the external factor column is not specified in the function input, but X contains a variable with factors or characters, the function returns an error message that "Variable with datatype character or factor have to be defined as external".
X <- dplyr::select(gemas, c(soilclass, Al:Zr)) LRCoDa(y = gemas$sand, X = X)
The external factor column must be specified exactly as in the help function described. It is not possible to fill in the whole column.
X <- dplyr::select(gemas, c(soilclass, Al:Zr)) LRCoDa(y = gemas$sand, X = X, external = X %>% select(soilclass))
With the newly released package LRCoDa
we are also able to use numerical external variables that are not part of the composition. Here we also have the possibility to choose between the "robust" and "classical" method and can combine them with a factor variable.
We show an example where we combine external variables with a factor variables and of course the composition (naming as internal variables). The function automatically detects which variable is the factor variable from the external variables. We use the "gemas" dataset, where "MeanTemp" the external variable is, "soilclass" the factor variable, the trace elements the composition ("Internal" variables) and "sand" the target variable.
X <- dplyr::select(gemas, c(MeanTemp, soilclass, Al:Zr)) y <- gemas$sand LR_robust_2 <- LRCoDa(y = y, X = X, external = c('MeanTemp', 'soilclass'), method = 'robust', max_refinement_steps = 500) LR_robust_2$ilr
We can see here that the different types of variables have different prefixes with which we can easily identify them.
We list the possibilities and limitations of using the external variables
We allow and support the use of multiple numerical external variables in the same model.
X <- dplyr::select(gemas, c(MeanTemp, AnnPrec, Al:Zr)) LRCoDa(y = gemas$sand, X = X, external = c('MeanTemp', 'AnnPrec'), method = "classical")
The external column must be specified exactly as in the help function described. It is not possible to fill in the whole column.
X <- dplyr::select(gemas, c(MeanTemp, AnnPrec, Al:Zr)) LRCoDa(y = gemas$sand, X = X, external = X %>% select(c('MeanTemp', 'AnnPrec')))
With the newly introduced use of factor variables for linear regression on raw compositional data, we need to change the fitting algorithm from the old ltsReg
(used in lmCoDaX
) to lmrob
(used in LRCoDa
). This change offers many advantages when working with factor variables and also with external variables compared to ltsReg
which is very limited for this application.
Also, lmrob
determines robustness weights between 0 and 1, while ltsReg
only works with 0 and 1 for the robustness weigths. However, there are also new challenges introduced with lmrob
, as shown in the example below. Also, the backwards compatibility of the function when fitting with the robust method is no longer given due to this change.
X <- dplyr::select(gemas, c(MeanTemp, AnnPrec, soilclass, Al:Zr)) LRCoDa(y = gemas$sand, X = X, external = c('MeanTemp', 'AnnPrec', 'soilclass'), max_refinement_steps = 200)
The algorithm does not converge with the default refinement steps 200 as defined in the function lmrob
. Therefore, we implemented the parameter max_refinement_steps as input to the function LRCoDa
to change the maximum steps in such cases.
X <- dplyr::select(gemas, c(MeanTemp, AnnPrec, soilclass, Al:Zr)) LRCoDa(y = gemas$sand, X = X, external = c('MeanTemp', 'AnnPrec', 'soilclass'), max_refinement_steps = 500)
And now with max_refinement_steps set to 500 the algorithm converges.
The norm of the pivot coordinate calculation can now also be defined in the function input, in contrast to the function lmCoDaX
. The default value is "orthonorm" as it is preset in the function pivotCoord
and therefore also the one used in lmCoDaX
function.
X <- dplyr::select(gemas, c(Al:Zr)) LRCoDa(y = gemas$sand, X = X, pivot_norm = "orthogonal", method = "classical")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.