# path_coeff: Path coefficients with minimal multicollinearity

Package: metan (Multi Environment Trials Analysis)

## Description

• path_coeff() computes a path analysis using a data frame as input data.

• path_coeff_mat() computes a path analysis using correlation matrices as input data.
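Both functions rest on the same decomposition. With standardized variables, the direct effects b solve R_xx b = r_xy, and the indirect effect of predictor i through predictor j is r_ij times b_j. The base-R sketch below illustrates this from a correlation matrix. `path_from_cor()` is a hypothetical helper, not part of metan, and it leaves out the k correction and the diagnostics that the real functions also report.

```r
# Hypothetical helper (not part of metan): path analysis from a correlation matrix.
path_from_cor <- function(cor_mat, resp) {
  pred <- setdiff(colnames(cor_mat), resp)
  rxx  <- cor_mat[pred, pred, drop = FALSE]  # correlations among predictors
  rxy  <- cor_mat[pred, resp]                # predictor-response correlations
  direct <- solve(rxx, rxy)                  # direct effects: solve Rxx %*% b = rxy
  # Row i, column j: effect of predictor i on the response through predictor j.
  # The diagonal holds the direct effects; off-diagonal cells are indirect effects.
  coef_mat <- sweep(rxx, 2, direct, `*`)
  list(direct = direct,
       coefficients = coef_mat,
       total = rowSums(coef_mat))            # row sums reproduce rxy
}

# Usage with simulated data
set.seed(1)
x1 <- rnorm(100)
x2 <- 0.5 * x1 + rnorm(100)
y  <- x1 + x2 + rnorm(100)
R  <- cor(cbind(y, x1, x2))
res <- path_from_cor(R, "y")
round(res$coefficients, 3)
```

The row sums recover each predictor's correlation with the response, which is the defining property of the path decomposition.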

## Usage

```r
path_coeff(
  .data,
  resp,
  pred = everything(),
  by = NULL,
  exclude = FALSE,
  correction = NULL,
  knumber = 50,
  brutstep = FALSE,
  maxvif = 10,
  missingval = "pairwise.complete.obs",
  plot_res = FALSE,
  verbose = TRUE,
  ...
)

path_coeff_mat(cor_mat, resp, correction = NULL, knumber = 50, verbose = TRUE)
```

## Arguments

• .data The data. Must be a data frame or grouped data passed from dplyr::group_by().

• resp The dependent (response) variable.

• pred The predictor variables. Defaults to everything(), i.e., all the numeric variables in the data except the one in resp.

• by One variable (factor) by which to compute the function. It is a shortcut to dplyr::group_by(); to compute the statistics by more than one grouping variable, use that function instead.

• exclude Logical argument, defaults to FALSE. If exclude = TRUE, the variables in pred are removed from the data, and the analysis uses the remaining variables (except the one in resp) as predictors.

• correction Defaults to NULL. A correction value (k) added to the diagonal elements of the X'X matrix to reduce the harmful effects of multicollinearity in path analysis (Olivoto et al., 2017).

• knumber When correction = NULL, a plot showing the direct effects across a set of k values between 0 and 1 is produced; knumber is the number of k values used in that range. Defaults to 50.

• brutstep Logical argument, defaults to FALSE. If TRUE, an algorithm selects a subset of variables with minimal multicollinearity and fits a set of possible models. See the Details section for more information.

• maxvif The maximum Variance Inflation Factor (cut point) that will be accepted. Defaults to 10. See the Details section for more information.

• missingval How to deal with missing values. Defaults to "pairwise.complete.obs"; see stats::cor() for the accepted values.

• plot_res If TRUE, creates a scatter plot of residuals against predicted values and a normal Q-Q plot. Defaults to FALSE.

• verbose If verbose = TRUE (the default), some results are shown in the console.

• ... Additional arguments passed on to stats::plot.lm().

• cor_mat (path_coeff_mat() only) A correlation matrix containing both the dependent and the independent traits.
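The correction argument follows a ridge-type idea: k is added to the diagonal of the predictor matrix before the direct effects are solved. The sketch below assumes the correlation form of X'X (the predictor correlation matrix) and an evenly spaced grid of knumber values from 0 to 1. Both helpers are hypothetical and only approximate the table of direct effects that path_coeff() plots when correction = NULL.

```r
# Hypothetical helpers (not metan's API): direct effects with a correction k
# added to the diagonal of the predictor correlation matrix.
direct_effects_k <- function(rxx, rxy, k = 0) {
  solve(rxx + diag(k, nrow(rxx)), rxy)
}

# Tabulate direct effects over an evenly spaced grid of knumber k values in [0, 1]
# (the exact grid used inside path_coeff() is an assumption here).
direct_effects_grid <- function(rxx, rxy, knumber = 50) {
  ks  <- seq(0, 1, length.out = knumber)
  out <- do.call(rbind, lapply(ks, function(k) direct_effects_k(rxx, rxy, k)))
  colnames(out) <- names(rxy)
  data.frame(k = ks, out, check.names = FALSE)
}

# Usage: x1 and x2 are nearly collinear, x3 is independent of both
set.seed(42)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.1)
x3 <- rnorm(200)
y  <- x1 + x3 + rnorm(200)
R    <- cor(cbind(y, x1, x2, x3))
pred <- c("x1", "x2", "x3")
grid <- direct_effects_grid(R[pred, pred], R[pred, "y"], knumber = 50)
head(grid)
```

The first row (k = 0) is the uncorrected solution. Because adding k shrinks the solution, the overall size of the direct-effect vector cannot grow as k increases, which is what makes a plot over k useful for choosing a value where the estimates stabilize.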

## Details

In path_coeff(), when brutstep = TRUE, an algorithm is applied to select a set of predictors with minimal multicollinearity and high explanatory power. First, the algorithm selects a set of predictors with minimal multicollinearity based on the variance inflation factor (VIF): an iterative process is repeated until the maximum observed VIF is less than maxvif. The variables selected in this step are then used in a series of stepwise-based regressions. The first model is fitted retaining p - 1 predictor variables (where p is the number of variables selected in the iterative process), the second model considers p - 2 selected variables, and so on until the last model, which considers only two variables. Three objects are created: Summary, with a summary of the process; Models, containing the aforementioned values for all the fitted models; and Selectedpred, a vector with the names of the variables selected in the iterative process.
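The VIF step can be illustrated with a short loop. The sketch below assumes that each iteration drops the predictor with the largest VIF, a common heuristic; the text above does not state which variable path_coeff() removes, so `select_low_vif()` is a hypothetical helper rather than metan's exact rule. The VIF of each predictor is the corresponding diagonal element of the inverse predictor correlation matrix.

```r
# Hypothetical helper (not metan's API). Assumption: drop the highest-VIF
# predictor at each step until every VIF is below maxvif.
select_low_vif <- function(data, pred, maxvif = 10) {
  repeat {
    rxx <- cor(data[, pred, drop = FALSE], use = "pairwise.complete.obs")
    vif <- diag(solve(rxx))              # VIF_j = j-th diagonal element of Rxx^-1
    # Stop when all VIFs are acceptable, or keep at least two predictors,
    # mirroring the two-variable lower bound described above.
    if (max(vif) < maxvif || length(pred) <= 2) break
    pred <- setdiff(pred, names(which.max(vif)))
  }
  list(selected = pred, vif = vif)
}

# Usage: b nearly duplicates a; c and d are only moderately correlated
set.seed(7)
n  <- 150
df <- data.frame(a = rnorm(n))
df$b <- df$a + rnorm(n, sd = 0.05)
df$c <- rnorm(n)
df$d <- df$c + rnorm(n)
sel <- select_low_vif(df, c("a", "b", "c", "d"), maxvif = 5)
sel$selected
```

Here one member of the near-duplicate pair a/b is dropped, after which all remaining VIFs fall below the cut point.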

## Value

An object of class path_coeff, group_path, or brute_path with the following items:

• Corr.x A correlation matrix between the predictor variables.

• Corr.y A vector of correlations between each predictor variable and the dependent variable.

• Coefficients The path coefficients. Direct effects are the diagonal elements, and indirect effects are the off-diagonal elements (read across the rows).

• Eigen Eigenvectors and eigenvalues of Corr.x.

• VIF The Variance Inflation Factors.

• plot A ggplot2-based graphic showing the direct effects across the set of k values defined by knumber (produced when correction = NULL).

• Predictors The predictor variables used in the model.

• CN The Condition Number, i.e., the ratio between the highest and lowest eigenvalue.

• Det The determinant of Corr.x.

• R2 The coefficient of determination of the model.

• Residual The residual effect of the model.

• Response The response variable.

• weightvar The predictor variables ordered from highest to lowest weight (eigenvector element) in the eigenvector associated with the lowest eigenvalue.
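Several of the items above are simple functions of Corr.x and Corr.y. The sketch below computes them in base R under stated assumptions: the residual effect uses the usual path-analysis definition sqrt(1 - R2), and weightvar orders predictors by the absolute value of their element in the eigenvector of the smallest eigenvalue. `path_diagnostics()` is a hypothetical helper, not metan's implementation.

```r
# Hypothetical helper (not metan's API): collinearity and fit diagnostics
# computed from a correlation matrix.
path_diagnostics <- function(cor_mat, resp) {
  pred <- setdiff(colnames(cor_mat), resp)
  rxx  <- cor_mat[pred, pred, drop = FALSE]    # Corr.x
  rxy  <- cor_mat[pred, resp]                  # Corr.y
  eig  <- eigen(rxx, symmetric = TRUE)         # eigenvalues in decreasing order
  direct <- solve(rxx, rxy)                    # direct effects
  r2  <- sum(direct * rxy)                     # R2 = r_xy' Rxx^-1 r_xy
  low <- eig$vectors[, length(eig$values)]     # eigenvector of smallest eigenvalue
  list(
    VIF       = diag(solve(rxx)),
    CN        = max(eig$values) / min(eig$values),
    Det       = det(rxx),
    R2        = r2,
    Residual  = sqrt(1 - r2),                  # assumed residual-effect definition
    weightvar = pred[order(abs(low), decreasing = TRUE)]  # assumed ordering rule
  )
}

# Usage: x1 and x2 are strongly collinear, x3 is not
set.seed(3)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.2)
x3 <- rnorm(100)
y  <- x1 + 0.5 * x3 + rnorm(100)
R  <- cor(cbind(y, x1, x2, x3))
d  <- path_diagnostics(R, "y")
str(d)
```

The R2 computed from the correlation matrix matches the R2 of an ordinary least-squares fit on the raw data, and the collinear pair x1/x2 dominates weightvar, while x3 comes last.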

If .data is grouped data passed from dplyr::group_by(), the results are returned in a list-column of data frames.

## Author(s)

Tiago Olivoto tiagoolivoto@gmail.com

## References

Olivoto, T., V.Q. Souza, M. Nardino, I.R. Carvalho, M. Ferrari, A.J. Pelegrin, V.J. Szareski, and D. Schmidt. 2017. Multicollinearity in path analysis: a simple method to reduce its effects. Agron. J. 109:131-142. doi: 10.2134/agronj2016.04.0196

## Examples

```r
library(metan)

# Using KW as the response variable and all other ones as predictors
pcoeff <- path_coeff(data_ge2, resp = KW)

# The same as above, but using the correlation matrix
cor_mat <- cor(data_ge2 %>% select_numeric_cols())
pcoeff <- path_coeff_mat(cor_mat, resp = KW)

# Declaring the predictors
# Create a residual plot with 'plot_res = TRUE'
pcoeff2 <- path_coeff(data_ge2,
                      resp = KW,
                      pred = c(PH, EH, NKE, TKW),
                      plot_res = TRUE)

# Selecting variables to be excluded from the analysis
pcoeff3 <- path_coeff(data_ge2,
                      resp = KW,
                      pred = c(NKR, PERK, KW, NKE),
                      exclude = TRUE)

# Selecting a set of predictors with minimal multicollinearity
# Maximum variance inflation factor of 5
pcoeff4 <- path_coeff(data_ge2,
                      resp = KW,
                      brutstep = TRUE,
                      maxvif = 5)

# When one analysis should be carried out for each environment
# Using the forward-pipe operator %>%
pcoeff5 <- data_ge2 %>% path_coeff(resp = KW, by = ENV)
```

metan documentation built on Nov. 10, 2021, 9:11 a.m.