# univariateRankVariables: Univariate analysis of features In FRESA.CAD: Feature Selection Algorithms for Computer Aided Diagnosis

## Description

This function reports the mean and standard deviation of each feature in a model and ranks the features according to a user-specified score. It also performs a Kolmogorov-Smirnov (KS) test on the raw and z-standardized data, and reports the raw and z-standardized t-test score, the p-value of the Wilcoxon rank-sum test, the integrated discrimination improvement (IDI), the net reclassification improvement (NRI), the net residual improvement (NeRI), and the area under the ROC curve (AUC). Finally, it reports the z-value of each variable's significance in the fitted model.

## Usage

```
univariateRankVariables(variableList,
                        formula,
                        Outcome,
                        data,
                        categorizationType = c("Raw",
                                               "Categorical",
                                               "ZCategorical",
                                               "RawZCategorical",
                                               "RawTail",
                                               "RawZTail",
                                               "Tail",
                                               "RawRaw"),
                        type = c("LOGIT", "LM", "COX"),
                        rankingTest = c("zIDI",
                                        "zNRI",
                                        "IDI",
                                        "NRI",
                                        "NeRI",
                                        "Ztest",
                                        "AUC",
                                        "CStat",
                                        "Kendall"),
                        cateGroups = c(0.1, 0.9),
                        raw.dataFrame = NULL,
                        description = ".",
                        uniType = c("Binary", "Regression"),
                        FullAnalysis = TRUE,
                        acovariates = NULL,
                        timeOutcome = NULL
)
```

## Arguments

| Argument | Description |
|---|---|
| `variableList` | A data frame with the candidate variables to be ranked |
| `formula` | An object of class `formula` with the formula to be fitted |
| `Outcome` | The name of the column in `data` that stores the variable to be predicted by the model |
| `data` | A data frame where all variables are stored in different columns |
| `categorizationType` | How variables will be analyzed: as given in `data` ("Raw"); broken into the p-value categories given by `cateGroups` ("Categorical"); broken into the p-value categories given by `cateGroups` and weighted by the z-score ("ZCategorical"); broken into the p-value categories given by `cateGroups`, weighted by the z-score, plus the raw values ("RawZCategorical"); raw values plus the tails ("RawTail"); or raw values, weighted by the z-score, plus the tails ("RawZTail") |
| `type` | Fit type: logistic ("LOGIT"), linear ("LM"), or Cox proportional hazards ("COX") |
| `rankingTest` | The score used to rank the variables: the z-score of the IDI ("zIDI"), the z-score of the NRI ("zNRI"), the IDI ("IDI"), the NRI ("NRI"), the NeRI ("NeRI"), the z-score of the model fit ("Ztest"), the AUC ("AUC"), the Somers' rank correlation ("CStat"), or the Kendall rank correlation ("Kendall") |
| `cateGroups` | A vector of percentiles to be used for the categorization procedure |
| `raw.dataFrame` | A data frame similar to `data`, but with unadjusted data, used to get the means and variances of the unadjusted data |
| `description` | The name of the column in `variableList` that stores the variable description |
| `uniType` | Type of univariate analysis: binary classification ("Binary") or regression ("Regression") |
| `FullAnalysis` | If `FALSE`, the features are only ordered by their z-statistics in the linear model |
| `acovariates` | The list of covariates |
| `timeOutcome` | The name of the time-to-event feature |

## Details

This function creates valid dummy categorical variables if, and only if, `data` has been z-standardized. The p-values provided in `cateGroups` are converted to their corresponding z-scores, which are then used as the cut points that define the categories. If the data are not z-standardized, these cut points do not match the scale of the variables and the categorization analysis returns wrong results.
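The conversion described above can be illustrated in base R. This is a minimal sketch, not FRESA.CAD's internal code: the variable names (`x`, `z`, `cuts`, `low_tail`, `hi_tail`) are hypothetical, and it assumes the percentiles are mapped to cut points with the standard normal quantile function `qnorm`.

```r
# Hypothetical illustration; not FRESA.CAD's internal implementation.
set.seed(42)
x <- rnorm(200, mean = 50, sd = 10)   # a raw (non-standardized) feature

# z-standardize the feature: mean 0, standard deviation 1
z <- (x - mean(x)) / sd(x)

# Convert the cateGroups percentiles to z-score cut points
cateGroups <- c(0.1, 0.9)
cuts <- qnorm(cateGroups)             # approximately -1.28 and 1.28

# Dummy variables flagging the low and high tails of the standardized feature
low_tail <- as.integer(z < cuts[1])
hi_tail  <- as.integer(z > cuts[2])

# Applying the same cut points to the raw feature is meaningless:
# the raw values live on a different scale than the z-score cut points
raw_low <- sum(x < cuts[1])
raw_hi  <- sum(x > cuts[2])
```

Comparing `sum(low_tail)` and `sum(hi_tail)` with `raw_low` and `raw_hi` shows why the standardization matters: the cut points only isolate the tails when the feature is on the z-score scale.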

## Value

A sorted data frame. In the case of a binary classification analysis, the data frame will have the following columns:

| Column | Description |
|---|---|
| `Name` | Name of the raw variable, or of the dummy variable if the data has been categorized |
| `parent` | Name of the raw variable from which the dummy variable was created |
| `descrip` | Description of the parent variable, as defined in `description` |
| `cohortMean` | Mean value of the variable |
| `cohortStd` | Standard deviation of the variable |
| `cohortKSD` | D statistic of the KS test comparing the distribution of the variable with a normal distribution |
| `cohortKSP` | p-value associated with `cohortKSD` |
| `caseMean` | Mean value of cases (subjects with `Outcome` equal to 1) |
| `caseStd` | Standard deviation of cases |
| `caseKSD` | D statistic of the KS test comparing the distribution of the variable with a normal distribution, for cases only |
| `caseKSP` | p-value associated with `caseKSD` |
| `caseZKSD` | D statistic of the KS test comparing the distribution of the z-standardized variable with a normal distribution, for cases only |
| `caseZKSP` | p-value associated with `caseZKSD` |
| `controlMean` | Mean value of controls (subjects with `Outcome` equal to 0) |
| `controlStd` | Standard deviation of controls |
| `controlKSD` | D statistic of the KS test comparing the distribution of the variable with a normal distribution, for controls only |
| `controlKSP` | p-value associated with `controlKSD` |
| `controlZKSD` | D statistic of the KS test comparing the distribution of the z-standardized variable with a normal distribution, for controls only |
| `controlZKSP` | p-value associated with `controlZKSD` |
| `t.Rawvalue` | Normal inverse p-value (z-value) of the t-test performed on `raw.dataFrame` |
| `t.Zvalue` | z-value of the t-test performed on `data` |
| `wilcox.Zvalue` | z-value of the Wilcoxon rank-sum test performed on `data` |
| `ZGLM` | z-value returned by the `lm`, `glm`, or `coxph` functions for the z-standardized variable |
| `zNRI` | z-value returned by the `improveProb` function (`Hmisc` package) when evaluating the NRI |
| `zIDI` | z-value returned by the `improveProb` function (`Hmisc` package) when evaluating the IDI |
| `zNeRI` | z-value returned by the `improvedResiduals` function when evaluating the NeRI |
| `ROCAUC` | Area under the ROC curve returned by the `roc` function (`pROC` package) |
| `cStatCorr` | c index of Somers' rank correlation returned by the `rcorr.cens` function (`Hmisc` package) |
| `NRI` | NRI returned by the `improveProb` function (`Hmisc` package) |
| `IDI` | IDI returned by the `improveProb` function (`Hmisc` package) |
| `NeRI` | NeRI returned by the `improvedResiduals` function |
| `kendall.r` | Kendall τ rank correlation coefficient between the variable and the binary outcome |
| `kendall.p` | p-value associated with `kendall.r` |
| `TstudentRes.p` | p-value of the improvement in residuals, as evaluated by the paired t-test |
| `WilcoxRes.p` | p-value of the improvement in residuals, as evaluated by the paired Wilcoxon rank-sum test |
| `FRes.p` | p-value of the improvement in residual variance, as evaluated by the F-test |
| `caseN_Z_Low_Tail` | Number of cases in the low tail |
| `caseN_Z_Hi_Tail` | Number of cases in the top tail |
| `controlN_Z_Low_Tail` | Number of controls in the low tail |
| `controlN_Z_Hi_Tail` | Number of controls in the top tail |
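To make some of the binary-classification columns concrete, the following base R sketch computes analogous quantities on simulated data. It is an illustration under stated assumptions, not the package's implementation: the data and all variable names are hypothetical, the KS test is assumed to compare each group against a normal distribution with that group's own mean and standard deviation, and p-values are assumed to be mapped to unsigned z-values through the two-sided inverse normal.

```r
# Hypothetical illustration; not FRESA.CAD's internal implementation.
set.seed(1)
outcome <- rep(c(0, 1), each = 100)                    # 0 = control, 1 = case
x <- rnorm(200, mean = ifelse(outcome == 1, 0.8, 0))   # feature shifted in cases

cases    <- x[outcome == 1]
controls <- x[outcome == 0]

# Group summaries (compare with caseMean, caseStd, controlMean, controlStd)
case_mean    <- mean(cases)
case_sd      <- sd(cases)
control_mean <- mean(controls)
control_sd   <- sd(controls)

# KS test of the cases against a normal distribution (compare with caseKSD, caseKSP)
ks_case  <- ks.test(cases, "pnorm", mean(cases), sd(cases))
case_ksd <- unname(ks_case$statistic)
case_ksp <- ks_case$p.value

# t-test p-value mapped to an unsigned z-value (compare with t.Zvalue)
t_p <- t.test(cases, controls)$p.value
t_z <- qnorm(t_p / 2, lower.tail = FALSE)

# Wilcoxon rank-sum p-value mapped the same way (compare with wilcox.Zvalue)
w   <- wilcox.test(cases, controls)
w_z <- qnorm(w$p.value / 2, lower.tail = FALSE)

# AUC from the Mann-Whitney statistic: the fraction of case-control pairs
# in which the case has the larger value
auc <- unname(w$statistic) / (length(cases) * length(controls))
```

The AUC line relies on the fact that the statistic reported by `wilcox.test` counts the case-control pairs in which the first sample exceeds the second, so dividing by the number of pairs gives the area under the ROC curve for a single feature.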

In the case of regression analysis, the data frame will have the following columns:

| Column | Description |
|---|---|
| `Name` | Name of the raw variable, or of the dummy variable if the data has been categorized |
| `parent` | Name of the raw variable from which the dummy variable was created |
| `descrip` | Description of the parent variable, as defined in `description` |
| `cohortMean` | Mean value of the variable |
| `cohortStd` | Standard deviation of the variable |
| `cohortKSD` | D statistic of the KS test comparing the distribution of the variable with a normal distribution |
| `cohortKSP` | p-value associated with `cohortKSD` |
| `cohortZKSD` | D statistic of the KS test comparing the distribution of the z-standardized variable with a normal distribution |
| `cohortZKSP` | p-value associated with `cohortZKSD` |
| `ZGLM` | z-value returned by the glm or Cox procedure for the z-standardized variable |
| `zNRI` | z-value returned by the `improveProb` function (`Hmisc` package) when evaluating the NRI |
| `NeRI` | NeRI returned by the `improvedResiduals` function |
| `cStatCorr` | c index of Somers' rank correlation returned by the `rcorr.cens` function (`Hmisc` package) |
| `spearman.r` | Spearman ρ rank correlation coefficient between the variable and the outcome |
| `pearson.r` | Pearson r product-moment correlation coefficient between the variable and the outcome |
| `kendall.r` | Kendall τ rank correlation coefficient between the variable and the outcome |
| `kendall.p` | p-value associated with `kendall.r` |
| `TstudentRes.p` | p-value of the improvement in residuals, as evaluated by the paired t-test |
| `WilcoxRes.p` | p-value of the improvement in residuals, as evaluated by the paired Wilcoxon rank-sum test |
| `FRes.p` | p-value of the improvement in residual variance, as evaluated by the F-test |
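The three correlation columns reported for regression analysis correspond to standard statistics available in base R. The sketch below computes them on simulated data. The data and variable names are hypothetical, and the code only shows what each coefficient measures; it does not claim to reproduce how the package obtains them.

```r
# Hypothetical illustration; not FRESA.CAD's internal implementation.
set.seed(7)
x <- rnorm(150)              # candidate feature
y <- 2 * x + rnorm(150)      # continuous outcome linearly related to x

# Pearson product-moment correlation (compare with pearson.r)
pearson_r <- cor(x, y, method = "pearson")

# Spearman rank correlation (compare with spearman.r)
spearman_r <- cor(x, y, method = "spearman")

# Kendall rank correlation and its p-value (compare with kendall.r, kendall.p)
kt        <- cor.test(x, y, method = "kendall")
kendall_r <- unname(kt$estimate)
kendall_p <- kt$p.value
```

For a monotone, roughly linear relationship like this one, all three coefficients are positive, with Kendall's τ typically smaller in magnitude than the other two.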

## Author(s)

Jose G. Tamez-Pena

## References

Pencina, M. J., D'Agostino, R. B., & Vasan, R. S. (2008). Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Statistics in Medicine, 27(2), 157-172.

