readme.md

evoxploit

Overview

evoxploit is a package for augmenting a longitudinal dataset* with so called evolution features (Niemann et al., 2015) and sequence features (Hielscher et al., 2014). Theses features reflect a study participant’s individual change over time, his/her change over time in comparison with the cluster he/she was closest to at the baseline examinations, and the change of whole participant groups.

* The functions of this package require a n*m-dimensional dataframe for the input variables and a factor vector of length n with the target variable (e.g. a medical outcome). The input data frame should have the following variables names semantics:

Installation

# Development version from Github
devtools::install_github("unmnn/evoxploit")

Usage Example

# Inspect example data
library(evoxploit)
str(epi)
## List of 2
##  $ data :Classes 'tbl_df', 'tbl' and 'data.frame':   354 obs. of  19 variables:
##   ..$ school_s0     : Factor w/ 3 levels "1","2","3": 2 2 1 3 2 2 3 2 2 3 ...
##   ..$ coffee_s0     : num [1:354] 7 4 7 4 3 3 4 2 3 7 ...
##   ..$ bmi_s0        : num [1:354] 32 22 23 25 27 23 22 21 17 29 ...
##   ..$ bmi_s1        : num [1:354] 30 22 21 25 28 25 22 22 17 29 ...
##   ..$ bmi_s2        : num [1:354] 31 26 23 24 30 25 24 26 17 31 ...
##   ..$ uric_acid_s0  : num [1:354] 330 183 295 241 253 302 178 204 199 305 ...
##   ..$ uric_acid_s1  : num [1:354] 288 199 264 247 276 292 174 207 198 296 ...
##   ..$ uric_acid_s2  : num [1:354] 361 187 335 192 289 261 163 223 191 480 ...
##   ..$ smoke_s0      : Factor w/ 3 levels "1","2","3": 3 1 2 3 1 2 3 2 2 2 ...
##   ..$ smoke_s1      : Factor w/ 3 levels "1","2","3": 3 1 3 3 1 1 3 2 2 2 ...
##   ..$ smoke_s2      : Factor w/ 3 levels "1","2","3": 3 1 2 3 1 1 3 2 2 2 ...
##   ..$ c07aa_s0      : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ c07aa_s1      : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ c07aa_s2      : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ sa75_s0       : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 3 1 1 1 ...
##   ..$ sa75_s2       : Factor w/ 4 levels "1","2","3","4": 2 1 3 3 1 1 1 1 1 1 ...
##   ..$ gx_11597390_s0: Factor w/ 3 levels "1","2","3": 2 3 1 3 3 2 2 2 2 3 ...
##   ..$ gx_11597390_s1: Factor w/ 3 levels "1","2","3": 2 3 1 3 3 2 2 2 2 3 ...
##   ..$ gx_11597390_s2: Factor w/ 3 levels "1","2","3": 2 3 1 3 3 2 2 2 2 3 ...
##  $ label:Classes 'tbl_df', 'tbl' and 'data.frame':   354 obs. of  1 variable:
##   ..$ label: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 2 1 1 ...
# Create an Evoxploit object
epi_evo <- Evoxploit$new(epi$data, epi$label[[1]], wave_suffix = "_s")

# Print summary to console
summary(epi_evo)
## === Summary ===
## Data:
##   - observations:  354 
##   - columns:  19 
##     - #numeric variables:  7 
##     - #factor variables:  12 
##   - waves:  0, 1, 2 
##  
## Target: 
##   - #classes:  3 
##   - predominant class:  A :  60 % 
##  
## Clustering: 
## Cluster  1 
##   - attributes:  bmi_s0, uric_acid_s0 
##   - minPts:  6 
##   - eps:  0.07289373 
##   - clusters:  3 
##   - predominant cluster:  1 :  95 % 
##   - outlier:  5 % 
## Cluster  2 
##   - attributes:  bmi_s1, uric_acid_s1 
##   - minPts:  6 
##   - eps:  0.1415658 
##   - clusters:  3 
##   - predominant cluster:  1 :  93 % 
##   - outlier:  7 % 
## Cluster  3 
##   - attributes:  bmi_s2, uric_acid_s2 
##   - minPts:  6 
##   - eps:  0.1245425 
##   - clusters:  3 
##   - predominant cluster:  1 :  80 % 
##   - outlier:  20 % 
##  
## Evo features: 
##   - sequence features:  1 
##   - descriptive features:  15 
##   - other evo features:  106 
## ======
library(dplyr)
library(ggplot2)
# Calculate Gain Ratio of both original and extracted features w.r.t. the 
# target variable
df_evo <- bind_cols(tibble(label = epi_evo$label), epi_evo$all_features)

gain_names <- FSelector::gain.ratio(label ~ ., data = df_evo)
gain_names %>%
  tibble::rownames_to_column("variable") %>%
  arrange(desc(attr_importance)) %>%
  mutate(variable = forcats::fct_reorder(variable, attr_importance)) %>%
  slice(1:20) %>%
  ggplot(aes(variable, attr_importance)) +
  geom_col() +
  coord_flip()

Bibliography



unmnn/evoxploit documentation built on Oct. 28, 2020, 12:24 p.m.