moeda: Run a MOEDA Analysis


View source: R/moeda_function.R

Description

The main function of the package. It runs a Modeling-Oriented Exploratory Data Analysis (MOEDA) on your data.

Usage

moeda(df, target_var, n_top_vars = 4, cuts = 4, ...)

Arguments

df

A dataset you wish to analyze, typically a data.frame or a tibble

target_var

An unquoted expression or a character scalar identifying the column in 'df' that is your target (dependent) variable for the analysis.

n_top_vars

An integer scalar giving the maximum number of top variables to consider in the analysis. Defaults to 4.

cuts

An integer scalar telling 'base::cut()' how many equal-width buckets to create for each top variable. Defaults to 4.

...

Additional arguments passed on to 'base::cut()'.
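
To make the 'cuts' and '...' arguments more concrete, here is a minimal base R sketch of equal-width bucketing with 'base::cut()'. The vector `x` is made up for illustration, and the assumption that `cuts` maps to the `breaks` argument and that extra arguments are forwarded unchanged is an inference from the descriptions above, not something confirmed against the package source.

```r
# Hypothetical numeric column standing in for one top variable.
x <- c(1, 2, 3, 5, 8, 13, 21, 34)

# A single number for `breaks` asks cut() for that many equal-width intervals
# spanning the range of x (assumed to be what cuts = 4 does).
buckets <- cut(x, breaks = 4)
table(buckets)

# `labels` and `dig.lab` are ordinary cut() arguments; whether moeda()
# forwards them through `...` exactly like this is an assumption.
buckets_int <- cut(x, breaks = 4, labels = FALSE)
buckets_int
```

Because the buckets have equal width rather than equal counts, a skewed variable can leave most observations in the first bucket, as happens with `x` here.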

Details

Currently, this function performs the following actions:

  1. A random forest is fitted on a training set containing 80 percent of the observations.

  2. Permutation variable importance is extracted from the model and reported to the console together with some additional information.

  3. The model's performance is assessed on a test set made up of the remaining 20 percent of observations and reported to the console.

  4. The top variables (their number is set with the 'n_top_vars' argument) are discretized into equal-width buckets via 'base::cut()'. You can select the number of buckets with the 'cuts' argument.

  5. An upset plot of the intersections from step 4, together with the target variable, is printed.

  6. A 'GGally::ggpairs()' plot of top variables is printed.

  7. The function returns the original 'df' with additional columns joined to it: the discretized top features and the resulting intersections. These additional columns have 'moedized' in their name.
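
The 80/20 split behind steps 1 and 3 can be sketched in base R as follows. This is only an illustration under stated assumptions: the built-in `iris` data stands in for 'df', the seed is arbitrary, and the way the package actually samples rows is not documented here. The random forest itself is left out because this page does not say which implementation is used.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

df <- iris    # built-in dataset used as a stand-in for 'df'
n <- nrow(df)

# Draw 80 percent of the row indices for training; the rest form the test set.
train_idx <- sample.int(n, size = floor(0.8 * n))
train <- df[train_idx, , drop = FALSE]
test  <- df[-train_idx, , drop = FALSE]

c(train = nrow(train), test = nrow(test))
```

Because the split is random, importance rankings and test-set performance may differ between runs unless a seed is fixed.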

Value

The original 'df' as a tibble, with additional columns joined to it: the discretized top features and the resulting intersections. These additional columns have 'moedized' in their name.
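
Since the added columns share the 'moedized' substring, they can be picked out by name. The sketch below uses a hand-built data frame as a hypothetical stand-in for the returned value. Its column names and contents are invented for illustration, and only the 'moedized' substring comes from this page.

```r
# Hypothetical stand-in for the value returned by moeda(); these column
# names are invented, the real ones may follow a different pattern.
res <- data.frame(
  Sepal.Length = c(5.1, 6.3),
  Sepal.Length_moedized = c("(4.3,5.2]", "(6.1,7]"),
  intersection_moedized = c("a", "b")
)

# Keep only the columns whose name contains the literal string 'moedized'.
moedized_cols <- grep("moedized", names(res), value = TRUE, fixed = TRUE)
res[, moedized_cols, drop = FALSE]
```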


jarekkupisz/MOEDA documentation built on Dec. 20, 2021, 9:05 p.m.