In pavel-filatov/featr: Easy-to-use feature preprocessing tools

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(featr)

featr stands for "feature" (oh, really?) and pronounced the same way. There is a bunch of functions to produce not-so-common tasks such as renaming variables in the data frame or list of data frames, one-hot-encoding with custom parameters etc. Please see Examples section

Installation

To install featr please use devtools:

devtools::install_github("pavel-filatov/featr")

Please note: you may need to install development version of rlang package:

devtools::install_github("r-lib/rlang")

Examples

`cor_index()`

cor_index() fucntion makes a correlation index: it gets each pair of variables, calculate correlation between them, and turn it all to data frame. Then the function sort data frame by absolute value of correlation.

featr::cor_index(mtcars)

`identity_index()`

For each pair of variables in input data frame identity_index() calculates three numbers:

identical - number of observations where first and second variables are equal;
non_identical - number of observations where first and second variables aren't equal;
other all the other cases (e.g. when one of the entries is NA).

identitity_index is computed as ratio $$identity\ index = \frac{identical}{identical + non_identical}$$

That's how the fuction works:

set.seed(42)
df <- tibble(letter1 = sample(c(letters[1:3], NA, NaN), 100, TRUE),
             letter2 = sample(c(letters[1:3], NA), 100, TRUE),
             letter3 = sample(c(letters[1:3], NA), 100, TRUE),
             num1 = sample(1:4, 100, TRUE),
             num2 = sample(1:4, 100, TRUE),
             num3 = sample(c(1:3, NA), 100, TRUE))

identity_index(df)

You see that variables letter1 and letter3 have 22 equal, 40 different observations and 38 cases where at least on of them has NA.

.na_include flag lets you consider as identical cases where var1 == NA and var2 == NA. The same way if var1 == NA and var2 != NA it will count this case as non_identical.

identity_index(df, .na_include = TRUE)

Now letter1 and letter3 have 6 more identical cases.

`make_shift()`

make_shift() is great option when you need to make a lot of lags (move down) and leads (move up) for one or more variables. This function captures variables to apply shifting and vector of window sizes and returns a list of quoted expressions:

make_shift(mpg, qsec, .n = -2:2)

To make new variables you simply need to unqoute the function output inside the dplyr::mutate() call using !!! (bang-bang-bang) opearator:

select(mtcars, mpg, qsec) %>% 
  mutate(!!!make_shift(mpg, qsec, .n = -1:2))

pavel-filatov/featr documentation built on May 12, 2019, 1:29 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

pavel-filatov/featr
Easy-to-use feature preprocessing tools

In pavel-filatov/featr: Easy-to-use feature preprocessing tools

Installation

Examples

`cor_index()`

`identity_index()`

`make_shift()`

R Package Documentation

Browse R Packages

We want your feedback!

pavel-filatov/featr Easy-to-use feature preprocessing tools

In pavel-filatov/featr: Easy-to-use feature preprocessing tools

Installation

Examples

cor_index()

identity_index()

make_shift()

R Package Documentation

Browse R Packages

We want your feedback!

pavel-filatov/featr
Easy-to-use feature preprocessing tools

`cor_index()`

`identity_index()`

`make_shift()`