library(tidyverse)
library(featr)
featr stands for "feature" (oh, really?) and is pronounced the same way.
The package provides a bunch of functions for not-so-common tasks, such as
renaming variables in a data frame or a list of data frames,
one-hot encoding with custom parameters, etc.
Please see the Examples section.
To install featr, please use devtools:
devtools::install_github("pavel-filatov/featr")
Please note: you may need to install the development version of the rlang
package:
devtools::install_github("r-lib/rlang")
cor_index()
The cor_index() function builds a correlation index: it takes each pair of variables,
calculates the correlation between them, and collects the results into a data frame.
The data frame is then sorted by the absolute value of the correlation.
featr::cor_index(mtcars)
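To make the idea concrete, here is a minimal base-R sketch of such a correlation index. It is not featr's implementation: the name cor_index_sketch(), the output columns var1, var2 and cor, the restriction to numeric columns, and the descending sort order are all assumptions made for illustration.

```r
# Hypothetical sketch of a correlation index; not featr's actual code.
cor_index_sketch <- function(df) {
  # Keep numeric columns only (assumption: cor() needs numeric input)
  num <- df[vapply(df, is.numeric, logical(1))]
  m <- cor(num)
  # Each unordered pair of variables appears once in the upper triangle
  pairs <- which(upper.tri(m), arr.ind = TRUE)
  out <- data.frame(
    var1 = colnames(m)[pairs[, "row"]],
    var2 = colnames(m)[pairs[, "col"]],
    cor  = m[pairs],
    stringsAsFactors = FALSE
  )
  # Sort by absolute correlation, strongest first (assumed order)
  out <- out[order(abs(out$cor), decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

idx <- cor_index_sketch(mtcars)
head(idx)
```

Using the upper triangle of the correlation matrix avoids listing each pair twice and skips the trivial self-correlations on the diagonal.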
identity_index()
For each pair of variables in the input data frame, identity_index()
calculates three numbers:
- identical: the number of observations where the first and second variables are equal;
- non_identical: the number of observations where the first and second variables are not equal;
- other: all other cases (e.g. when one of the entries is NA).

identity_index is computed as the ratio
$$\text{identity index} = \frac{\text{identical}}{\text{identical} + \text{non\_identical}}$$
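A minimal sketch of these three counts and the ratio for a single pair of vectors, assuming the default behaviour where an NA in either entry puts the observation into other. identity_counts() is a hypothetical helper for illustration, not part of featr.

```r
# Hypothetical helper: counts for one pair of vectors (default NA handling).
identity_counts <- function(x, y) {
  both_present <- !is.na(x) & !is.na(y)
  # FALSE & NA is FALSE, so pairs with an NA never count as (non-)identical
  identical_n <- sum(both_present & x == y)
  non_identical_n <- sum(both_present & x != y)
  other_n <- length(x) - identical_n - non_identical_n
  c(identical = identical_n,
    non_identical = non_identical_n,
    other = other_n,
    identity_index = identical_n / (identical_n + non_identical_n))
}

# identical = 2, non_identical = 1, other = 2, identity_index = 2/3
identity_counts(c("a", "b", "c", NA, "a"), c("a", "c", "c", "b", NA))
```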
Here is how the function works:
set.seed(42)
df <- tibble(
  letter1 = sample(c(letters[1:3], NA, NaN), 100, TRUE),
  letter2 = sample(c(letters[1:3], NA), 100, TRUE),
  letter3 = sample(c(letters[1:3], NA), 100, TRUE),
  num1 = sample(1:4, 100, TRUE),
  num2 = sample(1:4, 100, TRUE),
  num3 = sample(c(1:3, NA), 100, TRUE)
)
identity_index(df)
You can see that variables letter1 and letter3 have 22 equal observations,
40 different ones, and 38 cases where at least one of them is NA.
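As a worked example, plugging the counts reported above for letter1 and letter3 into the formula gives (the exact counts depend on the random sample):

$$\text{identity index} = \frac{22}{22 + 40} = \frac{22}{62} \approx 0.355$$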
The .na_include flag lets you treat cases where both var1 and var2 are NA
as identical.
Similarly, if var1 is NA and var2 is not, the case is counted as non_identical.
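The rule above can be sketched for a single pair of vectors as follows. identity_counts_na() and its na_include argument are hypothetical stand-ins for the .na_include behaviour described here, not featr's code; the sketch assumes the rule is symmetric, so an NA on either side (but not both) counts as non_identical.

```r
# Hypothetical helper mimicking the described .na_include rule.
identity_counts_na <- function(x, y, na_include = FALSE) {
  x_na <- is.na(x)
  y_na <- is.na(y)
  eq <- x == y
  eq[is.na(eq)] <- FALSE  # comparisons involving NA are not "equal"
  both <- !x_na & !y_na
  if (na_include) {
    # Both NA -> identical; exactly one NA -> non_identical
    identical_n <- sum((both & eq) | (x_na & y_na))
    non_identical_n <- sum((both & !eq) | xor(x_na, y_na))
  } else {
    identical_n <- sum(both & eq)
    non_identical_n <- sum(both & !eq)
  }
  c(identical = identical_n,
    non_identical = non_identical_n,
    other = length(x) - identical_n - non_identical_n)
}

x <- c("a", NA, NA, "b")
y <- c("a", NA, "b", "c")
identity_counts_na(x, y)                     # identical 1, non_identical 1, other 2
identity_counts_na(x, y, na_include = TRUE)  # identical 2, non_identical 2, other 0
```

With na_include = TRUE every observation falls into identical or non_identical, so other drops to zero.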
identity_index(df, .na_include = TRUE)
Now letter1 and letter3 have 6 more identical cases.
make_shift()
make_shift() is a great option when you need to create many lags (shifting values down)
and leads (shifting values up) for one or more variables.
The function captures the variables to shift and a vector of window sizes, and
returns a list of quoted expressions:
make_shift(mpg, qsec, .n = -2:2)
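One way such a list of quoted expressions could be built with rlang is sketched below. This is a hypothetical re-implementation, not featr's code: the name make_shift_sketch(), the sign convention (positive .n means lag, negative means lead), skipping .n = 0, and output names such as mpg_lag1 are all assumptions.

```r
library(rlang)
library(dplyr)

# Hypothetical sketch: build named lag/lead expressions for each variable.
make_shift_sketch <- function(..., .n = 1) {
  vars <- ensyms(...)   # capture bare column names as symbols
  out <- list()
  for (v in vars) {
    for (k in .n) {
      if (k == 0) next  # assumption: a zero shift is skipped
      nm <- paste0(as_string(v), if (k > 0) "_lag" else "_lead", abs(k))
      out[[nm]] <- if (k > 0) {
        expr(dplyr::lag(!!v, !!k))       # assumed: positive .n -> lag
      } else {
        expr(dplyr::lead(!!v, !!abs(k))) # assumed: negative .n -> lead
      }
    }
  }
  out
}

shifts <- make_shift_sketch(mpg, qsec, .n = -2:2)
names(shifts)
res <- mtcars %>% select(mpg, qsec) %>% mutate(!!!shifts)
```

Returning unevaluated expressions rather than computed columns keeps the helper independent of any particular data frame; the data only enters when mutate() evaluates them.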
To create the new variables, simply unquote the function output
inside a dplyr::mutate() call using the !!! (bang-bang-bang) operator:
select(mtcars, mpg, qsec) %>% mutate(!!!make_shift(mpg, qsec, .n = -1:2))
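The splicing itself does not depend on featr: any named list of expressions can be passed to mutate() with !!!, and each name becomes a new column. A minimal sketch with a hand-written list (the names mpg_lag1 and mpg_lead1 are arbitrary):

```r
library(dplyr)
library(rlang)

# Named list of quoted expressions; names become column names.
exprs_list <- list(
  mpg_lag1  = expr(dplyr::lag(mpg, 1L)),   # previous row's value
  mpg_lead1 = expr(dplyr::lead(mpg, 1L))   # next row's value
)

spliced <- mtcars %>% select(mpg) %>% mutate(!!!exprs_list)
head(spliced)
```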