knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(featr)
featr stands for "feature" (oh, really?) and is pronounced the same way.
It provides a set of functions for not-so-common tasks such as
renaming variables in a data frame or in a list of data frames,
one-hot encoding with custom parameters, etc.
Please see the Examples section.
To install featr, please use devtools:
devtools::install_github("pavel-filatov/featr")
Please note: you may need to install the development version of the rlang package:
devtools::install_github("r-lib/rlang")
cor_index()
The cor_index() function builds a correlation index: it takes each pair of variables, calculates
the correlation between them, and collects the results into a data frame.
The function then sorts the data frame by the absolute value of the correlation.
featr::cor_index(mtcars)
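To make the idea concrete, here is a minimal base R sketch of what such a correlation index could look like. It is not featr's implementation: the function name cor_index_sketch, the output column names, the restriction to numeric columns, and the descending sort order are all assumptions made for illustration.

```r
# Hypothetical sketch of a correlation index (not featr's own code).
cor_index_sketch <- function(df) {
  # Assumption: only numeric columns take part in the index
  num <- df[vapply(df, is.numeric, logical(1))]
  cm <- cor(num, use = "pairwise.complete.obs")

  # Keep each unordered pair of variables exactly once
  idx <- which(upper.tri(cm), arr.ind = TRUE)

  out <- data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    correlation = cm[idx],
    stringsAsFactors = FALSE
  )

  # Assumption: strongest correlations (by absolute value) come first
  out <- out[order(abs(out$correlation), decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

res <- cor_index_sketch(mtcars)
head(res, 3)
```

Sorting by absolute value puts strong negative correlations next to strong positive ones, which is usually what you want when screening for redundant variables.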
identity_index()
For each pair of variables in the input data frame, identity_index() calculates three numbers:
identical - the number of observations where the first and second variables are equal;
non_identical - the number of observations where the first and second variables aren't equal;
other - all the other cases (e.g. when one of the entries is NA).
identity_index is computed as the ratio
$$identity\ index = \frac{identical}{identical + non\_identical}$$
Here is how the function works:
set.seed(42)
df <- tibble(letter1 = sample(c(letters[1:3], NA, NaN), 100, TRUE),
             letter2 = sample(c(letters[1:3], NA), 100, TRUE),
             letter3 = sample(c(letters[1:3], NA), 100, TRUE),
             num1 = sample(1:4, 100, TRUE),
             num2 = sample(1:4, 100, TRUE),
             num3 = sample(c(1:3, NA), 100, TRUE))
identity_index(df)
You can see that the variables letter1 and letter3 have 22 equal observations,
40 different observations, and 38 cases where at least one of them is NA.
The .na_include flag lets you count cases where both var1 and var2 are NA as identical.
Likewise, a case where var1 is NA and var2 is not is counted as non_identical.
identity_index(df, .na_include = TRUE)
Now letter1 and letter3 have 6 more identical cases.
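The counting rules described above can be sketched for a single pair of vectors in base R. This is only an illustration under stated assumptions, not the package's code: the helper name identity_counts and its na_include argument are hypothetical, and the sketch assumes that with NAs included nothing is left in the other bucket.

```r
# Hypothetical sketch of the identity index for one pair of vectors
# (not featr's own code).
identity_counts <- function(x, y, na_include = FALSE) {
  any_na  <- is.na(x) | is.na(y)
  both_na <- is.na(x) & is.na(y)
  # TRUE only where both values are present and equal; never NA
  equal <- !any_na & x == y

  if (na_include) {
    # NA vs NA counts as identical, NA vs value as non-identical
    identical_n     <- sum(equal | both_na)
    non_identical_n <- sum(!(equal | both_na))
    other_n         <- 0L
  } else {
    identical_n     <- sum(equal)
    non_identical_n <- sum(!any_na & !equal)
    other_n         <- sum(any_na)
  }

  data.frame(
    identical = identical_n,
    non_identical = non_identical_n,
    other = other_n,
    identity_index = identical_n / (identical_n + non_identical_n)
  )
}

x <- c("a", "b", NA, NA, "c")
y <- c("a", "c", NA, "b", "c")

without_na <- identity_counts(x, y)
with_na    <- identity_counts(x, y, na_include = TRUE)
```

In this toy example the pair has 2 identical, 1 non-identical and 2 other cases by default; with NAs included, the NA/NA pair moves to identical and the NA/"b" pair moves to non-identical.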
make_shift()
make_shift() is a great option when you need to make a lot of lags (move down)
and leads (move up) for one or more variables.
This function captures the variables to shift and a vector of window sizes, and
returns a list of quoted expressions:
make_shift(mpg, qsec, .n = -2:2)
To make the new variables, you simply need to unquote the function output
inside the dplyr::mutate() call using the !!! (bang-bang-bang) operator:
select(mtcars, mpg, qsec) %>% mutate(!!!make_shift(mpg, qsec, .n = -1:2))
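The general pattern of building a named list of quoted expressions and splicing it into mutate() can be sketched with rlang and dplyr. The helper below is hypothetical and handles a single variable. Its name (shift_exprs), the column naming scheme, and the sign convention (positive sizes mean lag, negative sizes mean lead) are assumptions for illustration and may differ from what make_shift() actually does.

```r
library(dplyr)
library(rlang)

# Hypothetical helper (not featr's code): builds one quoted lag/lead
# expression per window size for a single variable.
# Assumption: positive sizes mean lag, negative sizes mean lead.
shift_exprs <- function(var, n) {
  var <- ensym(var)
  exprs <- lapply(n, function(k) {
    size <- abs(k)
    if (k > 0) {
      expr(dplyr::lag(!!var, !!size))
    } else {
      expr(dplyr::lead(!!var, !!size))
    }
  })
  # Hypothetical naming scheme, e.g. mpg_lag1, mpg_lead1
  names(exprs) <- paste0(as_string(var), ifelse(n > 0, "_lag", "_lead"), abs(n))
  exprs
}

# Splice the named expressions into mutate(); each becomes a new column
shifted <- mtcars %>%
  select(mpg) %>%
  mutate(!!!shift_exprs(mpg, c(-1L, 1L, 2L)))

head(shifted, 3)
```

Because the expressions are only quoted, nothing is computed until mutate() evaluates them against the data, so the same list can be reused on any data frame that contains the captured variable.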