An R interface to the Python module Featuretools.
featuretoolsR provides functionality from the Python module featuretools, which aims to automate feature engineering. This package is very much a work in progress, as Featuretools offers a lot of functionality. Any PRs are much appreciated.
The latest stable release is available on CRAN. You can get the latest development version of featuretoolsR by installing it straight from GitHub.
You'll need a working Python environment with featuretools installed. The recommended way to set this up is the built-in function install_featuretools(), which automatically creates a virtual environment for the package and installs featuretools into it.
All functions in featuretoolsR come with documentation, but it's advisable to briefly browse through the Featuretools Python documentation as well, since it explains the underlying concepts in more depth.
An entityset is the set that contains all your entities. To create a set and add an entity straight away, you can use as_entityset:
# Libs
library(featuretoolsR)
library(magrittr)
# Create some mock data
set_1 <- data.frame(key = 1:100, value = sample(letters, 100, TRUE), a = rep(Sys.Date(), 100))
set_2 <- data.frame(key = 1:100, value = sample(LETTERS, 100, TRUE), b = rep(Sys.time(), 100))
# Create entityset
es <- as_entityset(
  set_1,
  index = "key",
  entity_id = "set_1",
  id = "demo",
  time_index = "a"
)
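Featuretools expects an entity's index to uniquely identify its rows. A small base-R check like the one below can catch problems before calling as_entityset(). The helper is_valid_index() is hypothetical and not part of featuretoolsR; rejecting missing values is a conservative assumption of this sketch.

```r
# Hypothetical helper, not part of featuretoolsR: checks that a column can
# serve as an entity index, i.e. it exists and uniquely identifies each row.
# Rejecting missing values is a conservative assumption of this sketch.
is_valid_index <- function(df, index) {
  index %in% names(df) &&
    !anyDuplicated(df[[index]]) &&
    !anyNA(df[[index]])
}

set_1 <- data.frame(key = 1:100, value = sample(letters, 100, TRUE))

is_valid_index(set_1, "key")    # TRUE: 1:100 contains no duplicates
is_valid_index(set_1, "value")  # FALSE: 100 draws from 26 letters must repeat
```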
To add entities (i.e. if you have relational data across multiple data.frames), use add_entity. This function is pipe friendly. For this demo case, we'll add set_2:
es <- es %>%
  add_entity(
    df = set_2,
    entity_id = "set_2",
    index = "key",
    time_index = "b"
  )
With relational data, it's useful to define a relationship between two or more entities. This can be done with add_relationship:
es <- es %>%
  add_relationship(
    parent_set = "set_1",
    child_set = "set_2",
    parent_idx = "key",
    child_idx = "key"
  )
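Before declaring a relationship it can help to confirm that the keys line up. The sketch below assumes the parent key must be unique and that every non-missing child key should match an existing parent key; check_relationship() is a hypothetical helper, not a featuretoolsR function.

```r
# Hypothetical helper, not part of featuretoolsR: a base-R sanity check to run
# before add_relationship(). Assumption: parent keys are unique and every
# non-missing child key refers to an existing parent key.
check_relationship <- function(parent, child, parent_idx, child_idx) {
  parent_keys <- parent[[parent_idx]]
  child_keys  <- child[[child_idx]]
  !anyDuplicated(parent_keys) &&
    all(child_keys[!is.na(child_keys)] %in% parent_keys)
}

set_1 <- data.frame(key = 1:100)
set_2 <- data.frame(key = 1:100)

check_relationship(set_1, set_2, "key", "key")  # TRUE
```

In this demo both entities share the keys 1:100, so each parent row is matched by exactly one child row.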
The bread and butter of Featuretools is the dfs function (deep feature synthesis; see the official Featuretools documentation). It will attempt to create features based on the *_primitives you provide, i.e. agg_primitives and trans_primitives (more on primitives below).
ft_matrix <- es %>%
  dfs(
    target_entity = "set_1",
    trans_primitives = c("and", "cum_sum")
  )
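To get a feel for what the two transform primitives compute, here are rough base-R analogues. This is an illustration only: Featuretools itself decides which columns each primitive applies to (based on their variable types) and names the resulting features.

```r
# Rough base-R analogues of the transform primitives passed to dfs() above.
# Illustration only; Featuretools applies them to eligible columns itself.

# "and": element-wise logical AND of two boolean columns
x <- c(TRUE, TRUE, FALSE, TRUE)
y <- c(TRUE, FALSE, FALSE, TRUE)
x & y            # TRUE FALSE FALSE TRUE

# "cum_sum": running total of a numeric column
values <- c(3, 1, 4, 1, 5)
cumsum(values)   # 3 4 8 9 14
```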
To use the new data.frame/features created by dfs, the featuretoolsR-specific function tidy_feature_matrix can be used. A few "nice-to-have" arguments can be passed to clean the new data, such as removing near-zero-variance variables (remove_nzv), replacing NaN with NA (nan_is_na) and tidying column names (clean_names).
tidy <- tidy_feature_matrix(
  ft_matrix,
  remove_nzv = TRUE,
  nan_is_na = TRUE,
  clean_names = TRUE
)
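The sketch below approximates two of these cleaning steps in base R under a simplified rule: near-zero variance is reduced to "only one distinct non-missing value", whereas tidy_feature_matrix() may use different thresholds. nan_to_na() and drop_constant() are hypothetical helpers, not featuretoolsR functions.

```r
# Hypothetical helpers approximating nan_is_na and remove_nzv in base R.
# Assumption: "near-zero variance" is simplified to a single distinct value.

# Replace NaN with NA in every numeric column
nan_to_na <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col)) col[is.nan(col)] <- NA
    col
  })
  df
}

# Drop columns that have at most one distinct non-missing value
drop_constant <- function(df) {
  keep <- vapply(df, function(col) length(unique(col[!is.na(col)])) > 1,
                 logical(1))
  df[, keep, drop = FALSE]
}

fm <- data.frame(a = c(1, NaN, 3), b = c(5, 5, 5), c = c("x", "y", "x"))
cleaned <- drop_constant(nan_to_na(fm))
names(cleaned)   # "a" "c"
```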
Featuretools supports a lot of primitives. These are accessible with the function list_primitives(), which returns a data.frame containing the type (aggregation, used with agg_primitives, or transform, used with trans_primitives), the name (in the example above, "and" and "cum_sum") and a brief description of each primitive.
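Since list_primitives() returns an ordinary data.frame, it can be filtered with base R. The helper below is hypothetical and assumes the columns are named name and type, with type values "aggregation" and "transform".

```r
# Hypothetical helper, not part of featuretoolsR: returns the primitive names
# of one type from the data.frame produced by list_primitives().
# Assumes columns "name" and "type" with values "aggregation"/"transform".
primitive_names <- function(prims, kind = c("aggregation", "transform")) {
  kind <- match.arg(kind)
  as.character(prims$name[prims$type == kind])
}

# Usage (requires a working featuretools installation):
# prims <- list_primitives()
# head(primitive_names(prims, "transform"))
```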
featuretoolsR builds on reticulate, an R interface to Python.