unlink("fxtract_files", recursive = TRUE)
library(fxtract)
xtractor = Xtractor$new("xtractor")
Data must be added as dataframes with $add_data(), where the grouping variable must be specified via the group_by argument.
xtractor$add_data(iris, group_by = "Species")
Alternatively, you can add the dataframes for each ID individually instead of all at once. This is especially helpful for large datasets.
library(dplyr)
for (i in unique(iris$Species)) {
  iris_i = iris %>% filter(Species == i)
  xtractor$add_data(iris_i, group_by = "Species")
}
Features must be added as functions that take a dataframe as input and return a named vector. A named list with atomic entries of length 1 is also allowed as output (useful when numerical and categorical outputs are combined). Each function is evaluated separately for each ID of the grouping variable.
fun1 = function(data) {
  c(mean_sepal_length = mean(data$Sepal.Length),
    sd_sepal_length = sd(data$Sepal.Length))
}
fun2 = function(data) {
  list(mean_petal_length = mean(data$Petal.Length),
    sd_petal_length = sd(data$Petal.Length))
}
xtractor$add_feature(fun1)
xtractor$add_feature(fun2)
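To illustrate why a named list can be preferable to a named vector, here is a minimal sketch in base R. The function fun_mixed and its output names are hypothetical and not part of fxtract; the sketch only shows that a list keeps each entry's type, whereas c() coerces everything to one type.

```r
# Hypothetical feature function (not part of fxtract): returns a named list
# so that numeric and character entries keep their own types.
fun_mixed = function(data) {
  list(
    n_rows = nrow(data),
    max_petal_width = max(data$Petal.Width),
    wide_petals = if (mean(data$Petal.Width) > 1) "yes" else "no"
  )
}

# Apply the function by hand to the rows of a single ID.
setosa = iris[iris$Species == "setosa", ]
res_list = fun_mixed(setosa)
str(res_list)

# The same kind of entries combined with c() are coerced to character.
res_vec = c(n_rows = nrow(setosa), wide_petals = "no")
class(res_vec)
```

A function of this shape should be addable with xtractor$add_feature(fun_mixed) in the same way as fun1 and fun2.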
Features are calculated by the method $calc_features():
xtractor$calc_features()
The desired final dataframe can be accessed by the slot $results:
xtractor$results
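As a rough mental model of what "one function evaluated per ID" means, the following base R sketch computes the features of fun1 by hand with split() and lapply(). It is not how fxtract is implemented, and the exact layout of xtractor$results may differ; the object names groups, per_group and manual are only used for illustration.

```r
fun1 = function(data) {
  c(mean_sepal_length = mean(data$Sepal.Length),
    sd_sepal_length = sd(data$Sepal.Length))
}

# Split the data by the grouping variable and apply the function per group.
groups = split(iris, iris$Species)
per_group = lapply(groups, fun1)

# Bind the named vectors into one dataframe with one row per ID.
manual = as.data.frame(do.call(rbind, per_group))
manual = cbind(Species = rownames(manual), manual)
rownames(manual) = NULL
manual
```

The result has one row per species and one column per named entry returned by the function.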
unlink("fxtract_files", recursive = TRUE)
Parallelization is realized with the package future. Both feature calculation and data preprocessing are parallelized. On Windows and Linux machines you can parallelize as follows:
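library(future)
# run in background R sessions with the default number of workers
plan(multisession)
future::nbrOfWorkers()
# run in background R sessions with exactly 4 workers
plan(multisession, workers = 4)
future::nbrOfWorkers()
# switch back to sequential processing
plan(sequential)
future::nbrOfWorkers()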
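To see what a plan does independently of fxtract, the sketch below uses the future package directly. The computation inside future() is only a hypothetical stand-in for a per-group feature calculation; it assumes the future package is installed and that two background sessions can be started on the machine.

```r
library(future)

# Evaluate futures in two background R sessions.
plan(multisession, workers = 2)

# Hypothetical stand-in for a feature computation on one group.
f = future({
  mean(iris$Sepal.Length[iris$Species == "setosa"])
})

# value() blocks until the background session has finished.
setosa_mean = value(f)
setosa_mean

# Switch back to sequential processing when done.
plan(sequential)
nbrOfWorkers()
```

Setting the plan once before calling xtractor$calc_features() is all that the workflow above requires; the explicit future() call here only makes the mechanism visible.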