R6 Class Xtractor

unlink("fxtract_files", recursive = TRUE)
library(fxtract)
xtractor = Xtractor$new("xtractor")

Add Data

Data is added as dataframes with $add_data(), and the grouping variable must be specified via group_by. You can either add the whole dataframe at once or add one dataframe per ID. The second approach is especially helpful for large datasets.

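# Add the whole dataframe at once
xtractor$add_data(iris, group_by = "Species")

# Alternative to the call above: add the data for each ID individually
library(dplyr)
for (i in unique(iris$Species)) {
  iris_i = iris %>% filter(Species == i)
  xtractor$add_data(iris_i, group_by = "Species")
}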

Add Features

Features are added as functions that take a dataframe as input and return a named vector. A named list whose entries are atomic and of length 1 is also allowed as output (useful when numerical and categorical outputs are mixed). Each function is evaluated separately for every ID of the grouping variable.

fun1 = function(data) {
  c(mean_sepal_length = mean(data$Sepal.Length),
    sd_sepal_length = sd(data$Sepal.Length))
}

fun2 = function(data) {
  list(mean_petal_length = mean(data$Petal.Length),
    sd_petal_length = sd(data$Petal.Length))
}
xtractor$add_feature(fun1)
xtractor$add_feature(fun2)
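
Because a feature function is an ordinary R function, it can be tried on the rows of a single ID before it is added. The following is a minimal sketch in base R only: fun3 is a hypothetical function made up for illustration (it is not part of fxtract), and the checks simply restate the output rules described above.

```r
# Hypothetical feature function (illustration only, not part of fxtract):
# returns a named list with one numeric and one categorical entry.
fun3 = function(data) {
  list(
    max_petal_width = max(data$Petal.Width),
    petal_size = if (mean(data$Petal.Length) > 3) "long" else "short"
  )
}

# Try it on the rows of a single ID, as it would be applied per group.
setosa = iris[iris$Species == "setosa", ]
out = fun3(setosa)

# Check the output rules: named, atomic entries of length 1.
stopifnot(
  !is.null(names(out)),
  all(names(out) != ""),
  all(vapply(out, is.atomic, logical(1))),
  all(lengths(out) == 1)
)
str(out)
```

If the checks pass, the function can be added with xtractor$add_feature(fun3) in the same way as fun1 and fun2.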

Calculate Features

Features are calculated by the method $calc_features():

xtractor$calc_features()

Collect Results

The desired final dataframe can be accessed by the slot $results:

xtractor$results
unlink("fxtract_files", recursive = TRUE)
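
To build intuition for what is computed, the per-ID calculation can be mimicked in base R with a split-apply-combine step. This is only a conceptual sketch under the assumption that each feature function is applied once per ID. It is not how fxtract is implemented, and the exact column layout of $results may differ.

```r
# Same feature function as fun1 above, repeated so the sketch is self-contained.
fun1 = function(data) {
  c(mean_sepal_length = mean(data$Sepal.Length),
    sd_sepal_length = sd(data$Sepal.Length))
}

# Split the data by the grouping variable: one dataframe per ID.
groups = split(iris, iris$Species)

# Apply the feature function to each ID separately.
feature_rows = lapply(groups, fun1)

# Combine the named vectors into one dataframe with one row per ID.
result = data.frame(
  Species = names(feature_rows),
  do.call(rbind, feature_rows),
  row.names = NULL
)
result
```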

Parallelization

Parallelization is implemented with the future package. Both feature calculation and data preprocessing are parallelized. On Windows and Linux machines, you can set up parallelization as follows:

Use all cores

library(future)
plan(multisession)
future::nbrOfWorkers()

Set number of cores

plan(multisession, workers = 4)
future::nbrOfWorkers()

Stop parallelization

plan(sequential)
future::nbrOfWorkers()
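
The steps above can also be combined so that the previously active plan is restored afterwards instead of always switching to sequential. The following is a minimal sketch that relies on plan() returning the previous plan invisibly. The cap of 2 workers is an arbitrary value chosen for illustration.

```r
library(future)

# Cap the worker count at the number of cores that are actually available.
# The value 2 is an arbitrary choice for this example.
n_workers = min(2, availableCores())

# plan() returns the previously active plan (invisibly), so keep it.
old_plan = plan(multisession, workers = n_workers)
nbrOfWorkers()

# ... run the parallel work here, e.g. xtractor$calc_features() ...

# Restore whatever plan was active before.
plan(old_plan)
```

Inside a function, on.exit(plan(old_plan)) achieves the same restoration even if the work in between fails.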


fxtract documentation built on July 8, 2020, 5:43 p.m.