R6 Class Xtractor
In fxtract: Feature Extraction from Grouped Data

unlink("fxtract_files", recursive = TRUE)

Designed for large projects.
Data and features can be updated easily.
Data can be preprocessed.
Features are calculated on each ID of a grouping variable individually.
Easy parallelization with the future package.
Scales nicely for larger datasets. Data is only read into RAM when needed.

library(fxtract)
xtractor = Xtractor$new("xtractor")

Add Data

Data must be added as dataframes with $add_data(), and the grouping variable must be specified via group_by. You can also add a separate dataframe for each ID, which is especially helpful for large datasets.

Add all data at once:

xtractor$add_data(iris, group_by = "Species")

Add datasets individually:

library(dplyr)
for (i in unique(iris$Species)) {
  iris_i = iris %>% filter(Species == i)
  xtractor$add_data(iris_i, group_by = "Species")
}

Add Features

Features must be added as functions that take a dataframe as input and return a named vector. A named list whose entries are atomic and of length 1 is also allowed as output (useful when mixing numerical and categorical outputs). Each function is evaluated separately for every ID of the grouping variable.

fun1 = function(data) {
  c(mean_sepal_length = mean(data$Sepal.Length),
    sd_sepal_length = sd(data$Sepal.Length))
}

fun2 = function(data) {
  list(mean_petal_length = mean(data$Petal.Length),
    sd_petal_length = sd(data$Petal.Length))
}

xtractor$add_feature(fun1)
xtractor$add_feature(fun2)
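
To make the list-output case concrete, here is a minimal sketch of a feature function that mixes a numeric and a categorical entry. The function fun3 and the threshold of 3 are hypothetical and chosen only for illustration. The split/lapply part is a base R stand-in for "evaluate once per ID"; it is not meant to describe how fxtract works internally.

```r
# Hypothetical feature function (not part of fxtract): returns a named list
# with one numeric and one categorical entry, each of length 1.
fun3 = function(data) {
  list(
    max_sepal_width = max(data$Sepal.Width),
    wide_sepals = if (mean(data$Sepal.Width) > 3) "yes" else "no"
  )
}

# Base R stand-in for per-ID evaluation: split by group, apply the function.
per_group = lapply(split(iris, iris$Species), fun3)

# Bind the per-group lists into one row per group, one column per feature.
features = do.call(rbind, lapply(per_group, as.data.frame))
features
```

A function written this way could presumably be registered with xtractor$add_feature(fun3) in the same manner as fun1 and fun2.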

Calculate Features

Features are calculated by the method $calc_features():

xtractor$calc_features()

Collect Results

The final dataframe can be accessed via $results:

xtractor$results

unlink("fxtract_files", recursive = TRUE)

Parallelization

Parallelization is implemented with the future package. Both feature calculation and data preprocessing are parallelized. On Windows and Linux machines you can parallelize as follows:

Use all cores

library(future)
plan(multisession)
future::nbrOfWorkers()

Set number of cores

plan(multisession, workers = 4)
future::nbrOfWorkers()

Stop parallelization

plan(sequential)
future::nbrOfWorkers()
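
The three variants above can be combined into one pattern: choose a worker count, switch plans, and restore the previous plan afterwards. The sketch below assumes a policy of leaving one core free, which is an illustrative choice and not an fxtract recommendation. It relies on plan() returning the previously active plan when a new one is set.

```r
library(future)

# Leave one core free for the rest of the system (a hypothetical policy);
# never go below one worker.
n_workers = max(1, availableCores() - 1)

# plan() returns the previously active plan, so it can be restored later.
old_plan = plan(multisession, workers = n_workers)
nbrOfWorkers()

# ... run xtractor$calc_features() here ...

# Restore whatever plan was active before.
plan(old_plan)
```

Restoring the old plan, rather than always calling plan(sequential), keeps the snippet from overriding a plan that the calling script had already configured.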


fxtract documentation built on July 8, 2020, 5:43 p.m.
