Xtractor: R6 Object for Feature Extraction.
From the fxtract package: Feature Extraction from Grouped Data


Description

Xtractor calculates features from raw data for each ID of a grouping variable individually. This process can be parallelized with the future package.

Format

An R6Class object.

Usage

xtractor = Xtractor$new("xtractor")

Arguments

For Xtractor$new():

name ('character(1)'): A user-defined name for the Xtractor. All necessary data will be saved under fxtract_files/name/ inside file.dir (with the default file.dir, ./fxtract_files/name/).
load ('logical(1)'): If TRUE, an existing Xtractor will be loaded.
file.dir ('character(1)'): Path where all files of the Xtractor are saved. Defaults to the current working directory.

Details

All datasets and feature functions are managed by this R6 object. Each dataset is saved as a single RDS file per ID, and the feature functions are calculated on each of these datasets separately. The main advantage of this design is that it scales well to large datasets, because data is only read into RAM when it is needed.
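A minimal base-R sketch of this storage design follows. It is not fxtract's implementation; the directory name `sketch_storage` and the objects `ids` and `feature_means` are hypothetical. It writes one RDS file per ID, then computes a feature by reading the files one at a time, so only one group's data is held in memory during the calculation.

```r
# Minimal sketch of one-RDS-file-per-ID storage; hypothetical, not fxtract's code.
store_dir = file.path(tempdir(), "sketch_storage")
dir.create(store_dir, showWarnings = FALSE, recursive = TRUE)

# Write one RDS file per value of the grouping variable (here: Species).
for (id in unique(as.character(iris$Species))) {
  saveRDS(iris[iris$Species == id, ], file.path(store_dir, paste0(id, ".rds")))
}

# Recover the IDs from the file names; no data is read at this point.
ids = sub("\\.rds$", "", list.files(store_dir, pattern = "\\.rds$"))

# Compute a feature per ID, reading only one group's data into RAM at a time.
feature_means = vapply(ids, function(id) {
  data = readRDS(file.path(store_dir, paste0(id, ".rds")))
  mean(data$Sepal.Length)
}, numeric(1))
print(feature_means)
```

With many large groups, this pattern keeps peak memory close to the size of the largest single group rather than the size of the whole dataset.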

Fields

error_messages ('data.frame()'): Active binding. A dataframe with information about error messages.
ids ('character()'): Active binding. A character vector with the IDs of the grouping variable.
features ('character()'): Active binding. A character vector with the names of the feature functions that were added.
status ('data.frame()'): Active binding. A dataframe with an overview of which features have been calculated on which datasets.
results ('data.frame()'): Active binding. A dataframe with all calculated features of all IDs.

Methods

add_data(data, group_by): [data: ('data.frame' | 'data.table')] A dataframe or data.table to be added to the R6 object.
[group_by: ('character(1)')] The name of the grouping variable in data.

This method writes one RDS file per group.

preprocess_data(fun): [fun: ('function')] A function that takes a dataframe as input and returns a dataframe.

This method loads the RDS files, applies the function to each of them and overwrites the old RDS files with the results.
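The load-transform-overwrite cycle can be sketched in base R as follows. This is an illustration under assumptions, not fxtract's code: the directory `sketch_preprocess` and the preprocessing function `add_ratio` are hypothetical, and the `is.data.frame()` guard is only one way to enforce the dataframe-in, dataframe-out contract.

```r
# Hypothetical sketch of preprocessing stored groups in place; not fxtract's code.
store_dir = file.path(tempdir(), "sketch_preprocess")
dir.create(store_dir, showWarnings = FALSE, recursive = TRUE)
for (id in unique(as.character(iris$Species))) {
  saveRDS(iris[iris$Species == id, ], file.path(store_dir, paste0(id, ".rds")))
}

# A preprocessing function: dataframe in, dataframe out.
add_ratio = function(data) {
  data$Sepal.Ratio = data$Sepal.Length / data$Sepal.Width
  data
}

# Load each file, apply the function, check the output type, overwrite the file.
for (path in list.files(store_dir, pattern = "\\.rds$", full.names = TRUE)) {
  out = add_ratio(readRDS(path))
  stopifnot(is.data.frame(out))
  saveRDS(out, path)
}

setosa = readRDS(file.path(store_dir, "setosa.rds"))
print(head(setosa))
```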

remove_data(ids): [ids: ('character()')] One or many IDs of the grouping variable.

This method deletes the RDS files of the given IDs.

get_data(ids): [ids: ('character()')] One or many IDs of the grouping variable. Defaults to all IDs.

This method returns one dataframe containing the data of the chosen IDs.
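Reading several per-ID files back into a single dataframe can be sketched as below. The helper `read_groups()` and the directory `sketch_get_data` are hypothetical names for illustration; this is not the package's implementation of get_data().

```r
# Hypothetical sketch: combine the stored data of selected IDs into one dataframe.
store_dir = file.path(tempdir(), "sketch_get_data")
dir.create(store_dir, showWarnings = FALSE, recursive = TRUE)
for (id in unique(as.character(iris$Species))) {
  saveRDS(iris[iris$Species == id, ], file.path(store_dir, paste0(id, ".rds")))
}

# Read only the requested IDs and stack them row-wise.
read_groups = function(ids, dir) {
  parts = lapply(ids, function(id) readRDS(file.path(dir, paste0(id, ".rds"))))
  do.call(rbind, parts)
}

df = read_groups(c("setosa", "virginica"), store_dir)
print(table(df$Species))
```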

add_feature(fun, check_fun): [fun: ('function')] A function that takes a dataframe as input and returns a named vector or list.
[check_fun: ('logical(1)')] If TRUE (the default), the function is checked to make sure it returns a vector or a list. Disable this if calculating the function takes too long.

This method adds the feature function to the R6 object and writes an RDS file of the function, which can be retrieved later.
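A check in the spirit of check_fun can be sketched by running the feature function once and inspecting its output. The helper `check_feature()` is hypothetical and is not fxtract's actual check; it assumes the requirement stated above, a named vector or list.

```r
# Hypothetical output check for a feature function; not fxtract's own check.
check_feature = function(fun, data) {
  out = fun(data)
  ok_type = is.atomic(out) || is.list(out)
  ok_names = !is.null(names(out)) && all(names(out) != "")
  if (!(ok_type && ok_names)) stop("feature function must return a named vector or list")
  invisible(TRUE)
}

fun = function(data) c(mean_sepal_length = mean(data$Sepal.Length))
bad = function(data) mean(data$Sepal.Length)  # unnamed result, should be rejected

check_feature(fun, iris[iris$Species == "setosa", ])
res = tryCatch(check_feature(bad, iris), error = function(e) conditionMessage(e))
print(res)
```

Because the check has to evaluate the function at least once, it costs as much as one feature calculation, which is why switching it off can help for slow functions.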

remove_feature(fun): [fun: ('function' | 'character(1)')] A function (or the name of the function as a character string) to be removed.

This method removes the function from the object and deletes all corresponding files and results.

get_feature(fun): [fun: ('character(1)')] The name of a function as a character string.

This method reads the RDS file of the function. This is useful for debugging after loading an Xtractor.
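Because R functions can be serialized, a feature function can be stored with saveRDS() and read back with readRDS(). The sketch below only illustrates that round trip; the directory `sketch_features` is a hypothetical name and fxtract's own file layout is not shown.

```r
# Hypothetical sketch: a feature function round-trips through an RDS file.
fun_dir = file.path(tempdir(), "sketch_features")
dir.create(fun_dir, showWarnings = FALSE, recursive = TRUE)
fun = function(data) c(mean_sepal_length = mean(data$Sepal.Length))
saveRDS(fun, file.path(fun_dir, "fun.rds"))

# Later, the function can be read back from disk, e.g. to debug it on some data.
fun3 = readRDS(file.path(fun_dir, "fun.rds"))
value = fun3(iris)
print(value)
```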

calc_features(features, ids): [features: ('character()')] A character vector with the names of the features to be calculated. Defaults to all features.
[ids: ('character()')] One or many IDs of the grouping variable. Defaults to all IDs.

This method calculates the chosen features (by default, all of them) on the chosen IDs.
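One way to picture this calculation, including how a failure on a single ID is kept separate from the other results, is the base-R sketch below. It is hypothetical, not fxtract's implementation: the groups are held in memory via split() instead of RDS files, and the objects `results`, `errors` and `results_df` only loosely mirror the results and error_messages fields. The two feature functions are the ones from the examples.

```r
# Hypothetical sketch: apply every feature function to every ID and keep
# errors separate from results; not fxtract's implementation.
groups = split(iris, iris$Species)  # in-memory stand-in for the per-ID RDS files
features = list(
  fun = function(data) c(mean_sepal_length = mean(data$Sepal.Length)),
  fun2 = function(data) {
    if ("setosa" %in% data$Species) stop("my error")
    c(sd_sepal_length = sd(data$Sepal.Length))
  }
)

results = list()
errors = data.frame(id = character(), feature = character(), message = character())

for (id in names(groups)) {
  row = list(id = id)
  for (f in names(features)) {
    # Catch the error instead of aborting the whole run.
    out = tryCatch(features[[f]](groups[[id]]), error = function(e) e)
    if (inherits(out, "error")) {
      errors = rbind(errors, data.frame(id = id, feature = f, message = conditionMessage(out)))
    } else {
      row = c(row, as.list(out))
    }
  }
  results[[id]] = row
}

# Build one row per ID; features that failed for an ID become NA.
cols = unique(unlist(lapply(results, names)))
results_df = do.call(rbind, lapply(results, function(r) {
  r[setdiff(cols, names(r))] = NA_real_
  as.data.frame(r[cols])
}))
print(results_df)
print(errors)
```

Catching errors per (ID, feature) pair means that one failing group leaves a missing value in the results instead of stopping the calculation for all other IDs.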

retry_failed_features(features): [features: ('character()')] A character vector with the names of the features to be recalculated. Defaults to all features.

This method retries the calculation of failed features. This is useful if a calculation failed because of memory problems.
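Retrying only the failed pairs can be sketched with a small status table. The `status` dataframe, its `state` column and the `retried` list are hypothetical; the real status field of an Xtractor may be structured differently.

```r
# Hypothetical status table: one row per (ID, feature) pair.
status = data.frame(
  id = c("setosa", "versicolor", "virginica"),
  feature = "fun",
  state = c("done", "failed", "done"),
  stringsAsFactors = FALSE
)
groups = split(iris, iris$Species)
features = list(fun = function(data) c(mean_sepal_length = mean(data$Sepal.Length)))

# Recalculate only the pairs marked as failed; finished pairs are left untouched.
retried = list()
for (i in which(status$state == "failed")) {
  id = status$id[i]
  f = status$feature[i]
  out = tryCatch(features[[f]](groups[[id]]), error = function(e) NULL)
  if (!is.null(out)) {
    status$state[i] = "done"
    retried[[id]] = out
  }
}
print(status)
```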

print(): [internal] Method to print the R6 object.
clone(): [internal] Method to clone the R6 object.
initialize(): [internal] Method to initialize the R6 object.

Examples

library(fxtract)

# one feature function
dir = tempdir()
xtractor = Xtractor$new("xtractor", file.dir = dir)
xtractor$add_data(iris, group_by = "Species")
xtractor$ids
fun = function(data) {
  c(mean_sepal_length = mean(data$Sepal.Length))
}
xtractor$add_feature(fun)
xtractor$features
xtractor$calc_features()
xtractor$results
xtractor$status
xtractor

# failing function on only one ID
fun2 = function(data) {
  if ("setosa" %in% data$Species) stop("my error")
  c(sd_sepal_length = sd(data$Sepal.Length))
}
xtractor$add_feature(fun2)
xtractor$calc_features()
xtractor$results
xtractor$error_messages
xtractor

# remove feature function
xtractor$remove_feature("fun2")
xtractor$results
xtractor

# remove ID
xtractor$remove_data("setosa")
xtractor$results
xtractor$ids
xtractor

# get datasets and functions
fun3 = xtractor$get_feature("fun")
df = xtractor$get_data()
dplyr_wrapper(data = df, group_by = "Species", fun = fun3)