Bdpar: Class to manage the preprocess of the files throughout the...

Bdpar    R Documentation

Class to manage the preprocess of the files throughout the flow of pipes

Description

The Bdpar class provides the static variables required to perform the whole data flow process. To this end, Bdpar is in charge of (i) initializing the objects that handle the connections to APIs (Connections) and the JSON resources (ResourceHandler), and (ii) executing the flow of pipes (inherited from the GenericPipeline class) passed as an argument.

Details

If a pipe defined in the workflow needs some type of configuration, it can be set through the bdpar.Options variable, which has different methods to support the functionality of different pipes.
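
As a rough illustration of the configuration step described above, the sketch below wraps the three bdpar.Options calls that appear in the Examples section of this page (set, add and configureLog). The helper name configure_bdpar and the key "myPipe.threshold" are hypothetical and only serve the illustration. It assumes the bdpar package is installed and attached before the helper is called.

```r
# Hypothetical helper (not part of bdpar): groups the configuration calls
# shown in the Examples section. It must be called after library(bdpar),
# because it relies on the bdpar.Options object provided by the package.
configure_bdpar <- function(numCores = 1L) {
  # "numCores" is the only option key named on this page
  bdpar.Options$set("numCores", numCores)

  # add() initializes a key that does not exist yet;
  # "myPipe.threshold" is a made-up key used only for illustration
  bdpar.Options$add("myPipe.threshold", 0.5)

  # Log configuration, with the same arguments as in the Examples section
  bdpar.Options$configureLog(console = TRUE, threshold = "INFO", file = NULL)

  invisible(NULL)
}

# Usage (not run):
# library(bdpar)
# configure_bdpar(numCores = 2L)
```

Keeping the option calls in one place makes it easier to apply the same configuration before every call to execute.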

Static variables

  • connections: (Connections) object that handles the connections with YouTube and Twitter.

  • resourceHandler: (ResourceHandler) object that handles the JSON resource files.

Methods

Public methods


Method new()

Creates a Bdpar object. Initializes the static variables: connections and resourceHandler.

Usage
Bdpar$new()

Method execute()

Preprocess files through the indicated flow of pipes.

Usage
Bdpar$execute(
  path,
  extractors = ExtractorFactory$new(),
  pipeline = DefaultPipeline$new(),
  cache = TRUE,
  verbose = FALSE,
  summary = FALSE
)
Arguments
path

A character value. The path where the files to be processed are located.

extractors

An ExtractorFactory value. Class which implements the createInstance method to choose which type of Instance is created.

pipeline

A GenericPipeline value. Subclass of GenericPipeline, which implements the execute method. By default, it is the DefaultPipeline pipeline.

cache

(logical) flag indicating whether the status of the instances will be stored after each pipe. This avoids re-executing previously executed tasks, provided that the order and configuration of the pipes and the pipeline are the same as what is stored in the cache.

verbose

(logical) flag indicating whether messages, warnings and errors are printed.

summary

(logical) flag indicating whether a summary of the pipeline execution is provided.

Details

To parallelize the execution, indicate the number of cores to be used through bdpar.Options$set("numCores", numCores).

Returns

The list of Instances that have been preprocessed.
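
The following is a minimal sketch of a parallel run that keeps the returned list of Instances, under stated assumptions: pick_cores and run_preprocessing are hypothetical helpers, not part of bdpar, and leaving one core free is just one possible policy. The block only defines the helpers; the actual run is shown as a commented usage example because it requires bdpar to be installed and attached.

```r
# Hypothetical helper: choose a core count, leaving one core free.
pick_cores <- function() {
  detected <- parallel::detectCores()  # may be NA on some platforms
  if (is.na(detected)) 1L else max(1L, detected - 1L)
}

# Hypothetical wrapper around Bdpar$execute(); call it after library(bdpar).
run_preprocessing <- function(path, numCores = pick_cores(), cache = TRUE) {
  # Number of cores, set as described in the Details above
  bdpar.Options$set("numCores", numCores)

  # extractors and pipeline keep their documented defaults
  # (ExtractorFactory$new() and DefaultPipeline$new())
  instances <- Bdpar$new()$execute(
    path = path,
    cache = cache,
    verbose = FALSE,
    summary = TRUE
  )

  # execute() returns the list of preprocessed Instances
  instances
}

# Usage (not run):
# library(bdpar)
# instances <- run_preprocessing(system.file("example", package = "bdpar"))
# length(instances)
```

With cache = TRUE, a second call with the same pipes and configuration can reuse the stored status of the instances, as described for the cache argument.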


Method clone()

The objects of this class are cloneable with this method.

Usage
Bdpar$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

bdpar.Options, Connections, DefaultPipeline, DynamicPipeline, GenericPipeline, Instance, ExtractorFactory, ResourceHandler, runPipeline

Examples

## Not run: 

#If it is necessary to indicate any configuration, do it through:
#bdpar.Options$set(key, value)
#If the key is not initialized, do it through:
#bdpar.Options$add(key, value)

#If it is necessary to parallelize, do it through:
#bdpar.Options$set("numCores", numCores)

#If it is necessary to change the behavior of the log, do it through:
#bdpar.Options$configureLog(console = TRUE, threshold = "INFO", file = NULL)

#Folder with the files to preprocess
path <- system.file("example",
                    package = "bdpar")

#Object which decides how the instances are created
extractors <- ExtractorFactory$new()

#Object which indicates the pipes' flow
pipeline <- DefaultPipeline$new()

objectBdpar <- Bdpar$new()

#Starting file preprocessing...
objectBdpar$execute(path = path,
                    extractors = extractors,
                    pipeline = pipeline,
                    cache = FALSE,
                    verbose = FALSE,
                    summary = TRUE)

## End(Not run)

bdpar documentation built on Aug. 22, 2022, 5:08 p.m.