TEFeatureExtractor: Feature extractor for reducing the number for dimensions of...
In aifeducation: Artificial Intelligence for Education

TEFeatureExtractor

R Documentation

Feature extractor for reducing the number of dimensions of text embeddings.

Description

Abstract class for auto encoders with 'pytorch'.

Objects of this class are used for reducing the number of dimensions of text embeddings created by an object of class TextEmbeddingModel.

Training requires an object of class EmbeddedText or LargeDataSetForTextEmbeddings generated by an object of class TextEmbeddingModel. Passing raw texts is not supported.

Prediction requires an object of class EmbeddedText or LargeDataSetForTextEmbeddings that was generated with the same TextEmbeddingModel as used during training. Prediction outputs a new object of class EmbeddedText or LargeDataSetForTextEmbeddings which contains text embeddings with a lower number of dimensions.

All models use tied weights for the encoder and decoder layers (except method="LSTM") and apply the estimation of orthogonal weights. In addition, training aims to produce uncorrelated features.
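The following base R sketch illustrates what tied and orthogonal weights mean for a single linear layer. It is not the package's 'pytorch' implementation: the matrix sizes, the QR-based construction of the weights, and the helper names encode and decode are assumptions chosen for illustration only.

```r
set.seed(1)
n_cases <- 50     # number of embedded texts (illustrative)
n_dim <- 8        # dimensions of the original embeddings (illustrative)
n_features <- 3   # dimensions after compression (illustrative)

# Toy data standing in for text embeddings
x <- matrix(rnorm(n_cases * n_dim), nrow = n_cases)

# Weight matrix with orthonormal columns, built here via a QR decomposition
w <- qr.Q(qr(matrix(rnorm(n_dim * n_features), nrow = n_dim)))

# Tied weights: the decoder reuses the transpose of the encoder weights
encode <- function(x, w) x %*% w
decode <- function(z, w) z %*% t(w)

z <- encode(x, w)        # compressed representation (n_cases x n_features)
x_hat <- decode(z, w)    # reconstruction (n_cases x n_dim)

# Orthogonality check: t(w) %*% w should be close to the identity matrix
orthogonality_error <- max(abs(crossprod(w) - diag(n_features)))

# Correlations between the extracted features can be inspected like this
feature_correlation <- cor(z)
```

With tied weights only one weight matrix per layer has to be estimated, and the orthogonality constraint keeps the columns of that matrix from duplicating each other.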

Objects of class TEFeatureExtractor are designed to be used with classifiers such as TEClassifierRegular and TEClassifierProtoNet.

Value

A new instance of this class.

Super classes

aifeducation::AIFEMaster -> aifeducation::AIFEBaseModel -> aifeducation::ModelsBasedOnTextEmbeddings -> TEFeatureExtractor

Methods

Public methods

TEFeatureExtractor$configure()
TEFeatureExtractor$train()
TEFeatureExtractor$extract_features()
TEFeatureExtractor$extract_features_large()
TEFeatureExtractor$plot_training_history()
TEFeatureExtractor$clone()

Inherited methods

Method `configure()`

Creating a new instance of this class.

Usage

TEFeatureExtractor$configure(
  name = NULL,
  label = NULL,
  text_embeddings = NULL,
  features = 128L,
  method = "dense",
  orthogonal_method = "matrix_exp",
  noise_factor = 0.2
)

Arguments

name: string Name of the new model. Please refer to common name conventions. Free text can be used with parameter label. If set to NULL a unique ID is generated automatically. Allowed values: any
label: string Label for the new model. Here you can use free text. Allowed values: any
text_embeddings: EmbeddedText, LargeDataSetForTextEmbeddings Object of class EmbeddedText or LargeDataSetForTextEmbeddings.
features: int Number of features the model should use. Allowed values: 1 <= x
method: string Method to use for the feature extraction. 'LSTM' for an extractor based on LSTM layers or 'Dense' for dense layers. Allowed values: 'Dense', 'LSTM'
orthogonal_method: string Method to use for the estimation of orthogonal weights. The default is 'matrix_exp'.
noise_factor: double Value of at least 0 and lower than 1 indicating how much noise is added to the input during training. Allowed values: 0 <= x <= 1

Returns

Returns an object of class TEFeatureExtractor which is ready for training.
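A minimal sketch of how the configuration step might look in practice. The helper configure_extractor, its generator argument, and the label text are hypothetical. Creating the object with $new() before calling $configure() is an assumption based on the usual R6 convention, and embeddings stands for an existing EmbeddedText or LargeDataSetForTextEmbeddings object.

```r
# Hypothetical helper. `embeddings` must be an EmbeddedText or
# LargeDataSetForTextEmbeddings object created with a TextEmbeddingModel.
# `generator` defaults to the class generator from 'aifeducation'; calling
# `$new()` on it is an assumption based on the usual R6 convention.
configure_extractor <- function(embeddings,
                                features = 128L,
                                generator = aifeducation::TEFeatureExtractor) {
  extractor <- generator$new()
  extractor$configure(
    label = "Feature extractor for my embeddings",
    text_embeddings = embeddings,
    features = features,
    noise_factor = 0.2
    # name, method and orthogonal_method are left at their defaults
  )
  extractor
}
```

Calling configure_extractor(embeddings, features = 64L) would then return an extractor that compresses the embeddings to 64 features once it has been trained.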

Method `train()`

Method for training a neural net.

Usage

TEFeatureExtractor$train(
  data_embeddings = NULL,
  data_val_size = 0.25,
  sustain_track = TRUE,
  sustain_iso_code = NULL,
  sustain_region = NULL,
  sustain_interval = 15L,
  sustain_log_level = "warning",
  epochs = 40L,
  batch_size = 32L,
  trace = TRUE,
  ml_trace = 1L,
  log_dir = NULL,
  log_write_interval = 10L,
  lr_rate = 0.001,
  lr_warm_up_ratio = 0.02,
  optimizer = "AdamW"
)

Arguments

data_embeddings: EmbeddedText, LargeDataSetForTextEmbeddings Object of class EmbeddedText or LargeDataSetForTextEmbeddings.
data_val_size: double between 0 and 1, indicating the proportion of cases which should be used for the validation sample during the estimation of the model. The remaining cases are part of the training data. Allowed values: 0 < x < 1
sustain_track: bool If TRUE energy consumption is tracked during training via the python library 'codecarbon'.
sustain_iso_code: string ISO code (Alpha-3-Code) for the country. This variable must be set if sustainability should be tracked. A list can be found on Wikipedia: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes. Allowed values: any
sustain_region: string Region within a country. Only available for USA and Canada. See the documentation of codecarbon for more information: https://mlco2.github.io/codecarbon/parameters.html Allowed values: any
sustain_interval: int Interval in seconds for measuring power usage. Allowed values: 1 <= x
sustain_log_level: string Level for printing information to the console. Allowed values: 'debug', 'info', 'warning', 'error', 'critical'
epochs: int Number of training epochs. Allowed values: 1 <= x
batch_size: int Size of the batches for training. Allowed values: 1 <= x
trace: bool TRUE if information about the estimation phase should be printed to the console.
ml_trace: int ml_trace=0 does not print any information about the training process from pytorch on the console. Allowed values: 0 <= x <= 1
log_dir: string Path to the directory where the log files should be saved. If no logging is desired set this argument to NULL. Allowed values: any
log_write_interval: int Time in seconds determining the interval in which the logger should try to update the log files. Only relevant if log_dir is not NULL. Allowed values: 1 <= x
lr_rate: double Initial learning rate for the training. Allowed values: 0 < x <= 1
lr_warm_up_ratio: double Ratio of the epochs used for warm up. Allowed values: 0 < x < 0.5
optimizer: string determining the optimizer used for training. Allowed values: 'Adam', 'RMSprop', 'AdamW', 'SGD'
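To make the role of data_val_size concrete, the base R sketch below splits a set of cases into a training and a validation sample. It only illustrates the proportion; how the package draws and rounds the validation sample internally is not documented here, so the use of round() and sample.int() is an assumption.

```r
set.seed(42)
n_cases <- 200L          # illustrative number of embedded texts
data_val_size <- 0.25    # proportion of cases used for validation

# Number of validation cases (rounding rule is an assumption)
n_val <- round(n_cases * data_val_size)

# Randomly pick validation cases; the remaining cases form the training data
val_idx <- sample.int(n_cases, size = n_val)
train_idx <- setdiff(seq_len(n_cases), val_idx)
```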

Returns

Function does not return a value. It changes the object into a trained feature extractor.
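A short sketch of a training call, assuming extractor has already been configured and embeddings is an EmbeddedText or LargeDataSetForTextEmbeddings object. The helper train_extractor is hypothetical and the country code "DEU" is only a placeholder; the argument values are the defaults listed in the usage block.

```r
# Hypothetical helper wrapping the documented train() method.
train_extractor <- function(extractor, embeddings, log_dir = NULL) {
  extractor$train(
    data_embeddings = embeddings,
    data_val_size = 0.25,
    sustain_track = TRUE,
    sustain_iso_code = "DEU",  # placeholder Alpha-3 code; use your own country
    epochs = 40L,
    batch_size = 32L,
    log_dir = log_dir,         # NULL disables logging
    lr_rate = 0.001,
    optimizer = "AdamW"
  )
  # train() modifies the object in place, so return it invisibly
  invisible(extractor)
}
```

Because the method changes the object itself, the extractor passed in is the trained extractor after the call; no reassignment is required.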

Method `extract_features()`

Method for extracting features. Applying this method reduces the number of dimensions of the text embeddings. Please note that this method should only be used if a small number of cases should be compressed since the data is loaded completely into memory. For a high number of cases please use the method extract_features_large.

Usage

TEFeatureExtractor$extract_features(data_embeddings, batch_size)

Arguments

data_embeddings: Object of class EmbeddedText, LargeDataSetForTextEmbeddings, datasets.arrow_dataset.Dataset or array containing the text embeddings which should be reduced in their dimensions.
batch_size: int batch size.

Returns

Returns an object of class EmbeddedText containing the compressed embeddings.
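A minimal usage sketch, assuming extractor has been trained and embeddings was generated with the same TextEmbeddingModel that was used during training. The helper name compress_embeddings and the batch size of 32 are hypothetical choices.

```r
# Hypothetical helper wrapping the documented extract_features() method.
# All data is loaded into memory, so this suits a small number of cases.
compress_embeddings <- function(extractor, embeddings, batch_size = 32L) {
  extractor$extract_features(
    data_embeddings = embeddings,
    batch_size = batch_size
  )
}
```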

Method `extract_features_large()`

Method for extracting features from a large number of cases. Applying this method reduces the number of dimensions of the text embeddings.

Usage

TEFeatureExtractor$extract_features_large(
  data_embeddings,
  batch_size,
  trace = FALSE
)

Arguments

data_embeddings: Object of class EmbeddedText or LargeDataSetForTextEmbeddings containing the text embeddings which should be reduced in their dimensions.
batch_size: int batch size.
trace: bool If TRUE information about the progress is printed to the console.

Returns

Returns an object of class LargeDataSetForTextEmbeddings containing the compressed embeddings.
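For a large number of cases the call looks the same apart from the method name and the trace argument. As before, this is a sketch under assumptions: compress_embeddings_large is a hypothetical helper, and extractor and embeddings are objects that already exist.

```r
# Hypothetical helper wrapping the documented extract_features_large() method.
compress_embeddings_large <- function(extractor, embeddings, batch_size = 32L) {
  extractor$extract_features_large(
    data_embeddings = embeddings,
    batch_size = batch_size,
    trace = TRUE  # print progress information to the console
  )
}
```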

Method `plot_training_history()`

Method for requesting a plot of the training history. This method requires the R package 'ggplot2' to work.

Usage

TEFeatureExtractor$plot_training_history(
  x_min = NULL,
  x_max = NULL,
  y_min = NULL,
  y_max = NULL,
  ind_best_model = TRUE,
  text_size = 10L
)

Arguments

x_min: int Minimal value for x-axis. Set to NULL for an automatic adjustment. Allowed values: x
x_max: int Maximal value for x-axis. Set to NULL for an automatic adjustment. Allowed values: x
y_min: int Minimal value for y-axis. Set to NULL for an automatic adjustment. Allowed values: x
y_max: int Maximal value for y-axis. Set to NULL for an automatic adjustment. Allowed values: x
ind_best_model: bool If TRUE the plot indicates the best states of the model according to the chosen measure.
text_size: int Size of text elements. Allowed values: 1 <= x

Returns

Returns a plot of class ggplot visualizing the training process.
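Since the returned object is a ggplot, it can be stored with the tools from 'ggplot2'. The sketch below assumes a trained extractor; the helper save_training_plot, the file name, and the plot size are hypothetical.

```r
# Hypothetical helper: request the training history plot and save it to disk.
save_training_plot <- function(extractor, file = "training_history.png") {
  # The method requires 'ggplot2', so check for it first
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    stop("Package 'ggplot2' is required for plotting the training history.")
  }
  p <- extractor$plot_training_history(
    ind_best_model = TRUE,  # mark the best states of the model
    text_size = 12L
  )
  ggplot2::ggsave(filename = file, plot = p, width = 7, height = 4)
  invisible(p)
}
```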

Method `clone()`

The objects of this class are cloneable with this method.

Usage

TEFeatureExtractor$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Note

features refers to the number of features for the compressed text embeddings.

This model requires that the underlying TextEmbeddingModel uses pad_value=0. If this condition is not met the pad value is switched automatically before training.
