tune_fasttext: 'Tune' FastText Models


View source: R/tune_fasttext.R

Description

Tests several combinations of parameters for FastText models. Each set of parameters is cross-validated k times to give robust model evaluations. Can be parallelized to improve speed and allow for large grid searches over parameters.

Usage

tune_fasttext(k = 5, texts, text_ids, labels, parameters, seed = 123,
  parallel = TRUE)

Arguments

k

The number of cross-validation folds used for each combination of parameters tested. Defaults to 5.

texts

The texts given by the user to classify.

text_ids

The IDs of the texts, used to produce cleanly formatted output.

labels

The labels for the texts, given by the user to train the FastText model.

parameters

A data frame that contains all the different combinations of parameters for a FastText model. Must include the following columns:

  • lr = learning rate

  • epoch = # of epochs

  • dim = dimensions

  • ws = window size

  • wordNgrams = word n-grams

  • minn = min of character n-grams

  • maxn = max of character n-grams
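Because the parameters data frame must carry all seven columns, it can help to check a grid before starting a long tuning run. The following is a minimal sketch, not part of textwhiz: the helper missing_params and the grid values are hypothetical, and only base R is used.

```r
# Column names required by the `parameters` argument, as listed above.
required_cols <- c("lr", "epoch", "dim", "ws", "wordNgrams", "minn", "maxn")

# Hypothetical helper (not part of textwhiz): returns the required
# column names that are absent from a candidate parameter grid.
missing_params <- function(parameters) {
  setdiff(required_cols, names(parameters))
}

# Illustrative grid: 2 learning rates x 2 epoch settings = 4 rows.
grid <- expand.grid(
  lr = c(0.1, 0.5),
  epoch = c(10, 20),
  dim = 100,
  ws = 5,
  wordNgrams = 2,
  minn = 2,
  maxn = 6
)

complete_check <- missing_params(grid)                       # nothing missing
partial_check  <- missing_params(grid[, c("lr", "epoch")])   # five missing
```

An empty result means the grid has every required column; any names returned are the columns still to be added.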

seed

A number passed to set.seed when partitioning the data, so that model results are reproducible.
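To illustrate why a fixed seed makes the partitioning reproducible, here is a minimal sketch of seeded fold assignment in base R. How tune_fasttext actually splits the data is not shown on this page, so the helper make_folds and its approach are assumptions for illustration only.

```r
# Hypothetical helper (not part of textwhiz): assigns each of n
# observations to one of k folds, in an order fixed by the seed.
make_folds <- function(n, k, seed) {
  set.seed(seed)
  # Repeat fold labels 1..k up to length n, then shuffle them.
  sample(rep(seq_len(k), length.out = n))
}

folds_a <- make_folds(n = 23, k = 5, seed = 123)
folds_b <- make_folds(n = 23, k = 5, seed = 123)

# Same seed, same assignment.
same_split <- identical(folds_a, folds_b)

# Fold sizes differ by at most one observation.
fold_sizes <- table(folds_a)
```

Calling the helper twice with the same seed yields the same fold labels, which is the property the seed argument is meant to provide.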

parallel

Defaults to TRUE. Determines whether the analysis is parallelized.

Value

A data frame with the average accuracy and standard deviation (SD) for each row of parameters.
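As a rough picture of how per-fold accuracies reduce to one mean and one SD per parameter row, consider the sketch below. The accuracy values are made up for illustration, and the column names param_row, mean_accuracy and sd_accuracy are assumptions; the actual column names returned by tune_fasttext are not documented here.

```r
# Made-up per-fold accuracies for two parameter rows, 5 folds each.
fold_acc <- data.frame(
  param_row = rep(1:2, each = 5),
  accuracy = c(0.80, 0.82, 0.78, 0.81, 0.79,
               0.70, 0.75, 0.72, 0.68, 0.75)
)

# Mean and standard deviation of accuracy within each parameter row.
mean_acc <- tapply(fold_acc$accuracy, fold_acc$param_row, mean)
sd_acc <- tapply(fold_acc$accuracy, fold_acc$param_row, sd)

# One summary row per parameter combination (hypothetical column names).
result <- data.frame(
  param_row = as.integer(names(mean_acc)),
  mean_accuracy = as.numeric(mean_acc),
  sd_accuracy = as.numeric(sd_acc)
)
```

A low SD alongside a high mean suggests the parameter combination performs consistently across folds rather than on one lucky split.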

Examples

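# Build the grid of parameter combinations to test.
# With these ranges only ws varies (4 and 6), so the grid has 2 rows.
fast.text.parameters <- expand.grid(
  lr = seq(4, 4.3, 0.5),
  epoch = seq(30, 33, 10),
  dim = seq(100, 120, 25),
  ws = seq(4, 6, 2),
  wordNgrams = 2,
  minn = 2,
  maxn = 6
)

# df is assumed to be a data frame with columns mytext, text_id and topic.
tune_fasttext(k = 5,
              texts = df$mytext,
              text_ids = df$text_id,
              labels = df$topic,
              parameters = fast.text.parameters,
              seed = 123,
              parallel = TRUE)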

jcgonzalez14/textwhiz documentation built on Aug. 26, 2020, 9:39 a.m.