tune_fasttext: 'Tune' FastText Models
In jcgonzalez14/textwhiz: Support tools for Text Analytics


View source: R/tune_fasttext.R

Tests several combinations of parameters for FastText models. Each set of parameters is cross-validated k times to give a robust model evaluation. Can be parallelized to improve speed and to allow grid searches over large parameter sets.
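The procedure described above can be sketched in base R as follows. This is a minimal, sequential illustration of k-fold cross-validation over a parameter grid, not the package's actual implementation. The names `cv_grid`, `score_fold`, `dummy_score`, `mean_accuracy` and `sd_accuracy` are hypothetical, and the scoring function is a stand-in for training and evaluating a FastText model.

```r
# Sketch of k-fold cross-validation over a parameter grid (base R only).
# `score_fold` is a hypothetical stand-in for fitting a FastText model on
# the training rows and returning its accuracy on the held-out rows.
cv_grid <- function(n_obs, parameters, k = 5, seed = 123, score_fold) {
  set.seed(seed)
  # Randomly assign each observation to one of k folds of near-equal size.
  folds <- sample(rep(seq_len(k), length.out = n_obs))
  results <- lapply(seq_len(nrow(parameters)), function(i) {
    params <- parameters[i, , drop = FALSE]
    # One accuracy value per fold for this parameter combination.
    acc <- vapply(seq_len(k), function(f) {
      score_fold(which(folds != f), which(folds == f), params)
    }, numeric(1))
    data.frame(params, mean_accuracy = mean(acc), sd_accuracy = sd(acc))
  })
  do.call(rbind, results)
}

# Dummy scorer (assumption): returns the training share instead of a real
# model accuracy, so the sketch runs without FastText installed.
dummy_score <- function(train_idx, test_idx, params) {
  length(train_idx) / (length(train_idx) + length(test_idx))
}

grid <- expand.grid(lr = c(0.1, 0.5), epoch = c(10, 20))
out <- cv_grid(n_obs = 100, parameters = grid, k = 5, score_fold = dummy_score)
```

Because the folds are assigned after `set.seed`, rerunning with the same seed reproduces the same partition. The inner loop over parameter rows is the part that a parallel backend would distribute.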

tune_fasttext(k = 5, texts, text_ids, labels, parameters, seed = 123, parallel = TRUE)

`k`	The number of cross-validation folds used for each combination of parameters. Defaults to 5.
`texts`	The texts given by the user to classify.
`text_ids`	The IDs of the texts, used to produce cleanly formatted output.
`labels`	The labels for the texts, given by the user to train the FastText model.
`parameters`	A data frame that contains all the combinations of parameters to test for a FastText model. Must include the following columns: lr (learning rate), epoch (number of epochs), dim (dimensions), ws (window size), wordNgrams (word n-grams), minn (minimum length of character n-grams), maxn (maximum length of character n-grams).
`seed`	A number passed to `set.seed` when partitioning the data, so that results are reproducible.
`parallel`	Determines whether to parallelize the analysis. Defaults to TRUE.
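Since `parameters` must contain a fixed set of columns, it can help to check a grid before starting a long run. The sketch below is one way to do that in base R. `missing_params` is a hypothetical helper, not a function provided by textwhiz, and the grid values are illustrative only.

```r
# Column names taken from the `parameters` argument description.
required_cols <- c("lr", "epoch", "dim", "ws", "wordNgrams", "minn", "maxn")

# Hypothetical helper (not part of textwhiz): returns the names of any
# required columns that are missing from a parameter grid.
missing_params <- function(parameters) {
  setdiff(required_cols, names(parameters))
}

# Illustrative grid with all seven required columns.
grid <- expand.grid(lr = c(0.1, 0.5), epoch = 20, dim = 100, ws = 5,
                    wordNgrams = 2, minn = 2, maxn = 6)

complete_check   <- missing_params(grid)                    # nothing missing
incomplete_check <- missing_params(grid[, c("lr", "epoch")]) # five names missing
```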

A data frame with the average accuracy and standard deviation (SD) for each row of parameters.
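A result of this shape can be ranked to pick a parameter set. The sketch below uses a made-up data frame, because the actual column names returned by `tune_fasttext` are not stated here. `mean_accuracy` and `sd_accuracy` are assumed names and the values are purely illustrative.

```r
# Made-up results in an ASSUMED layout; the real column names may differ.
results <- data.frame(
  lr = c(0.1, 0.5, 1.0),
  epoch = c(20, 20, 30),
  mean_accuracy = c(0.81, 0.86, 0.86),
  sd_accuracy = c(0.02, 0.05, 0.01)
)

# Sort by highest mean accuracy, breaking ties with the lowest SD.
ranked <- results[order(-results$mean_accuracy, results$sd_accuracy), ]
best <- ranked[1, ]
```

Breaking ties on the SD favors the parameter set whose accuracy varied least across folds.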

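# Build the grid of parameter combinations to test.
# With these ranges every seq() yields a single value except ws (4 and 6),
# so the grid has two rows.
fast.text.parameters <- expand.grid(
  lr = seq(4, 4.3, 0.5),
  epoch = seq(30, 33, 10),
  dim = seq(100, 120, 25),
  ws = seq(4, 6, 2),
  wordNgrams = 2,
  minn = 2,
  maxn = 6
)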

tune_fasttext(k = 5,
              texts = df$mytext,
              text_ids = df$text_id,
              labels = df$topic,
              parameters = fast.text.parameters,
              seed = 123,
              parallel = TRUE)