forensic_fasttext: Does 'forensics' on a single FastText model
In jcgonzalez14/textwhiz: Support tools for Text Analytics


View source: R/forensic_fasttext.R

Use to investigate where your chosen set of parameters for a FastText model is lacking. Provides:

an investigation of the magnitude of the predicted score for misclassified texts.
a comparison of the predicted label vs. the actual label.
exposure of mislabelled training data.
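
The three checks above can be pictured with a small base R sketch. It is not the package's implementation: the `preds` data frame, its columns (`text_id`, `actual`, `predicted`, `score`), and the 0.9 cutoff are all hypothetical stand-ins for whatever the cross-validated predictions actually look like.

```r
# Hypothetical out-of-fold predictions; every value here is made up for illustration.
preds <- data.frame(
  text_id   = c("t1", "t2", "t3", "t4", "t5"),
  actual    = c("sports", "politics", "sports", "tech", "politics"),
  predicted = c("sports", "sports", "tech", "tech", "politics"),
  score     = c(0.97, 0.55, 0.93, 0.88, 0.61),
  stringsAsFactors = FALSE
)

# 1. Magnitude of the predicted score for misclassified texts,
#    sorted so the most confident misses come first.
missed <- preds[preds$actual != preds$predicted, ]
missed <- missed[order(-missed$score), ]

# 2. Predicted label vs. actual label, as a confusion table.
confusion <- table(actual = preds$actual, predicted = preds$predicted)

# 3. Confident misses are candidates for mislabelled training data.
#    The 0.9 cutoff is an arbitrary assumption, not a package default.
suspects <- missed[missed$score >= 0.9, "text_id"]

print(missed)
print(confusion)
print(suspects)
```

A text that the model gets wrong with a high score is worth reading by hand: either the parameters are failing on that kind of text, or the label in the training data is itself wrong.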

forensic_fasttext(k, texts, text_ids, labels, parameters, seed)

`k`	The number of cross-validation folds to run for each combination of parameters. Defaults to 5.
`texts`	The texts given by the user to classify later.
`labels`	The labels for the texts, given by the user to train the FastText model.
`parameters`	A data frame that contains all the different combinations of parameters for a FastText model. Must include the following columns: lr (learning rate), epoch (number of epochs), dim (dimensions), ws (window size), wordNgrams (word n-grams), minn (minimum character n-gram length), maxn (maximum character n-gram length).
`seed`	A number passed to `set.seed` when partitioning the data, so that model results are reproducible.
`text_ids`	The IDs of the texts, used to produce cleanly formatted output.
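
As a rough illustration of how `parameters`, `k`, and `seed` fit together, the base R sketch below checks that a parameter grid has the required columns and assigns texts to folds reproducibly. The helper names `check_parameters` and `assign_folds` are hypothetical and are not part of textwhiz; the package may validate and partition its inputs differently.

```r
required_cols <- c("lr", "epoch", "dim", "ws", "wordNgrams", "minn", "maxn")

# Hypothetical helper: stop early if a required column is missing.
check_parameters <- function(parameters) {
  missing_cols <- setdiff(required_cols, names(parameters))
  if (length(missing_cols) > 0) {
    stop("parameters is missing: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical helper: give each of n texts a fold id from 1..k.
# Calling set.seed() first makes the shuffle repeatable.
assign_folds <- function(n, k = 5, seed = 123) {
  set.seed(seed)
  sample(rep(seq_len(k), length.out = n))
}

# Illustrative grid values only; they are not recommendations.
grid <- expand.grid(lr = 0.5, epoch = 30, dim = 100, ws = c(4, 6),
                    wordNgrams = 2, minn = 2, maxn = 6)
check_parameters(grid)

folds_a <- assign_folds(n = 12, k = 5, seed = 123)
folds_b <- assign_folds(n = 12, k = 5, seed = 123)
identical(folds_a, folds_b)  # same seed, same partition
```

Fixing the seed matters here because the forensic output is only comparable across runs if each text lands in the same fold every time.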

A data frame with the mean and standard deviation (SD) of accuracy for each row of `parameters`.

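library(textwhiz)

# Build the grid of parameter combinations. With these bounds every seq()
# call returns a single value except ws (4 and 6), so the grid has 2 rows.
fast.text.parameters <- expand.grid(
  lr = seq(4, 4.3, 0.5),
  epoch = seq(30, 33, 10),
  dim = seq(100, 120, 25),
  ws = seq(4, 6, 2),
  wordNgrams = 2,
  minn = 2,
  maxn = 6
)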

# `df` is assumed to be a data frame with columns mytext, text_id and topic.
forensic_fasttext(k = 5,
                  texts = df$mytext,
                  text_ids = df$text_id,
                  labels = df$topic,
                  parameters = fast.text.parameters,
                  seed = 123)