View source: R/classify_text.r
apply_basic_recipe | R Documentation |
Apply a basic recipe to a data frame that includes a text column. The basic recipe tokenizes the text (using bigrams), removes stop words, filters the tokens down to a maximum of 1,000 (set by token_threshold), and normalizes document length using TF-IDF.
apply_basic_recipe( input_data, formula, text, token_threshold = 1000, add_embedding = NULL, embed_dims = 100 )
input_data |
The input data frame, which must include a text column. |
formula |
A formula that specifies the relationship between the outcome and predictor variables (e.g, |
text |
The name of the text column in the data. |
token_threshold |
The maximum number of tokens to use in the classification. The default value is 1000. |
add_embedding |
Whether to add word embeddings for feature engineering. The default value is NULL. Set it to TRUE to add word embeddings. |
embed_dims |
Word embedding dimensions. The default value is 100. |
A prep object.
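The steps named in the description can be sketched with the recipes and textrecipes packages. This is an illustrative approximation, not the source of apply_basic_recipe: the toy data, the column names category and text, and the object names sketch_recipe and prepped are all hypothetical, and the step order simply follows the description. It assumes the recipes, textrecipes, tokenizers, and stopwords packages are installed.

```r
# Illustrative sketch only; NOT the actual source of apply_basic_recipe().
library(recipes)
library(textrecipes)

# Hypothetical toy data: an outcome column and a text column.
toy_data <- data.frame(
  category = factor(c("politics", "sports", "politics", "sports")),
  text = c(
    "the council approved the new budget",
    "the team won the final game",
    "the budget vote was delayed again",
    "fans celebrated the game in the streets"
  ),
  stringsAsFactors = FALSE
)

sketch_recipe <- recipe(category ~ text, data = toy_data) %>%
  # Tokenize the text column into bigrams.
  step_tokenize(text, token = "ngrams", options = list(n = 2)) %>%
  # Remove tokens that match the default English stop word list.
  step_stopwords(text) %>%
  # Keep at most 1,000 tokens (the role token_threshold plays in the docs).
  step_tokenfilter(text, max_tokens = 1000) %>%
  # Replace the tokens with TF-IDF weighted features.
  step_tfidf(text)

# Estimate the steps on the data; the result is a prepped recipe.
prepped <- prep(sketch_recipe, training = toy_data)
```

Calling bake(prepped, new_data = NULL) on the result should return the processed training data, with one TF-IDF column per retained token.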