View source: R/predict_topics.R
Trains one xgboost model per topic and uses each model to predict the probability that an unlabelled comment belongs to that topic.
predict_topics(
unlabelled_raw,
labelled_dtm,
unlabelled_dtm,
labels_matrix,
text_vars,
num_vars,
topics,
parameters = list(booster = "gbtree", objective = "binary:logistic", max_depth = 6,
eta = 0.3, subsample = 1, colsample_bytree = 1, min_child_weight = 1),
parameters_df = NULL,
nrounds = 1000
)
unlabelled_raw | Original unlabelled dataframe before any pre-processing.
labelled_dtm | Full labelled document-term matrix.
unlabelled_dtm | Unlabelled document-term matrix used for predictions.
labels_matrix | Labels matrix for labelled_dtm.
text_vars | List of text variables.
num_vars | List of numerical variables.
topics | List of topics.
parameters | Default list of parameters if user did not perform hyperparameter tuning.
parameters_df | A dataframe with columns representing parameters and rows representing an optimal parameter set for each topic.
nrounds | Number of rounds that the xgboost model should be trained for. Default: 1000
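To illustrate the shape described for parameters_df, the sketch below builds a small data frame with one row per topic and turns one row into a parameter list. The topic names, the tuned values, and the `topic` identifier column are made up for illustration; the exact column layout the package expects is an assumption, not something this page confirms.

```r
# Hypothetical topics and tuned values, for illustration only.
topics <- c("staffing", "waiting_times")

parameters_df <- data.frame(
  topic = topics,          # assumed identifier column, not confirmed by the docs
  max_depth = c(4, 6),
  eta = c(0.1, 0.3),
  subsample = c(0.8, 1),
  stringsAsFactors = FALSE
)

# Pull the parameter set for one topic as a named list and
# prepend the fixed settings taken from the documented defaults.
row <- parameters_df[parameters_df$topic == "staffing", ]
topic_params <- c(
  list(booster = "gbtree", objective = "binary:logistic"),
  as.list(row[setdiff(names(row), "topic")])
)

str(topic_params)
```

A list built this way has the same form as the default `parameters` argument, so it could be handed to xgboost directly for that topic.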
A dataframe with the original comments, chosen attributes and probabilities that they belong to each topic.
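The per-topic approach described above can be sketched with the xgboost package directly. This is not the package's implementation: the toy data, the object sizes, and the choice to append one probability column per topic are assumptions, and text_vars and num_vars are left out. It uses nrounds = 10 only to keep the example fast.

```r
library(xgboost)

set.seed(1)

# Toy stand-ins for the documented inputs (all values are made up).
topics <- c("topic_a", "topic_b")
n_terms <- 5
labelled_dtm <- matrix(as.numeric(rpois(40 * n_terms, 1)), nrow = 40)
unlabelled_dtm <- matrix(as.numeric(rpois(10 * n_terms, 1)), nrow = 10)
labels_matrix <- matrix(
  as.numeric(rbinom(40 * length(topics), 1, 0.5)),
  nrow = 40,
  dimnames = list(NULL, topics)
)
unlabelled_raw <- data.frame(
  comment = paste("comment", 1:10),
  stringsAsFactors = FALSE
)

# Same values as the documented default for `parameters`.
parameters <- list(
  booster = "gbtree", objective = "binary:logistic", max_depth = 6,
  eta = 0.3, subsample = 1, colsample_bytree = 1, min_child_weight = 1
)

# Train one binary classifier per topic and store its predicted
# probabilities for the unlabelled rows in a column named after the topic.
predictions <- unlabelled_raw
for (topic in topics) {
  dtrain <- xgb.DMatrix(data = labelled_dtm, label = labels_matrix[, topic])
  model <- xgb.train(params = parameters, data = dtrain, nrounds = 10)
  predictions[[topic]] <- predict(model, xgb.DMatrix(unlabelled_dtm))
}

head(predictions)
```

Because the objective is binary:logistic, each topic column holds values between 0 and 1, and a comment can score highly on several topics at once since the models are trained independently.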