Description

Finds the best set of xgboost parameters for each topic using random search.

View source: R/hyperparameter_tuning.R

Usage
hyperparameter_tuning(
  train_labelled_dtm,
  valid_labelled_dtm,
  train_labels,
  val_labels,
  topics,
  num_its = 1000
)
Arguments

train_labelled_dtm    Training labelled document-term matrix.
valid_labelled_dtm    Validation labelled document-term matrix.
train_labels          Training labels matrix.
val_labels            Validation labels matrix.
topics                List of topics.
num_its               Number of iterations to run for each topic. Default: 1000.
Details

Parameters tuned (a sketch of one random-search draw follows this list):
max_depth: Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit.
eta: Step size shrinkage used in update to prevent overfitting.
subsample: Subsample ratio of the training instances. Setting it to 0.5 means that XGBoost randomly samples half of the training data prior to growing trees, which helps prevent overfitting.
colsample_bytree: The subsample ratio of columns when constructing each tree. Subsampling occurs once for every tree constructed.
min_child_weight: Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning.
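The package source is not shown here, but a single random-search iteration over these parameters could look roughly like the following minimal sketch. The sampling ranges, the binary:logistic objective, the fixed nrounds, and the accuracy metric are all illustrative assumptions, not the package's actual choices.

library(xgboost)

# dtrain and dvalid are assumed to be xgb.DMatrix objects built from the
# labelled document-term matrices and the label column for one topic.
one_iteration <- function(dtrain, dvalid) {
  # Randomly draw one candidate parameter set (ranges are assumptions)
  params <- list(
    objective        = "binary:logistic",
    max_depth        = sample(3:10, 1),      # tree depth
    eta              = runif(1, 0.01, 0.3),  # learning rate
    subsample        = runif(1, 0.5, 1),     # row subsampling ratio
    colsample_bytree = runif(1, 0.5, 1),     # column subsampling ratio
    min_child_weight = sample(1:10, 1)       # min hessian sum in a child
  )
  fit <- xgb.train(params = params, data = dtrain, nrounds = 100, verbose = 0)
  # Score the candidate on the validation set (accuracy used for illustration)
  pred  <- predict(fit, dvalid)
  score <- mean((pred > 0.5) == getinfo(dvalid, "label"))
  list(params = params, score = score)
}

Repeating this num_its times per topic and keeping the best-scoring draw yields one optimal parameter set per topic.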
Value

A data frame with one row per topic, where the columns hold the optimal parameter values found for that topic.
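As a hedged usage sketch (all object and topic names below are hypothetical; only the function signature comes from the Usage section above):

# dtm_train / dtm_valid: labelled document-term matrices;
# y_train / y_valid: label matrices with one column per topic
best_params <- hyperparameter_tuning(
  train_labelled_dtm = dtm_train,
  valid_labelled_dtm = dtm_valid,
  train_labels       = y_train,
  val_labels         = y_valid,
  topics             = c("economy", "health"),  # hypothetical topic names
  num_its            = 500
)
best_params  # one row of tuned parameters per topic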