Description

Finds the best set of xgboost parameters for each topic using random search.

View source: R/hyperparameter_tuning.R

Usage
hyperparameter_tuning(
  train_labelled_dtm,
  valid_labelled_dtm,
  train_labels,
  val_labels,
  topics,
  num_its = 1000
)
Arguments

train_labelled_dtm    Training labelled document-term matrix.
valid_labelled_dtm    Validation labelled document-term matrix.
train_labels          Training labels matrix.
val_labels            Validation labels matrix.
topics                List of topics.
num_its               Number of iterations to run for each topic. Default: 1000.
Details

Parameters tuned (a sketch of one random-search draw follows this list):
max_depth: Maximum depth of a tree. Increasing this value will make the model more complex and more likely to overfit.
eta: Step size shrinkage used in update to prevent overfitting.
subsample: Subsample ratio of the training instances. Setting it to 0.5 means that XGBoost randomly samples half of the training data prior to growing trees, which helps prevent overfitting.
colsample_bytree: The subsample ratio of columns when constructing each tree. Subsampling occurs once for every tree constructed.
min_child_weight: Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with the sum of instance weight less than min_child_weight, then the building process will give up further partitioning.
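The package source is not shown here, but a single random-search iteration over these parameters could look roughly like the following minimal sketch. The sampling ranges, the binary:logistic objective, the fixed nrounds, and the accuracy metric are all illustrative assumptions, not the package's actual choices.

library(xgboost)

# dtrain and dvalid are assumed to be xgb.DMatrix objects built from the
# labelled document-term matrices and the label column for one topic.
one_iteration <- function(dtrain, dvalid) {
  # Randomly draw one candidate parameter set (ranges are assumptions)
  params <- list(
    objective        = "binary:logistic",
    max_depth        = sample(3:10, 1),      # tree depth
    eta              = runif(1, 0.01, 0.3),  # learning rate
    subsample        = runif(1, 0.5, 1),     # row subsampling ratio
    colsample_bytree = runif(1, 0.5, 1),     # column subsampling ratio
    min_child_weight = sample(1:10, 1)       # min hessian sum in a child
  )
  fit <- xgb.train(params = params, data = dtrain, nrounds = 100, verbose = 0)
  # Score the candidate on the validation set (accuracy used for illustration)
  pred  <- predict(fit, dvalid)
  score <- mean((pred > 0.5) == getinfo(dvalid, "label"))
  list(params = params, score = score)
}

Repeating this num_its times per topic and keeping the best-scoring draw yields one optimal parameter set per topic.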
Value

A data frame with one row per topic, where the columns hold the optimal parameter values found for that topic.
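As a hedged usage sketch (all object and topic names below are hypothetical; only the function signature comes from the Usage section above):

# dtm_train / dtm_valid: labelled document-term matrices;
# y_train / y_valid: label matrices with one column per topic
best_params <- hyperparameter_tuning(
  train_labelled_dtm = dtm_train,
  valid_labelled_dtm = dtm_valid,
  train_labels       = y_train,
  val_labels         = y_valid,
  topics             = c("economy", "health"),  # hypothetical topic names
  num_its            = 500
)
best_params  # one row of tuned parameters per topic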