predict_topics: predict_topics


View source: R/predict_topics.R

Description

Trains one xgboost model per topic and uses each model to predict the probability that an unlabelled comment belongs to that topic.
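The per-topic approach described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the package's actual implementation: the objects `x_lab`, `x_unl`, `labels` and the topic names are invented for the example, and `nrounds` is kept small so the sketch runs quickly. Only the general pattern (one `binary:logistic` model per topic, with predictions collected one column per topic) is taken from the description.

```r
library(xgboost)

set.seed(42)

# Hypothetical stand-ins for the document-term matrices and the labels matrix.
n_terms <- 8
x_lab <- matrix(as.numeric(rpois(40 * n_terms, 1)), nrow = 40)  # 40 labelled comments
x_unl <- matrix(as.numeric(rpois(10 * n_terms, 1)), nrow = 10)  # 10 unlabelled comments
topics <- c("price", "service")                                 # made-up topic names
labels <- matrix(as.numeric(rbinom(40 * length(topics), 1, 0.5)),
                 nrow = 40, dimnames = list(NULL, topics))

params <- list(booster = "gbtree", objective = "binary:logistic",
               max_depth = 6, eta = 0.3)

# Train one binary classifier per topic, then score the unlabelled rows.
probs <- sapply(topics, function(topic) {
  dtrain <- xgb.DMatrix(data = x_lab, label = labels[, topic])
  model <- xgb.train(params = params, data = dtrain, nrounds = 20)
  predict(model, xgb.DMatrix(data = x_unl))
})

# One row per unlabelled comment, one probability column per topic.
print(dim(probs))
```

Treating each topic as an independent binary problem lets a comment receive a high probability for several topics at once, which a single multi-class model would not allow.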

Usage

predict_topics(
  unlabelled_raw,
  labelled_dtm,
  unlabelled_dtm,
  labels_matrix,
  text_vars,
  num_vars,
  topics,
  parameters = list(booster = "gbtree", objective = "binary:logistic", max_depth = 6,
    eta = 0.3, subsample = 1, colsample_bytree = 1, min_child_weight = 1),
  parameters_df = NULL,
  nrounds = 1000
)

Arguments

unlabelled_raw

Original unlabelled dataframe before any pre-processing.

labelled_dtm

Full labelled document-term matrix.

unlabelled_dtm

Unlabelled document-term matrix used for predictions.

labels_matrix

Labels matrix for labelled_dtm.

text_vars

List of text variables.

num_vars

List of numerical variables.

topics

List of topics.

parameters

List of xgboost parameters to use when the user has not performed hyperparameter tuning. The default values are shown under Usage.
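To change only a few of these values, one option is to start from the default list and override selected entries with base R's `modifyList()`. The sketch below copies the defaults from the Usage section; the replacement values (`max_depth = 4`, `eta = 0.1`) are arbitrary choices for illustration, not recommendations.

```r
# Default parameter list, copied from the Usage section.
default_params <- list(booster = "gbtree", objective = "binary:logistic",
                       max_depth = 6, eta = 0.3, subsample = 1,
                       colsample_bytree = 1, min_child_weight = 1)

# Override two entries; every other entry keeps its default value.
custom_params <- modifyList(default_params, list(max_depth = 4, eta = 0.1))
str(custom_params)
```

The resulting `custom_params` list could then be passed as the `parameters` argument.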

parameters_df

A dataframe in which each column is a parameter and each row holds the optimal parameter set for one topic. Default: NULL
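A dataframe of this shape might look like the sketch below. The topic names, the choice of columns and the values are all hypothetical, and the page does not state whether the package expects rows in the same order as `topics` or an extra identifying column, so treat the layout as an assumption. The helper `topic_params()` is likewise invented here to show how one row can be turned into a parameter list.

```r
topics <- c("price", "service")  # hypothetical topic names

# One row per topic, one column per tuned parameter (assumed layout).
parameters_df <- data.frame(
  max_depth = c(4, 6),
  eta = c(0.1, 0.3),
  subsample = c(0.8, 1),
  colsample_bytree = c(1, 0.7),
  min_child_weight = c(1, 3)
)

# Hypothetical helper: combine the fixed settings with row i of parameters_df.
topic_params <- function(i) {
  c(list(booster = "gbtree", objective = "binary:logistic"),
    as.list(parameters_df[i, ]))
}

str(topic_params(2))
```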

nrounds

Number of rounds that the xgboost model should be trained for. Default: 1000

Value

A dataframe containing the original comments, the chosen attributes, and the predicted probability that each comment belongs to each topic.


rosepeglershare/TagR documentation built on Dec. 31, 2020, 3:12 a.m.