create_dt_matrices: create_dt_matrices


View source: R/create_dt_matrices.R

Description

Creates sparse document-term matrices from labelled and unlabelled data, ready for use with the xgboost algorithm.
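As a rough illustration of what a sparse document-term matrix looks like, the sketch below builds one by hand with the Matrix package. The toy documents and the whitespace tokeniser are assumptions made for this example only; they are not taken from create_dt_matrices, whose internals are not documented here.

```r
library(Matrix)  # recommended package shipped with standard R installs

# Hypothetical toy documents; not part of the TagR package.
docs <- c("red apple", "green apple", "red red car")

# Tokenise on single spaces (real pre-processing would be more involved).
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab  <- sort(unique(unlist(tokens)))

# Row index = document, column index = term.
i <- rep(seq_along(tokens), lengths(tokens))
j <- match(unlist(tokens), vocab)

# Duplicate (i, j) pairs are summed, which yields term counts per document.
dtm <- sparseMatrix(i = i, j = j, x = rep(1, length(i)),
                    dims = c(length(docs), length(vocab)),
                    dimnames = list(NULL, vocab))
print(dtm)
```

Rows correspond to documents and columns to terms; only the non-zero counts are stored, which is what makes the representation practical for large vocabularies.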

Usage

create_dt_matrices(
  labelled_data,
  unlabelled_data,
  text_vars,
  topics,
  max_sparsity = 0.999,
  val_split = 0.2
)

Arguments

labelled_data

Pre-processed data frame with binary labels.

unlabelled_data

Pre-processed unlabelled data frame.

text_vars

List of text variables to include in analysis.

topics

List of topics to include in analysis.

max_sparsity

The maximum sparsity allowed in the document-term matrix. Default: 0.999
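One common way to apply a sparsity threshold is per term: a term's sparsity is the share of documents in which it does not occur, and terms above the threshold are dropped. The base R sketch below assumes that interpretation. Whether create_dt_matrices uses exactly this rule, or a strict rather than inclusive comparison, is not stated in this documentation. The toy matrix and the 0.5 threshold are hypothetical.

```r
# Hypothetical dense toy matrix: 4 documents (rows) x 3 terms (columns).
dtm <- matrix(c(1, 0, 0,
                1, 0, 2,
                1, 0, 0,
                1, 1, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(NULL, c("common", "rare", "medium")))

# Sparsity of a term = share of documents in which it does not occur.
term_sparsity <- colMeans(dtm == 0)

# Assumed rule: keep terms whose sparsity does not exceed the threshold.
max_sparsity <- 0.5
dtm_reduced  <- dtm[, term_sparsity <= max_sparsity, drop = FALSE]
print(colnames(dtm_reduced))
```

Under this reading, the default of 0.999 would only discard terms that are absent from more than 99.9% of documents.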

val_split

The proportion of the training data to include in the validation set. Default: 0.2
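A minimal sketch of a random hold-out split in base R, assuming val_split is the fraction of labelled rows moved into the validation set. The row count and seed are hypothetical, and the function itself may split differently (for example, stratified by label).

```r
set.seed(42)  # reproducible split for illustration

n_docs    <- 10   # hypothetical number of labelled rows
val_split <- 0.2

# Randomly choose the validation rows; the rest form the training set.
n_val     <- round(n_docs * val_split)
val_idx   <- sample(n_docs, n_val)
train_idx <- setdiff(seq_len(n_docs), val_idx)
```

With these values, 2 rows go to validation and the remaining 8 to training.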

Value

A complete labelled document-term matrix with corresponding labels, a labelled document-term matrix split into training and validation sets with corresponding labels, and an unlabelled document-term matrix used for predictions.


rosepeglershare/TagR documentation built on Dec. 31, 2020, 3:12 a.m.