textTrainLists: Individually trains word embeddings from several text...

View source: R/2_1_textTrain.R

textTrainLists    R Documentation

Individually train models from the word embeddings of several text variables to predict several numeric or categorical variables.

Description

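Trains a separate model for each combination of a text variable (represented by its word embeddings) and a numeric or categorical variable, so that every text variable is used to predict every target variable.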

Usage

textTrainLists(
  x,
  y,
  force_train_method = "automatic",
  save_output = "all",
  method_cor = "pearson",
  eval_measure = "rmse",
  p_adjust_method = "holm",
  ...
)

Arguments

x

Word embeddings from textEmbed (or textEmbedLayerAggregation). The word embeddings may come from one text variable and be paired with several numeric/categorical variables or, vice versa, come from several text variables and be paired with one numeric/categorical variable. Numeric and categorical variables cannot be mixed.

y

Tibble with several numeric or categorical variables to predict. Please note that you cannot mix numeric and categorical variables.

force_train_method

(character) Default is "automatic"; see also "regression" and "random_forest".

save_output

(character) How much of the output to save; default "all". See also "only_results" and "only_results_predictions".

method_cor

(character) A character string describing the type of correlation (default "pearson").

eval_measure

(character) Type of evaluative measure to assess models on (default "rmse").

p_adjust_method

Method to adjust/correct p-values for multiple comparisons (default "holm"; see also "none", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr").

...

Arguments passed on to textTrainRegression or textTrainRandomForest (via the textTrain function).
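
The names accepted by p_adjust_method match the methods of base R's p.adjust. Whether textTrainLists calls p.adjust internally is an assumption here, not something this page states. A minimal sketch of what the default "holm" adjustment does, using made-up p-values:

```r
# Hypothetical p-values, e.g. one per text-variable/outcome pairing.
p_values <- c(0.010, 0.020, 0.030, 0.040)

# Holm step-down adjustment from base R (stats::p.adjust).
p_holm <- p.adjust(p_values, method = "holm")

# Bonferroni for comparison; it is never less conservative than Holm.
p_bonferroni <- p.adjust(p_values, method = "bonferroni")

print(data.frame(raw = p_values, holm = p_holm, bonferroni = p_bonferroni))
```

With four comparisons, every raw p-value is inflated, which is why a pairing that looks significant on its own may not remain so after adjustment.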

Value

Correlations between predicted and observed values (t-value, degrees of freedom (df), p-value, confidence interval, alternative hypothesis, correlation coefficient) stored in a data frame.
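
The listed statistics are the ones base R's cor.test reports. As a rough illustration only, the sketch below builds one such row from simulated data; the column names in result_row are hypothetical and may differ from those textTrainLists actually returns.

```r
set.seed(1)  # reproducible toy data
observed  <- rnorm(40)
predicted <- 0.6 * observed + rnorm(40, sd = 0.8)  # hypothetical predictions

# Pearson correlation test between predicted and observed values.
ct <- cor.test(predicted, observed, method = "pearson")

# Collect the statistics named in the Value section into one row.
result_row <- data.frame(
  t_value     = unname(ct$statistic),
  df          = unname(ct$parameter),
  p_value     = ct$p.value,
  conf_low    = ct$conf.int[1],
  conf_high   = ct$conf.int[2],
  alternative = ct$alternative,
  correlation = unname(ct$estimate)
)
print(result_row)
```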

See Also

See textTrain, textTrainRegression and textTrainRandomForest.

Examples

# Examines how well the embeddings from Language_based_assessment_data_8 can
# predict the numerical variables in Language_based_assessment_data_8.
# Training is done combination-wise, i.e., correlations are tested pairwise for
# columns 1-5, 1-6, 2-5 and 2-6, resulting in a data frame with four rows.

## Not run: 
word_embeddings <- word_embeddings_4$texts[1:2]
ratings_data <- Language_based_assessment_data_8[5:6]

trained_model <- textTrainLists(
  x = word_embeddings,
  y = ratings_data
)

# Examine results (t-value, degrees of freedom (df), p-value,
# alternative-hypothesis, confidence interval, correlation coefficient).

trained_model$results

## End(Not run)
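
The pairing described in the example comments (text columns 1 and 2 against outcome columns 5 and 6) can be enumerated with base R. This sketch only illustrates which combinations are formed; it is not the function's internal code, and the variable names are hypothetical.

```r
# Column positions taken from the example comments above.
text_columns    <- c(1, 2)
outcome_columns <- c(5, 6)

# Every text column paired with every outcome column.
pairings <- expand.grid(x = text_columns, y = outcome_columns)

# Sort so the rows read 1-5, 1-6, 2-5, 2-6.
pairings <- pairings[order(pairings$x, pairings$y), ]
print(pairings)
```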


text documentation built on Sept. 11, 2024, 7:22 p.m.