textPredict: Trained models created by e.g., textTrain() or stored on...

View source: R/2_4_textPredict.R

textPredictR Documentation

Trained models created by e.g., textTrain() or stored on e.g., github can be used to predict new scores or classes from embeddings or text using textPredict.

Description

Trained models created by e.g., textTrain() or stored on e.g., github can be used to predict new scores or classes from embeddings or text using textPredict.

Usage

textPredict(
  model_info = NULL,
  word_embeddings = NULL,
  texts = NULL,
  x_append = NULL,
  type = NULL,
  dim_names = TRUE,
  save_model = TRUE,
  threshold = NULL,
  show_texts = FALSE,
  device = "cpu",
  participant_id = NULL,
  save_embeddings = TRUE,
  save_dir = "wd",
  save_name = "textPredict",
  story_id = NULL,
  dataset_to_merge_predictions = NULL,
  previous_sentence = FALSE,
  ...
)

Arguments

model_info

(character or r-object) model_info has four options. 1: R model object (e.g, saved output from textTrainRegression). 2: Link to a model stored in a github repo (e.g, "https://github.com/CarlViggo/pretrained_swls_model/raw/main/trained_github_model_logistic.RDS"). 3: Link to a model stored in a osf project (e.g, https://osf.io/8fp7v). 4: Path to a model stored locally (e.g, "path/to/your/model"). Information about some accessible models can be found at: r-text.org.

word_embeddings

(tibble) Embeddings from e.g., textEmbed(). If you're using a pre-trained model, then texts and embeddings cannot be submitted simultaneously (default = NULL).

texts

(character) Text to predict. If this argument is specified, then arguments "word_embeddings" and "premade embeddings" cannot be defined (default = NULL).

x_append

(tibble) Variables to be appended after the word embeddings (x).

type

(character) Defines what output to give after logistic regression prediction. Either probabilities, classifications or both are returned (default = "class". For probabilities use "prob". For both use "class_prob").

dim_names

(boolean) Account for specific dimension names from textEmbed() (rather than generic names including Dim1, Dim2 etc.). If FALSE the models need to have been trained on word embeddings created with dim_names FALSE, so that embeddings were only called Dim1, Dim2 etc.

save_model

(boolean) The model will by default be saved in your work-directory (default = TRUE). If the model already exists in your work-directory, it will automatically be loaded from there.

threshold

(numeric) Determine threshold if you are using a logistic model (default = 0.5).

show_texts

(boolean) Show texts together with predictions (default = FALSE).

device

Name of device to use: 'cpu', 'gpu', 'gpu:k' or 'mps'/'mps:k' for MacOS, where k is a specific device number such as 'mps:1'.

participant_id

(list) Vector of participant-ids. Specify this for getting person level scores (i.e., summed sentence probabilities to the person level corrected for word count). (default = NULL)

save_embeddings

(boolean) If set to TRUE, embeddings will be saved with a unique identifier, and will be automatically opened next time textPredict is run with the same text. (default = TRUE)

save_dir

(character) Directory to save embeddings. (default = "wd" (i.e, work-directory))

save_name

(character) Name of the saved embeddings (will be combined with a unique identifier). (default = ""). Obs: If no save_name is provided, and model_info is a character, then save_name will be set to model_info.

story_id

(vector) Vector of story-ids. Specify this to get story level scores (i.e., summed sentence probabilities corrected for word count). When there is both story_id and participant_id indicated, the function returns a list including both story level and person level prediction corrected for word count. (default = NULL)

dataset_to_merge_predictions

(R-object, tibble) Insert your data here to integrate predictions to your dataset, (default = NULL).

previous_sentence

If set to TRUE, word-embeddings will be averaged over the current and previous sentence per story-id. For this, both participant-id and story-id must be specified.

...

Setting from stats::predict can be called.

Value

Predictions from word-embedding or text input.

See Also

See textTrain, textTrainLists and textTrainRandomForest.

Examples

## Not run: 

# Text data from Language_based_assessment_data_8
text_to_predict <- "I am not in harmony in my life as much as I would like to be."

# Example 1: (predict using pre-made embeddings and an R model-object)
prediction1 <- textPredict(
  model_info = trained_model,
  word_embeddings_4$texts$satisfactiontexts
)

# Example 2: (predict using a pretrained github model)
prediction2 <- textPredict(
  texts = text_to_predict,
  model_info = "https://github.com/CarlViggo/pretrained-models/raw/main/trained_hils_model.RDS"
)

# Example 3: (predict using a pretrained logistic github model and return
# probabilities and classifications)
prediction3 <- textPredict(
  texts = text_to_predict,
  model_info = "https://github.com/CarlViggo/pretrained-models/raw/main/
  trained_github_model_logistic.RDS",
  type = "class_prob",
  threshold = 0.7
)

# Example 4: (predict from texts using a pretrained model stored in an osf project)
prediction4 <- textPredict(
  texts = text_to_predict,
  model_info = "https://osf.io/8fp7v"
)
##### Automatic implicit motive coding section ######

# Create example dataset
implicit_motive_data <- dplyr::mutate(.data = Language_based_assessment_data_8,
participant_id = dplyr::row_number())

# Code implicit motives.
implicit_motives <- textPredict(
  texts = implicit_motive_data$satisfactiontexts,
  model_info = "implicit_power_roberta_large_L23_v1",
  participant_id = implicit_motive_data$participant_id,
  dataset_to_merge_predictions = implicit_motive_data
)

# Examine results
implicit_motives$sentence_predictions
implicit_motives$person_predictions

## End(Not run)

## Not run: 
# Examine the correlation between the predicted values and
# the Satisfaction with life scale score (pre-included in text).

psych::corr.test(
  predictions1$word_embeddings__ypred,
  Language_based_assessment_data_8$swlstotal
)

## End(Not run)

text documentation built on Sept. 11, 2024, 7:22 p.m.