View source: R/2_4_textPredict.R
textPredict | R Documentation |
Trained models, created with e.g. textTrain() or stored on e.g. GitHub, can be used to predict new scores or classes from embeddings or text with textPredict().
textPredict(
model_info = NULL,
word_embeddings = NULL,
texts = NULL,
x_append = NULL,
type = NULL,
dim_names = TRUE,
save_model = TRUE,
threshold = NULL,
show_texts = FALSE,
device = "cpu",
participant_id = NULL,
save_embeddings = TRUE,
save_dir = "wd",
save_name = "textPredict",
story_id = NULL,
dataset_to_merge_predictions = NULL,
previous_sentence = FALSE,
...
)
model_info |
(character or R object) model_info has four options. 1: An R model object (e.g., saved output from textTrainRegression). 2: A link to a model stored in a GitHub repo (e.g., "https://github.com/CarlViggo/pretrained_swls_model/raw/main/trained_github_model_logistic.RDS"). 3: A link to a model stored in an OSF project (e.g., https://osf.io/8fp7v). 4: A path to a model stored locally (e.g., "path/to/your/model"). Information about some accessible models can be found at: r-text.org. |
word_embeddings |
(tibble) Embeddings from e.g., textEmbed(). If you're using a pre-trained model, then texts and embeddings cannot be submitted simultaneously (default = NULL). |
texts |
(character) Text to predict. If this argument is specified, then arguments "word_embeddings" and "premade embeddings" cannot be defined (default = NULL). |
x_append |
(tibble) Variables to be appended after the word embeddings (x). |
type |
(character) Defines what output to give after logistic regression prediction: probabilities, classifications, or both (default = "class"; use "prob" for probabilities and "class_prob" for both). |
dim_names |
(boolean) Account for specific dimension names from textEmbed() (rather than generic names such as Dim1, Dim2, etc.). If FALSE, the model must have been trained on word embeddings created with dim_names = FALSE, so that the embedding columns are simply called Dim1, Dim2, etc. |
save_model |
(boolean) By default, the model is saved in your working directory (default = TRUE). If the model already exists in your working directory, it is automatically loaded from there. |
threshold |
(numeric) Determine threshold if you are using a logistic model (default = 0.5). |
show_texts |
(boolean) Show texts together with predictions (default = FALSE). |
device |
(character) Name of device to use: 'cpu', 'gpu', 'gpu:k' or 'mps'/'mps:k' for macOS, where k is a specific device number such as 'mps:1'. |
participant_id |
(list) Vector of participant-ids. Specify this for getting person level scores (i.e., summed sentence probabilities to the person level corrected for word count). (default = NULL) |
save_embeddings |
(boolean) If set to TRUE, embeddings will be saved with a unique identifier, and will be automatically opened next time textPredict is run with the same text. (default = TRUE) |
save_dir |
(character) Directory to save embeddings. (default = "wd" (i.e, work-directory)) |
save_name |
(character) Name of the saved embeddings (will be combined with a unique identifier). (default = ""). Note: If no save_name is provided and model_info is a character, then save_name will be set to model_info. |
story_id |
(vector) Vector of story-ids. Specify this to get story level scores (i.e., summed sentence probabilities corrected for word count). When there is both story_id and participant_id indicated, the function returns a list including both story level and person level prediction corrected for word count. (default = NULL) |
dataset_to_merge_predictions |
(R object, tibble) Dataset into which the predictions should be merged (default = NULL). |
previous_sentence |
(boolean) If set to TRUE, word embeddings will be averaged over the current and previous sentence per story-id. For this, both participant_id and story_id must be specified (default = FALSE). |
... |
Settings from stats::predict can be passed. |
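The interaction between type and threshold can be illustrated with a minimal base-R sketch. It does not call textPredict; the probabilities are made-up values, and the rule that a probability at or above the threshold counts as the positive class is an assumption for illustration, not a statement of the package's internals.

```r
# Hypothetical predicted probabilities for the positive class (illustrative values).
prob <- c(0.12, 0.55, 0.71, 0.93)
threshold <- 0.7

# Assumption: a probability at or above the threshold is labelled as class 1.
predicted_class <- ifelse(prob >= threshold, 1, 0)

# Analogue of type = "class_prob": probabilities and classes side by side.
class_prob <- data.frame(prob = prob, class = predicted_class)
print(class_prob)
```

Raising the threshold makes the positive class harder to reach, so fewer rows are labelled 1.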
Predictions from word-embedding or text input.
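When participant_id is supplied, sentence-level predictions are summarised to the person level. The base-R sketch below shows one possible reading of "summed sentence probabilities corrected for word count". The data frame, its column names, and the correction (dividing by the total number of words) are all assumptions made for illustration; the package may apply a different correction.

```r
# Hypothetical sentence-level output: one row per sentence (illustrative values).
sentences <- data.frame(
  participant_id = c(1, 1, 2),
  prob = c(0.8, 0.2, 0.5),
  n_words = c(10, 5, 20)
)

# Sum probabilities and word counts within each participant.
person <- aggregate(cbind(prob, n_words) ~ participant_id,
                    data = sentences, FUN = sum)

# Assumption: "corrected for word count" is read here as dividing by total words.
person$prob_per_word <- person$prob / person$n_words
print(person)
```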
See textTrain, textTrainLists and textTrainRandomForest.
## Not run:
# Text data from Language_based_assessment_data_8
text_to_predict <- "I am not in harmony in my life as much as I would like to be."
# Example 1: (predict using pre-made embeddings and an R model-object)
prediction1 <- textPredict(
model_info = trained_model,
word_embeddings = word_embeddings_4$texts$satisfactiontexts
)
# Example 2: (predict using a pretrained github model)
prediction2 <- textPredict(
texts = text_to_predict,
model_info = "https://github.com/CarlViggo/pretrained-models/raw/main/trained_hils_model.RDS"
)
# Example 3: (predict using a pretrained logistic github model and return
# probabilities and classifications)
prediction3 <- textPredict(
texts = text_to_predict,
model_info = "https://github.com/CarlViggo/pretrained-models/raw/main/trained_github_model_logistic.RDS",
type = "class_prob",
threshold = 0.7
)
# Example 4: (predict from texts using a pretrained model stored in an osf project)
prediction4 <- textPredict(
texts = text_to_predict,
model_info = "https://osf.io/8fp7v"
)
##### Automatic implicit motive coding section ######
# Create example dataset
implicit_motive_data <- dplyr::mutate(.data = Language_based_assessment_data_8,
participant_id = dplyr::row_number())
# Code implicit motives.
implicit_motives <- textPredict(
texts = implicit_motive_data$satisfactiontexts,
model_info = "implicit_power_roberta_large_L23_v1",
participant_id = implicit_motive_data$participant_id,
dataset_to_merge_predictions = implicit_motive_data
)
# Examine results
implicit_motives$sentence_predictions
implicit_motives$person_predictions
## End(Not run)
## Not run:
# Examine the correlation between the predicted values from Example 1 and
# the Satisfaction with Life Scale score (included in the text package).
psych::corr.test(
prediction1$word_embeddings__ypred,
Language_based_assessment_data_8$swlstotal
)
## End(Not run)