textTrainN: Cross-validated accuracies across sample-sizes
In text: Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning

Description

textTrainN() computes cross-validated correlations for different sample-sizes of a data set. The cross-validation process can be repeated several times to enhance the reliability of the evaluation.

Usage

textTrainN(
  x,
  y,
  sample_percents = c(25, 50, 75, 100),
  handle_word_embeddings = "individually",
  n_cross_val = 1,
  sampling_strategy = "subsets",
  use_same_penalty_mixture = TRUE,
  model = "regression",
  penalty = 10^seq(-16, 16),
  mixture = c(0),
  seed = 2024,
  ...
)
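
The default search grid implied by the signature above can be inspected with base R alone. This is only a sketch of how the default penalty and mixture values combine; expand.grid is used here for illustration, and the package's own tuning code may build its grid differently.

```r
# Default hyperparameter values copied from the Usage signature.
penalty <- 10^seq(-16, 16)
mixture <- c(0)

# Every penalty/mixture combination a grid search over these values would visit.
# (Illustration only; not the package's internal tuning code.)
search_grid <- expand.grid(penalty = penalty, mixture = mixture)

nrow(search_grid)           # number of candidate combinations
range(search_grid$penalty)  # smallest and largest penalty
```

Adding more mixture values multiplies the number of combinations, which is worth keeping in mind together with n_cross_val and use_same_penalty_mixture.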

Arguments

`x`	Word embeddings from textEmbed (or textEmbedLayerAggregation). If several word embeddings are provided in a list, they will be concatenated.
`y`	Numeric variable to predict.
`sample_percents`	(numeric) Numeric vector that specifies the percentages of the total number of data points to include in each sample (default = c(25, 50, 75, 100), i.e., correlations are evaluated on samples containing 25%, 50%, 75% and 100% of the data points).
`handle_word_embeddings`	Determines whether to use a list of word embeddings or an individual word embedding (default = "individually"; alternative: "concatenate"). If a list of word embeddings is provided, they will be concatenated.
`n_cross_val`	(numeric) Number of times to repeat the cross-validation, i.e., the number of tests (default = 1, i.e., cross-validation is performed only once). Warning: training time increases in proportion to the number of cross-validations, so n cross-validations take roughly n times as long as one.
`sampling_strategy`	Either draw a "random" sample from all the data for each subset, or draw each sample as "subsets" of the larger subsets, i.e., each subset contains the same data (default = "subsets").
`use_same_penalty_mixture`	If TRUE, the penalty and mixture search grid is searched only once and the selected values are reused thereafter; if FALSE, the grid is searched every time.
`model`	Type of model. Default is "regression"; see also "logistic" and "multinomial" for classification.
`penalty`	(numeric) Hyperparameter that is tuned (default = 10^seq(-16, 16)).
`mixture`	A number between 0 and 1 (inclusive) that reflects the proportion of L1 regularization (i.e. lasso) in the model (for more information see the linear_reg-function in the parsnip-package). When mixture = 1, it is a pure lasso model while mixture = 0 indicates that ridge regression is being used (specific engines only).
`seed`	(numeric) Seed for the random number generator; set a different value to obtain different random samples (default = 2024).
`...`	Additional parameters from textTrainRegression.
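
The two sampling_strategy options can be illustrated with a small base R sketch. It reflects one reading of the argument description ("random" samples drawn independently, "subsets" samples nested inside the larger ones) and is not the package's internal implementation; n_total and all object names are hypothetical.

```r
set.seed(2024)  # same value as the documented default seed

n_total <- 40                          # hypothetical number of data points
sample_percents <- c(25, 50, 75, 100)
n_per_sample <- round(n_total * sample_percents / 100)

# "random": each sample is drawn independently from all data points.
random_samples <- lapply(n_per_sample, function(n) sample.int(n_total, n))

# "subsets": start with the largest sample and draw each smaller sample
# from the next larger one, so that the samples are nested.
nested_samples <- vector("list", length(n_per_sample))
pool <- seq_len(n_total)
for (i in rev(seq_along(n_per_sample))) {
  pool <- pool[sample.int(length(pool), n_per_sample[i])]
  nested_samples[[i]] <- pool
}

lengths(random_samples)
lengths(nested_samples)

# With nesting, every smaller sample is contained in the next larger one.
all(nested_samples[[1]] %in% nested_samples[[2]])
```

With nested samples, a difference in correlation between two sample sizes reflects the added data points rather than a completely new draw.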

Value

A tibble containing correlations for each sample. If n_cross_val > 1, the correlations for each cross-validation, along with the standard deviation, mean and standard error of the correlations, are included in the tibble. The information in the tibble can be visualised with the textTrainNPlot function.
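
The summary statistics mentioned above can be sketched in base R as follows. The correlation values are made up for illustration, the object and column names are hypothetical and need not match the returned tibble, and the standard error is assumed here to be the standard deviation divided by the square root of the number of cross-validations.

```r
# Made-up correlations from three repeated cross-validations
# (rows = sample percentages, columns = repetitions).
cv_correlations <- rbind(
  c(0.20, 0.30, 0.25),
  c(0.35, 0.40, 0.45),
  c(0.50, 0.45, 0.55),
  c(0.60, 0.55, 0.65)
)
sample_percents <- c(25, 50, 75, 100)

n_cross_val <- ncol(cv_correlations)
summary_table <- data.frame(
  percent = sample_percents,
  mean    = rowMeans(cv_correlations),
  sd      = apply(cv_correlations, 1, sd),
  se      = apply(cv_correlations, 1, sd) / sqrt(n_cross_val)  # assumed definition
)

summary_table
```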

Examples

# Compute correlations for 25%, 50%, 75% and 100% of the data in word_embeddings and perform
# cross-validation thrice.

## Not run: 
tibble_to_plot <- textTrainN(
  x = word_embeddings_4$texts$harmonytext,
  y = Language_based_assessment_data_8$hilstotal,
  sample_percents = c(25, 50, 75, 100),
  n_cross_val = 3
)

# tibble_to_plot contains the correlation coefficients for each cross-validation,
# and the standard deviation and mean value for each sample. The tibble can be
# plotted using the textTrainNPlot function.

# Examine tibble
tibble_to_plot

## End(Not run)

text documentation built on Feb. 16, 2026, 5:10 p.m.
