textEmbedReduce: Pre-trained dimension reduction (experimental)

View source: R/1_3_textEmbedReduce.R

textEmbedReduce    R Documentation

Pre-trained dimension reduction (experimental)

Description

Pre-trained dimension reduction (experimental)

Usage

textEmbedReduce(
  embeddings,
  n_dim = NULL,
  scalar =
    "https://raw.githubusercontent.com/adithya8/ContextualEmbeddingDR/master/models/fb20/scalar.csv",
  pca =
    "https://raw.githubusercontent.com/adithya8/ContextualEmbeddingDR/master/models/fb20/rpca_roberta_768_D_20.csv"
)

Arguments

embeddings

(list) Embeddings, including tokens, texts, and/or word_types.

n_dim

(numeric) Number of dimensions to reduce to.

scalar

(string or matrix) Path or URL to the scalar used for standardizing the embeddings. If a URL is given, the function first checks whether the file has already been downloaded. The string should point to a CSV file containing the standardization values applied before the PCA projection. For more information, see the reference below.

pca

(string or matrix) Path or URL to the PCA weights. If a URL is given, the function first checks whether the file has already been downloaded. The string should point to a CSV file containing the PCA weight matrix used in the projection (matrix multiplication). For more information, see the reference below. A rough sketch of how the scalar and pca matrices are applied follows this list.
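
As a rough illustration of what the scalar and pca arguments control (a sketch under assumptions, not the package's internal code): the embeddings are standardized with the scalar values and then projected onto the first n_dim columns of the PCA weight matrix. The assumed layout of the scalar matrix (one row of means, one row of standard deviations) and all object names below are illustrative only.

set.seed(1)
emb    <- matrix(rnorm(5 * 768), nrow = 5)             # toy 5 x 768 embedding matrix
scalar <- rbind(mean = rep(0, 768), sd = rep(1, 768))  # assumed layout: per-dimension means and SDs
pca    <- matrix(rnorm(768 * 20), nrow = 768)          # assumed 768 x 20 PCA weight matrix
n_dim  <- 10

standardized <- sweep(sweep(emb, 2, scalar["mean", ], "-"), 2, scalar["sd", ], "/")
reduced      <- standardized %*% pca[, seq_len(n_dim), drop = FALSE]
dim(reduced)  # 5 x 10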

Details

To use this method, please see and cite:
Ganesan, A. V., Matero, M., Ravula, A. R., Vu, H., & Schwartz, H. A. (2021). Empirical evaluation of pre-trained transformers for human-level NLP: The role of sample size and dimensionality. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021), p. 4515.

See also the GitHub repository Empirical-Evaluation.

Value

Returns the embeddings with a reduced number of dimensions.

See Also

textEmbed

Examples

## Not run: 
# Reduce the dimensions of the precomputed example embeddings (word_embeddings_4)
embeddings <- textEmbedReduce(word_embeddings_4$texts)

## End(Not run)
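
A slightly fuller usage sketch, assuming the precomputed example embeddings word_embeddings_4 from the package; the chosen n_dim value and the local file paths are illustrative only.

## Not run: 
# Reduce to 20 dimensions using the default pre-trained scalar and PCA weights.
embeddings_20 <- textEmbedReduce(word_embeddings_4$texts, n_dim = 20)

# Hypothetical: supply locally stored copies of the scalar and PCA files
# instead of downloading them from the default URLs.
embeddings_local <- textEmbedReduce(
  word_embeddings_4$texts,
  n_dim = 20,
  scalar = "my_models/scalar.csv",
  pca = "my_models/rpca_roberta_768_D_20.csv"
)

## End(Not run)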
