prep_word2vec: Prepare documents for word2Vec
In bmschmidt/wordVectors: Tools for creating and analyzing vector-space models of texts

prep_word2vec

R Documentation

Prepare documents for word2Vec

Description

This function exports a directory or document to a single file suitable for Word2Vec to run on. That means a single, seekable txt file with tokens separated by spaces. (For example, punctuation is removed rather than attached to the end of words.) This function is extraordinarily inefficient: in most real-world cases, you'll be much better off preparing the documents using python, perl, awk, or any other scripting language that can reasonably read things in line by line.
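As a rough illustration of the line-by-line approach recommended above, here is a minimal Python sketch. It is not part of wordVectors: the function name `prep_corpus` and the token pattern are assumptions made for this example, and the tokenizer used by `prep_word2vec` itself may split text differently.

```python
import re
from pathlib import Path

# Assumption: a token is a run of word characters, optionally with
# internal apostrophes. prep_word2vec's own tokenizer may differ.
TOKEN_RE = re.compile(r"\w+(?:'\w+)*")


def prep_corpus(origin, destination, lowercase=False):
    """Write a file, or every file under a directory, to one text file
    with tokens separated by spaces (hypothetical helper)."""
    origin = Path(origin)
    destination = Path(destination)
    if origin.is_dir():
        # Skip the output file in case it lives inside the input directory.
        files = sorted(
            p for p in origin.rglob("*")
            if p.is_file() and p.resolve() != destination.resolve()
        )
    else:
        files = [origin]

    with open(destination, "w", encoding="utf-8") as out:
        for path in files:
            with open(path, encoding="utf-8", errors="replace") as src:
                # Stream one line at a time so memory use stays flat.
                for line in src:
                    if lowercase:
                        line = line.lower()
                    tokens = TOKEN_RE.findall(line)  # drops punctuation
                    if tokens:
                        out.write(" ".join(tokens) + "\n")
    return str(destination)
```

Calling `prep_corpus("texts/", "corpus.txt", lowercase=True)` (both paths are placeholders) would produce a file that could then be passed on for training.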

Usage

prep_word2vec(origin, destination, lowercase = F, bundle_ngrams = 1, ...)

Arguments

`origin`	A text file or a directory of text files to be used in training the model
`destination`	The location for output text.
`lowercase`	Logical. Should uppercase characters be converted to lower?
`bundle_ngrams`	Integer. Statistically significant phrases of up to this many words will be joined with underscores: e.g., "United States" will usually be changed to "United_States" if it appears frequently in the corpus. This calls word2phrase once if bundle_ngrams is 2, twice if bundle_ngrams is 3, and so forth; see that function for more details.
`...`	Further arguments passed to word2phrase when bundle_ngrams is greater than 1.
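To make the repeated-pass behaviour of `bundle_ngrams` more concrete, the following Python sketch joins adjacent words over several passes. It is a simplified illustration, not the code word2phrase runs: the names `bundle_once` and `bundle_phrases`, the parameters `min_count` and `threshold`, and the scoring rule (modelled on the phrase score described in the word2vec papers) are all assumptions made for this example.

```python
from collections import Counter


def bundle_once(tokens, min_count=1, threshold=1.5):
    """One pass: join adjacent pairs that co-occur more than chance suggests."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    def score(a, b):
        # Assumed scoring rule: discounted pair count relative to the
        # product of the individual word counts, scaled by corpus size.
        return (bigrams[(a, b)] - min_count) * total / (unigrams[a] * unigrams[b])

    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and score(tokens[i], tokens[i + 1]) > threshold:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # both words were consumed by the joined phrase
        else:
            out.append(tokens[i])
            i += 1
    return out


def bundle_phrases(tokens, n=1, **kwargs):
    """Run n - 1 passes: once for n = 2, twice for n = 3, and so on."""
    for _ in range(max(n - 1, 0)):
        tokens = bundle_once(tokens, **kwargs)
    return tokens
```

With `n = 1` the tokens come back unchanged. Each further pass can attach one more word to a phrase joined in an earlier pass, which is why a second pass is needed before three-word phrases can appear.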

Value

The name of the destination file, returned invisibly.

bmschmidt/wordVectors documentation built on June 2, 2022, 3:53 p.m.
