View source: R/preprocessing.R
skipgrams | R Documentation |
Generates skipgram word pairs.
skipgrams(
sequence,
vocabulary_size,
window_size = 4,
negative_samples = 1,
shuffle = TRUE,
categorical = FALSE,
sampling_table = NULL,
seed = NULL
)
sequence |
A word sequence (sentence), encoded as a list of word indices
(integers). If using a |
vocabulary_size |
Int, maximum possible word index + 1 |
window_size |
Int, size of sampling windows (technically half-window).
The window of a word |
negative_samples |
Float >= 0. Use 0 for no negative (i.e. random) samples, 1 for the same number as positive samples. |
shuffle |
Whether to shuffle the word couples before returning them. |
categorical |
bool. if |
sampling_table |
1D array of size |
seed |
Random seed |
This function transforms a sequence of word indices (a list of integers) into couples of words of the form:
(word, word in the same window), with label 1 (positive samples).
(word, random word from the vocabulary), with label 0 (negative samples).
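To make the two kinds of couples concrete, here is a minimal base-R sketch of the idea. It is not the keras implementation: `skipgram_pairs_sketch()` is a hypothetical helper written for illustration, it assumes index 0 is a non-word and skips it, it draws negative samples with replacement, and it leaves out shuffling and the sampling table entirely.

```r
# Hypothetical helper for illustration only; not part of keras.
# Returns positive couples (label 1) followed by negative couples (label 0).
skipgram_pairs_sketch <- function(sequence, vocabulary_size, window_size = 4,
                                  negative_samples = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sequence <- as.integer(sequence)
  couples <- list()
  labels <- integer(0)
  n <- length(sequence)

  # Positive samples: pair each word with every other word in its window.
  for (i in seq_len(n)) {
    wi <- sequence[i]
    if (wi == 0L) next  # assumption: index 0 is a non-word
    lo <- max(1L, i - window_size)
    hi <- min(n, i + window_size)
    for (j in lo:hi) {
      wj <- sequence[j]
      if (j == i || wj == 0L) next
      couples[[length(couples) + 1L]] <- c(wi, wj)
      labels <- c(labels, 1L)
    }
  }

  # Negative samples: pair words from the sequence with random word indices
  # drawn uniformly from 1 .. vocabulary_size - 1.
  n_pos <- length(labels)
  n_neg <- as.integer(floor(n_pos * negative_samples))
  if (n_neg > 0L) {
    words <- vapply(couples, function(p) p[1], integer(1))
    targets <- words[sample.int(length(words), n_neg, replace = TRUE)]
    randoms <- sample.int(vocabulary_size - 1L, n_neg, replace = TRUE)
    negatives <- Map(function(a, b) c(a, b), targets, randoms)
    couples <- c(couples, unname(negatives))
    labels <- c(labels, rep(0L, n_neg))
  }

  list(couples = couples, labels = labels)
}

# Usage: a three-word sequence with a half-window of 1.
res <- skipgram_pairs_sketch(c(1, 2, 3), vocabulary_size = 4,
                             window_size = 1, negative_samples = 1, seed = 42)
length(res$couples)  # 4 positive couples plus 4 negative couples
```

With `negative_samples = 1` the sketch produces as many random couples as window couples, which matches the description of that argument above.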
Read more about Skipgram in this gnomic paper by Mikolov et al.: Efficient Estimation of Word Representations in Vector Space
List of couples, labels where:
couples is a list of 2-element integer vectors: [word_index, other_word_index].
labels is an integer vector of 0 and 1, where 1 indicates that other_word_index was found in the same window as word_index, and 0 indicates that other_word_index was random.
If categorical is set to TRUE, the labels are categorical, i.e. 1 becomes [0, 1], and 0 becomes [1, 0].
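The categorical encoding can be illustrated with a short base-R sketch. `to_categorical_labels_sketch()` is a hypothetical helper, not a keras function, and representing the result as a list of 2-element vectors is an assumption made for readability.

```r
# Hypothetical helper for illustration only; not part of keras.
# Maps label 1 to c(0, 1) and label 0 to c(1, 0).
to_categorical_labels_sketch <- function(labels) {
  lapply(labels, function(l) if (l == 1L) c(0L, 1L) else c(1L, 0L))
}

# Usage: one positive label followed by one negative label.
to_categorical_labels_sketch(c(1L, 0L))
```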
Other text preprocessing: make_sampling_table(), pad_sequences(), text_hashing_trick(), text_one_hot(), text_to_word_sequence()