skipgrams: Generates skipgram word pairs.


View source: R/preprocessing.R

Description

Generates skipgram word pairs.

Usage

skipgrams(
  sequence,
  vocabulary_size,
  window_size = 4,
  negative_samples = 1,
  shuffle = TRUE,
  categorical = FALSE,
  sampling_table = NULL,
  seed = NULL
)

Arguments

sequence

A word sequence (sentence), encoded as a list of word indices (integers). If using a sampling_table, word indices are expected to match the rank of the words in a reference dataset (e.g. 10 would encode the 10th most frequently occurring token). Note that index 0 is expected to be a non-word and will be skipped.

vocabulary_size

Int, maximum possible word index + 1

window_size

Int, size of the sampling window (technically a half-window). The window of a word w_i will be [i - window_size, i + window_size + 1].

negative_samples

Float >= 0. Use 0 for no negative (i.e. random) samples, or 1 for the same number of negative samples as positive samples.

shuffle

Whether to shuffle the word couples before returning them.

categorical

Bool. If FALSE, labels will be integers (e.g. [0, 1, 1, ...]); if TRUE, labels will be categorical (e.g. [[1, 0], [0, 1], [0, 1], ...]).

sampling_table

1D array of size vocabulary_size, where entry i encodes the probability of sampling a word of rank i.

seed

Random seed
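
As a rough illustration of how sequence and window_size interact, the base-R sketch below enumerates the positive couples for a short sequence. The helper positive_couples() is hypothetical and not part of the keras package. It assumes that the upper window bound is exclusive, so the word at position i is paired with positions i - window_size through i + window_size, and that index 0 is skipped both as a target and as a context word.

```r
# Hypothetical helper for illustration only; not the keras implementation.
positive_couples <- function(sequence, window_size) {
  n <- length(sequence)
  couples <- list()
  for (i in seq_len(n)) {
    target <- sequence[i]
    if (target == 0L) next                 # index 0 is a non-word
    first <- max(1L, i - window_size)      # clamp the window to the sequence
    last  <- min(n, i + window_size)
    for (j in first:last) {
      context <- sequence[j]
      if (j == i || context == 0L) next    # skip the word itself and non-words
      couples[[length(couples) + 1L]] <- c(target, context)
    }
  }
  couples
}

couples <- positive_couples(c(1L, 2L, 0L, 3L), window_size = 1L)
```

In this example, word 3 produces no couples because its only neighbour within the window is the non-word 0.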

Details

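This function transforms a sequence of word indices (a list of integers) into couples of words of the form:

(word, word in the same window), with label 1 (positive samples).

(word, random word from the vocabulary), with label 0 (negative samples).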

Read more about Skipgram in this gnomic paper by Mikolov et al.: Efficient Estimation of Word Representations in Vector Space
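
To make the role of negative_samples concrete, here is a minimal base-R sketch. The helper add_negative_couples() is hypothetical. The sketch assumes that the number of negative couples is the number of positive couples multiplied by negative_samples and rounded down, and that each negative couple pairs a target word taken from the positive couples with a word index drawn uniformly from 1 to vocabulary_size - 1. The actual implementation may differ in its details.

```r
# Hypothetical helper for illustration only; not the keras implementation.
add_negative_couples <- function(couples, vocabulary_size, negative_samples = 1) {
  labels <- rep(1L, length(couples))       # positive couples get label 1
  n_negative <- floor(length(couples) * negative_samples)
  if (n_negative == 0) {
    return(list(couples = couples, labels = labels))
  }
  # Reuse the target words of the positive couples, in random order.
  targets <- vapply(couples, function(p) as.integer(p[1]), integer(1))
  targets <- targets[sample.int(length(targets))]
  negatives <- lapply(seq_len(n_negative), function(k) {
    target <- targets[(k - 1) %% length(targets) + 1]
    c(target, sample.int(vocabulary_size - 1, 1))  # random word, never index 0
  })
  list(couples = c(couples, negatives),
       labels  = c(labels, rep(0L, n_negative)))   # random couples get label 0
}

set.seed(42)
result <- add_negative_couples(list(c(1L, 2L), c(2L, 1L)), vocabulary_size = 5)
```

With negative_samples = 1, the two positive couples are followed by two random couples, so result$labels holds two 1s and then two 0s.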

Value

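List of couples, labels where:

couples is a list of 2-element integer vectors: [word_index, other_word_index].

labels is a list of 0 and 1, where 1 indicates that other_word_index was found in the same window as word_index, and 0 indicates that other_word_index was random.

If categorical is set to TRUE, the labels are categorical, i.e. 1 becomes [0, 1] and 0 becomes [1, 0].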
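
The two label encodings selected by categorical can be sketched in base R as follows. The helper as_categorical_labels() is hypothetical and only mirrors the mapping shown in the description of the categorical argument, where the integer labels [0, 1, 1] correspond to [[1, 0], [0, 1], [0, 1]].

```r
# Hypothetical helper for illustration only; not part of keras.
as_categorical_labels <- function(labels) {
  lapply(labels, function(label) {
    if (label == 1L) c(0L, 1L) else c(1L, 0L)  # 1 -> [0, 1], 0 -> [1, 0]
  })
}

integer_labels <- c(0L, 1L, 1L)
categorical_labels <- as_categorical_labels(integer_labels)
```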

See Also

Other text preprocessing: make_sampling_table(), pad_sequences(), text_hashing_trick(), text_one_hot(), text_to_word_sequence()


dfalbel/keras documentation built on Nov. 27, 2019, 8:16 p.m.