# GloVe: Global Vectors In dselivanov/rsparse: Statistical Learning on Sparse Matrices

 GloVe R Documentation

## Global Vectors

### Description

Creates a Global Vectors (GloVe) matrix factorization model.

### Public fields

`components`

represents context embeddings

`bias_i`

bias term i as per paper

`bias_j`

bias term j as per paper

`shuffle`

`logical = FALSE` by default. Whether to perform shuffling before each SGD iteration. Generally shuffling is a good practice for SGD.

### Methods

#### Method `new()`

Creates GloVe model object

##### Usage
```
GloVe$new(
  rank,
  x_max,
  learning_rate = 0.15,
  alpha = 0.75,
  lambda = 0,
  shuffle = FALSE,
  init = list(w_i = NULL, b_i = NULL, w_j = NULL, b_j = NULL)
)
```
##### Arguments
`rank`

desired dimension for the latent vectors

`x_max`

`integer` maximum number of co-occurrences to use in the weighting function

`learning_rate`

`numeric = 0.15` learning rate for SGD. Changing this parameter is not recommended, since AdaGrad quickly adapts the step size.

`alpha`

`numeric = 0.75` the exponent alpha in the weighting function: `f(x) = 1 if x > x_max; else (x / x_max)^alpha`

`lambda`

`numeric = 0.0` regularization parameter

`shuffle`

see `shuffle` field

`init`

`list(w_i = NULL, b_i = NULL, w_j = NULL, b_j = NULL)` initialization for embeddings (`w_i`, `w_j`) and biases (`b_i`, `b_j`). `w_i` and `w_j` are numeric matrices with `rank` rows; `w_i` should have as many columns as the input matrix is expected to have rows, and `w_j` as many columns as the input matrix is expected to have columns. `b_i` and `b_j` are numeric vectors; `b_i` should have length equal to the expected number of rows of the input matrix, and `b_j` length equal to its expected number of columns.
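
As a rough illustration of two of the arguments above, the sketch below implements the weighting function from the `alpha` entry in base R and builds an `init` list with the shapes described for `w_i`, `w_j`, `b_i` and `b_j`. The helper name `glove_weight`, the sizes `n_rows` and `n_cols`, and the uniform starting values are assumptions made for this example; they are not part of rsparse.

```r
# Weighting function as given in the `alpha` argument description:
# f(x) = 1 if x > x_max, else (x / x_max)^alpha.
# `glove_weight` is a hypothetical helper, not an rsparse function.
glove_weight = function(x, x_max, alpha = 0.75) {
  ifelse(x > x_max, 1, (x / x_max)^alpha)
}

# Weights for a few made-up co-occurrence counts.
w = glove_weight(c(1, 5, 10, 50), x_max = 10)
print(round(w, 3))

# Shapes for `init`, assuming a 6 x 6 input matrix and rank = 4.
rank = 4L
n_rows = 6L  # assumed number of rows in the input matrix
n_cols = 6L  # assumed number of columns in the input matrix
set.seed(42)
init = list(
  w_i = matrix(runif(rank * n_rows, -0.5, 0.5), nrow = rank, ncol = n_rows),
  b_i = runif(n_rows, -0.5, 0.5),
  w_j = matrix(runif(rank * n_cols, -0.5, 0.5), nrow = rank, ncol = n_cols),
  b_j = runif(n_cols, -0.5, 0.5)
)

# Passing the list to the constructor requires rsparse to be installed.
if (requireNamespace("rsparse", quietly = TRUE)) {
  model = rsparse::GloVe$new(rank = rank, x_max = 10, init = init)
}
```

Counts above `x_max` all receive weight 1, while smaller counts are down-weighted; a smaller `alpha` flattens that down-weighting.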

#### Method `fit_transform()`

fits model and returns embeddings

##### Usage
```
GloVe$fit_transform(
  x,
  n_iter = 10L,
  convergence_tol = -1,
  ...
)
```
##### Arguments
`x`

An input term co-occurrence matrix, preferably in `dgTMatrix` format.

`n_iter`

`integer` number of SGD iterations

`convergence_tol`

`numeric = -1` defines the early stopping strategy. Fitting stops when either of the following conditions is satisfied: (a) all iterations have been run, or (b) `cost_previous_iter / cost_current_iter - 1 < convergence_tol`.

`n_threads`

number of threads to use

`...`

not used at the moment
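
The stopping rule described for `convergence_tol` can be sketched in base R as follows. The function `should_stop` and the cost values are made up for illustration; the internal implementation in rsparse may differ in its details.

```r
# Early stopping check as described for `convergence_tol`:
# stop when cost_previous_iter / cost_current_iter - 1 < convergence_tol.
# `should_stop` is a hypothetical helper, not part of rsparse.
should_stop = function(cost_previous, cost_current, convergence_tol) {
  (cost_previous / cost_current - 1) < convergence_tol
}

costs = c(0.50, 0.30, 0.25, 0.249, 0.2489)  # made-up per-iteration costs
convergence_tol = 0.01

stopped_at = length(costs)  # condition (a): all iterations were run
for (i in seq_along(costs)[-1]) {
  if (should_stop(costs[i - 1], costs[i], convergence_tol)) {
    stopped_at = i  # condition (b): relative improvement below tolerance
    break
  }
}
print(stopped_at)
```

For positive costs the ratio `cost_previous_iter / cost_current_iter - 1` is always greater than -1, so under this rule the default `convergence_tol = -1` never triggers condition (b) and all `n_iter` iterations run.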

#### Method `get_history()`

returns value of the loss function for each epoch

##### Usage
`GloVe$get_history()`

#### Method `clone()`

The objects of this class are cloneable with this method.

##### Usage
`GloVe$clone(deep = FALSE)`
##### Arguments
`deep`

Whether to make a deep clone.

### Examples

```
library(rsparse)
data('movielens100k')
# item-item co-occurrence matrix
co_occurence = crossprod(movielens100k)
glove_model = GloVe$new(rank = 4, x_max = 10, learning_rate = .25)
embeddings = glove_model$fit_transform(co_occurence, n_iter = 2, n_threads = 1)
embeddings = embeddings + t(glove_model$components) # embeddings + context embeddings
# one row per item, one column per latent dimension (rank = 4)
identical(dim(embeddings), c(ncol(movielens100k), 4L))
```

dselivanov/rsparse documentation built on April 19, 2023, 11:11 p.m.