model_bert: Construct a BERT Model

model_bert {torchtransformers}	R Documentation

Construct a BERT Model

Description

BERT models are the family of transformer models popularized by Google's BERT (Bidirectional Encoder Representations from Transformers). They include any model with the same general structure.

Usage

model_bert(
  embedding_size,
  intermediate_size = 4 * embedding_size,
  n_layer,
  n_head,
  hidden_dropout = 0.1,
  attention_dropout = 0.1,
  max_position_embeddings,
  vocab_size,
  token_type_vocab_size = 2L
)

Arguments

embedding_size

Integer; the dimension of the embedding vectors.

intermediate_size

Integer; the size of the dense layers applied after the attention mechanism.

n_layer

Integer; the number of attention layers.

n_head

Integer; the number of attention heads per layer.

hidden_dropout

Numeric; the dropout probability to apply to dense layers.

attention_dropout

Numeric; the dropout probability to apply in attention.

max_position_embeddings

Integer; maximum number of tokens in each input sequence.

vocab_size

Integer; the number of tokens in the vocabulary.

token_type_vocab_size

Integer; number of input segments that the model will recognize. (Two for BERT models.)

Shape

Inputs:

With sequence_length <= max_position_embeddings:

  • token_ids: (*, sequence_length)

  • token_type_ids: (*, sequence_length)

Output:

  • initial_embeddings: (*, sequence_length, embedding_size)

  • output_embeddings: list of (*, sequence_length, embedding_size) for each transformer layer.

  • attention_weights: list of (*, n_head, sequence_length, sequence_length) for each transformer layer.
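These shapes can be checked directly on a small model. The sketch below assumes the torch package is installed, that model_bert() is on the search path (e.g. via library(torchtransformers)), and that the returned value is a named list with the component names documented above:

```r
# Sketch: confirm the documented shapes with a tiny model.
# Assumes torch is installed and model_bert() is available.
tiny <- model_bert(
  embedding_size = 8L,
  n_layer = 2L,
  n_head = 2L,
  max_position_embeddings = 16L,
  vocab_size = 100L
)

# A batch of 3 sequences of length 16 (within max_position_embeddings).
ids <- torch::torch_randint(
  low = 2, high = 100, size = c(3, 16), dtype = torch::torch_long()
)
types <- torch::torch_ones(c(3, 16), dtype = torch::torch_long())

out <- tiny(ids, types)
dim(out$initial_embeddings)      # should be (3, 16, 8) per the Shape section
length(out$output_embeddings)    # should equal n_layer (2 here)
dim(out$attention_weights[[1]])  # should be (3, 2, 16, 16): batch, heads, seq, seq
```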

Examples

emb_size <- 128L
mpe <- 512L
n_head <- 4L
n_layer <- 6L
vocab_size <- 30522L
model <- model_bert(
  embedding_size = emb_size,
  n_layer = n_layer,
  n_head = n_head,
  max_position_embeddings = mpe,
  vocab_size = vocab_size
)

n_inputs <- 2L
n_token_max <- 128L
# get random "ids" for input
t_ids <- matrix(
  sample(
    2:vocab_size,
    size = n_token_max * n_inputs,
    replace = TRUE
  ),
  nrow = n_inputs, ncol = n_token_max
)
# Use a single token type (segment 1) for every token.
ttype_ids <- matrix(
  rep(1L, n_token_max * n_inputs),
  nrow = n_inputs, ncol = n_token_max
)
model(
  torch::torch_tensor(t_ids),
  torch::torch_tensor(ttype_ids)
)
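The result of the call above can be compared against the Shape section. A brief continuation, assuming the returned value is the named list documented there:

```r
# Capture the output and inspect it (component names per the Shape section).
out <- model(
  torch::torch_tensor(t_ids),
  torch::torch_tensor(ttype_ids)
)
dim(out$initial_embeddings)    # should be (n_inputs, n_token_max, emb_size)
length(out$output_embeddings)  # should equal n_layer
length(out$attention_weights)  # should equal n_layer
```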

macmillancontentscience/torchtransformers documentation built on Aug. 6, 2023, 5:35 a.m.