BertConfig

Description

Given a set of values as parameter inputs, construct a BertConfig object with those values.
Usage

BertConfig(
  vocab_size,
  hidden_size = 768L,
  num_hidden_layers = 12L,
  num_attention_heads = 12L,
  intermediate_size = 3072L,
  hidden_act = "gelu",
  hidden_dropout_prob = 0.1,
  attention_probs_dropout_prob = 0.1,
  max_position_embeddings = 512L,
  type_vocab_size = 16L,
  initializer_range = 0.02
)
Arguments

vocab_size
    Integer; vocabulary size of the token inputs (input_ids).

hidden_size
    Integer; size of the encoder layers and the pooler layer.

num_hidden_layers
    Integer; number of hidden layers in the Transformer encoder.

num_attention_heads
    Integer; number of attention heads for each attention layer in the Transformer encoder.

intermediate_size
    Integer; the size of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.

hidden_act
    The non-linear activation function (function or string) in the encoder and pooler.

hidden_dropout_prob
    Numeric; the dropout probability for all fully connected layers in the embeddings, encoder, and pooler.

attention_probs_dropout_prob
    Numeric; the dropout ratio for the attention probabilities.

max_position_embeddings
    Integer; the maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512, 1024, or 2048).

type_vocab_size
    Integer; the vocabulary size of the token_type_ids (segment IDs).

initializer_range
    Numeric; the standard deviation of the truncated_normal_initializer used for initializing all weight matrices.
Value

An object of class BertConfig.
Examples

## Not run: 
BertConfig(vocab_size = 30522L)
## End(Not run)
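As a further illustration, the sketch below builds a smaller configuration by overriding several of the defaults shown under Usage. It assumes the returned BertConfig object is list-like, so that its fields can be read with $; that access pattern is an assumption, not something documented above.

## Not run: 
# A smaller configuration for experimentation; all argument names come
# from the Usage section above.
small_config <- BertConfig(
  vocab_size = 30522L,       # required; no default
  hidden_size = 256L,        # narrower encoder than the 768L default
  num_hidden_layers = 4L,    # fewer Transformer layers than the 12L default
  num_attention_heads = 4L,  # hidden_size should be divisible by this
  intermediate_size = 1024L  # smaller feed-forward layer
)

# Assumes the object is list-like, so fields can be inspected with `$`
# (an assumption; not stated in the documentation above).
small_config$hidden_size
## End(Not run)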