download_BERT_checkpoint: Download a BERT checkpoint

View source: R/download_checkpoint.R

download_BERT_checkpoint {RBERT}    R Documentation

Download a BERT checkpoint

Description

Downloads the specified BERT checkpoint from the Google Research collection, or from another repository specified via the url parameter.

Usage

download_BERT_checkpoint(
  model = c("bert_base_uncased", "bert_base_cased", "bert_large_uncased",
    "bert_large_cased", "bert_large_uncased_wwm", "bert_large_cased_wwm",
    "bert_base_multilingual_cased", "bert_base_chinese", "scibert_scivocab_uncased",
    "scibert_scivocab_cased", "scibert_basevocab_uncased", "scibert_basevocab_cased"),
  dir = NULL,
  url = NULL,
  force = FALSE,
  keep_archive = FALSE,
  archive_type = NULL
)

Arguments

model

Character vector. Which model checkpoint to download.

dir

Character vector. Destination directory for checkpoints. Leave NULL to allow RBERT to choose a directory automatically. The path is resolved in this order: the dir parameter, if supplied; then the 'RBERT.dir' option (set using set_BERT_dir); then an "RBERT" folder in the user cache directory (determined using user_cache_dir). If you provide a dir, the 'RBERT.dir' option is updated to that location. Note that the checkpoint is downloaded into a subdirectory of this dir.
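
For example, a minimal sketch of this resolution order (the paths are illustrative, and set_BERT_dir is assumed to take the directory as its argument):

## Not run: 
# An explicit dir is used directly and also updates the 'RBERT.dir' option.
ckpt_dir <- download_BERT_checkpoint("bert_base_uncased", dir = "~/bert_checkpoints")
getOption("RBERT.dir")  # "~/bert_checkpoints"

# With dir = NULL, the 'RBERT.dir' option (if set) determines the location.
set_BERT_dir("~/bert_checkpoints")
ckpt_dir <- download_BERT_checkpoint("bert_base_uncased")

## End(Not run)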

url

Character vector. An optional url from which to download a checkpoint; overrides the model parameter when not NULL.

force

Logical. Download even if the checkpoint already exists in the specified directory? Default FALSE.

keep_archive

Logical. Keep the zip (or other archive) file? Leave as FALSE to save space.

archive_type

Character vector. How the checkpoint is archived; "zip" and "tar-gzip" are currently supported. Leave NULL to infer the archive type from the url.

Value

If successful, returns the path to the downloaded checkpoint.

Checkpoints

download_BERT_checkpoint knows about several pre-trained BERT checkpoints, which you can select with the model parameter. Alternatively, you can supply a direct url to any BERT TensorFlow checkpoint; see the example after the table below.

model                           layers  hidden  heads  parameters  special
bert_base_*                         12     768     12        110M
bert_large_*                        24    1024     16        340M
bert_large_*_wwm                    24    1024     16        340M  whole word masking
bert_base_multilingual_cased        12     768     12        110M  104 languages
bert_base_chinese                   12     768     12        110M  Simplified and Traditional Chinese
scibert_scivocab_*                  12     768     12        110M  trained on the full text of 1.14M scientific
                                                                   papers (18% computer science, 82% biomedical),
                                                                   with a science-specific vocabulary
scibert_basevocab_*                 12     768     12        110M  as scibert_scivocab_*, but using the original
                                                                   BERT vocabulary
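
For example, a checkpoint can be fetched by url rather than by name. A minimal sketch (the link is the BERT-Base uncased archive listed in the google-research/bert README; archive_type is inferred from the ".zip" extension):

## Not run: 
ckpt_dir <- download_BERT_checkpoint(
  url = "https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip"
)

## End(Not run)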

Source

https://github.com/google-research/bert

https://github.com/allenai/scibert

Examples

## Not run: 
download_BERT_checkpoint("bert_base_uncased")
download_BERT_checkpoint("bert_large_uncased")
temp_dir <- tempdir()
download_BERT_checkpoint("bert_base_uncased", dir = temp_dir)

## End(Not run)
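
A further sketch using the remaining arguments, assuming the behavior described under Arguments (list.files is only used to inspect the returned path):

## Not run: 
# Re-download even if the checkpoint already exists, and keep the downloaded
# archive alongside the extracted files.
ckpt_dir <- download_BERT_checkpoint(
  "bert_base_uncased",
  force = TRUE,
  keep_archive = TRUE
)
list.files(ckpt_dir)

## End(Not run)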
