gpt2samples

The goal of gpt2samples is to help users explore the sample texts generated by OpenAI's GPT-2, a new transformer-based language model.

An original implementation of a smaller version of GPT-2 can be found here, and the original sample text files can be found here.

Data

This package contains the following data, stored as tibbles:

| tibble               | description                                                                                                                             |
|:---------------------|:----------------------------------------------------------------------------------------------------------------------------------------|
| conditional          | Conditionally generated samples, with context prompts from the WebText test corpus, default settings (temperature 1 and no truncation). |
| conditional_t07      | Conditionally generated samples, with context prompts from the WebText test corpus, with temperature 0.7.                               |
| conditional_topk40   | Conditionally generated samples, with context prompts from the WebText test corpus, with truncation and top_k 40.                       |
| unconditional        | Unconditionally generated samples, default settings.                                                                                    |
| unconditional_t07    | Unconditionally generated samples, with temperature 0.7.                                                                                |
| unconditional_topk40 | Unconditionally generated samples, with truncation and top_k 40.                                                                        |
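The temperature and top_k settings in the table control how GPT-2 picks each next token. As a rough illustration only (a toy sketch, not OpenAI's sampling code), the base R snippet below applies temperature scaling and top-k truncation to a made-up vector of token scores. The vocabulary, the scores, the toy k = 2, and the helper names `softmax_t()` and `top_k_probs()` are all hypothetical and are not part of gpt2samples.

```r
# Hypothetical next-token scores (logits) over a tiny, made-up vocabulary
logits <- c(the = 2.0, cat = 1.0, sat = 0.5, mat = -1.0)

# Softmax with temperature: dividing the logits by a temperature below 1
# sharpens the distribution toward the highest-scoring token
softmax_t <- function(x, temperature = 1) {
  z <- x / temperature
  z <- z - max(z)              # subtract the max for numerical stability
  exp(z) / sum(exp(z))
}

# Top-k truncation: zero out everything except the k most likely tokens,
# then renormalise so the remaining probabilities sum to 1
top_k_probs <- function(x, k, temperature = 1) {
  stopifnot(k >= 1)
  p <- softmax_t(x, temperature)
  keep <- order(p, decreasing = TRUE)[seq_len(min(k, length(p)))]
  p[-keep] <- 0
  p / sum(p)
}

p_default <- softmax_t(logits)                   # temperature 1, no truncation
p_t07     <- softmax_t(logits, temperature = 0.7) # analogous to the *_t07 data
p_topk2   <- top_k_probs(logits, k = 2)          # toy k = 2 instead of 40

# Draw a few tokens from the temperature-0.7 distribution
set.seed(1)
sample(names(logits), size = 5, replace = TRUE, prob = p_t07)
```

With temperature 0.7 the most likely token gains probability mass relative to temperature 1, while top-k truncation removes every token outside the k most likely before sampling.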

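Additionally, all the generated samples (conditional and unconditional) can be explored together by calling all_samples(), which combines them into one table with a file column identifying where each row came from.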

Installation

You can install the development version of gpt2samples from GitHub with:

# install.packages("devtools")
devtools::install_github("kanishkamisra/gpt2samples")

Example

This basic example explores the data using dplyr verbs:

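library(dplyr)
library(gpt2samples)

# Conditional sample with id 100 (default settings)
conditional %>%
  filter(id == 100)

# Unconditional sample with id 250, generated with temperature 0.7
unconditional_t07 %>%
  filter(id == 250)

# Last few rows of the conditional samples in the combined data
all_samples() %>%
  filter(file == "conditional") %>%
  tail()

# Number of lines contributed by each source file
all_samples() %>%
  group_by(file) %>%
  summarise(total_lines = n())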

For further exploration, packages such as Julia Silge and David Robinson's tidytext can be used to analyze the text generated by GPT-2.
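A minimal tidytext sketch, assuming the tidytext package is installed. To stay self-contained it uses a tiny made-up tibble, `samples`, whose `file`, `id`, and `text` columns stand in for the output of all_samples(). The name of the text column in the real data is an assumption, so check names(all_samples()) before adapting it.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for all_samples(): the columns (file, id, text)
# are assumptions, not confirmed by the gpt2samples documentation
samples <- tibble(
  file = c("conditional", "unconditional"),
  id   = c(1L, 1L),
  text = c("The cat sat on the mat.",
           "A dog ran in the park. The dog barked.")
)

# Split each line into lowercase word tokens, drop common English stop
# words, and count how often each word appears in each source file
word_counts <- samples %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(file, word, sort = TRUE)

word_counts
```

On the real data, replacing `samples` with all_samples() (and `text` with the actual text column) would give word frequencies per GPT-2 sampling setting.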

Contributor Code of Conduct

Please note that the 'gpt2samples' project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.



kanishkamisra/gpt2samples documentation built on May 31, 2019, 10:34 a.m.