Titlebot

I string words together from the titles of scientific papers using Markov chains. Each word is sampled based on the probability that it follows the preceding word (i.e. I am a bigram model).
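Here is a minimal sketch of that sampling idea. It assumes whitespace tokenisation, and the names `build_bigram`, `sample_title`, `<START>` and `<END>` are hypothetical ones made up for illustration. My actual `load_bigram` and `generate_title` functions may well work differently.

```r
# Illustrative sketch only: hypothetical helpers, not Titlebot's own functions.

# For each word, collect every word observed immediately after it.
# "<START>" and "<END>" are made-up boundary markers.
build_bigram <- function(titles) {
  followers <- list()
  for (title in titles) {
    words <- c("<START>", strsplit(trimws(title), "\\s+")[[1]], "<END>")
    for (i in seq_len(length(words) - 1)) {
      followers[[words[i]]] <- c(followers[[words[i]]], words[i + 1])
    }
  }
  followers
}

# Walk the chain from "<START>" until "<END>" is drawn (or max_words is hit).
sample_title <- function(followers, max_words = 30) {
  word <- "<START>"
  out <- character(0)
  while (length(out) < max_words) {
    candidates <- followers[[word]]
    # Followers are stored with repeats, so a uniform draw from this vector
    # picks each next word in proportion to how often it followed `word`.
    word <- candidates[sample.int(length(candidates), 1)]
    if (word == "<END>") break
    out <- c(out, word)
  }
  paste(out, collapse = " ")
}

# Toy input, invented for this example.
toy_titles <- c(
  "learning sparse models",
  "sparse models of species",
  "learning of species ranges"
)
toy_bigram <- build_bigram(toy_titles)
sample_title(toy_bigram)
```

Storing repeated followers rather than explicit probabilities keeps the sketch short. A real implementation would more likely store counts or a transition matrix.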

So far, I tweet about three kinds of titles:

* @EcologyTitles tweets about ecology based on PLOS titles.
* @ML_Titles tweets about machine learning based on arXiv titles.
* @AnswersInMarkov tweets about "creation science" based on articles published by Answers in Genesis' "peer-reviewed" "journal".

Additionally, @noamross thought it would be funny to create @HarrisBot, which tweets about whatever @davidjayharris tweets about. This repository contains a model based on @kara_woo's tweets as well.

In general, the machine learning titles are harder to distinguish from real titles, but the ecology titles can be much funnier (see below). Real "creation science" is, of course, indistinguishable from Markov chain output.

Examples

devtools::load_all()
set.seed(0)

Machine learning:

ML_bigram = load_bigram("data/StatMLTitles")
replicate(5, generate_title(bigram = ML_bigram))

Ecology:

ecology_bigram = load_bigram("data/plos_ecology")
replicate(5, generate_title(bigram = ecology_bigram))

Answers Research Journal:

answers_bigram = load_bigram("data/Answers_Research_Journal")
replicate(5, generate_title(bigram = answers_bigram))

davidjayharris:

harris_bigram = load_bigram("data/davidjayharris")
replicate(5, generate_title(bigram = harris_bigram))

kara_woo:

woo_bigram = load_bigram("data/kara_woo")
replicate(5, generate_title(bigram = woo_bigram))

Licensing

The code is available under The Artistic License 2.0 (see LICENSE).

The machine learning titles in the "data" folder were scraped by Philippe (@PhDP) from arXiv and are available under a Creative Commons Share Alike license (some of them are CC-BY).

The ecology titles were scraped from PLOS journals using rplos. These titles are all CC-BY.

The Answers titles are copyrighted by Answers in Genesis. However, their inclusion and transformation here are covered by the fair use doctrine, so they do not infringe copyright in the United States.

The HarrisBot data are @davidjayharris's tweets, minus retweets. These are hereby released as CC-BY.

Kara Woo's tweets are used with her permission.



davharris/Titlebot documentation built on May 14, 2019, 9:27 p.m.