ngramr: Dig into the Google Ngram Viewer using R



Description

The Google Books Ngram Viewer lets you enter a list of phrases and then displays a graph showing how often those phrases have occurred over time in a corpus of books (e.g., "British English", "English Fiction", "French"). The underlying data is hidden in the web page, embedded in JavaScript.

This package extracts the data and provides it in the form of an R data frame.

The key function is ngram, which, given a collection of phrases, returns a data frame containing their frequencies by year.
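As an illustration only, the Python sketch below shows the general idea of the extraction: locate a JSON array embedded in a page's JavaScript, parse it, and reshape it into one row per phrase and year. It is not ngramr's implementation. The page fragment, the "var data" variable name, the "ngram"/"timeseries" field names, and the assumption that each series holds one value per consecutive year starting at year_start are all hypothetical. The live Ngram Viewer markup differs and has changed over time.

```python
import json
import re

# Hypothetical page fragment (an assumption for illustration): the real
# Ngram Viewer markup differs and has changed over time.
SAMPLE_PAGE = """
<html><body>
<script>
var data = [{"ngram": "hacker", "timeseries": [1.0e-07, 1.2e-07, 1.5e-07]},
            {"ngram": "programmer", "timeseries": [3.0e-07, 3.1e-07, 3.3e-07]}];
</script>
</body></html>
"""


def extract_series(page: str) -> list:
    """Return the JSON array assigned to the hypothetical 'var data' variable."""
    # Non-greedy match up to the first "];" so that the inner "]" closing each
    # timeseries list (followed by "}" rather than ";") is not taken as the end.
    match = re.search(r"var data = (\[.*?\]);", page, re.DOTALL)
    if match is None:
        raise ValueError("no embedded data found in page")
    return json.loads(match.group(1))


def to_long(series: list, year_start: int) -> list:
    """Reshape to one row per (year, phrase), assuming one value per year."""
    rows = []
    for entry in series:
        for offset, freq in enumerate(entry["timeseries"]):
            rows.append(
                {"year": year_start + offset, "phrase": entry["ngram"], "frequency": freq}
            )
    return rows


if __name__ == "__main__":
    for row in to_long(extract_series(SAMPLE_PAGE), year_start=1950):
        print(row)
```

A regular expression is enough for a small, well-delimited fragment like this one. A real scraper would have to follow whatever markup the live page actually uses, which is one reason a maintained package is more convenient than ad hoc extraction.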

The code is based on the getNgrams.py Python script, written by Jean-Baptiste Michel and available from Culturomics Code. The Culturomics website itself is worth exploring.

Note that, compared to the 2009 versions, the 2012 and 2019 versions contain more books and have improved OCR and improved library and publisher metadata. Unlike the 2009 versions, the 2012 and 2019 corpora also do not form ngrams that cross sentence boundaries, do form ngrams across page boundaries, and support part-of-speech tagging.

Like the Google Ngram Viewer website itself, this package is aimed at quick inquiries into the usage of small sets of phrases.

Please respect the terms of service of the Google Books Ngram Viewer while using this code. This code is meant to help users retrieve the data behind a few queries, not to bang at Google's servers with dozens of queries. The complete dataset is available for download from Google.

Author(s)

Maintainer: Sean Carmody <seancarmody@gmail.com> [copyright holder]

References

Michel, Jean-Baptiste, et al. "Quantitative analysis of culture using millions of digitized books." Science 331, No. 6014 (2011): 176–182.


ngramr documentation built on Jan. 16, 2023, 5:07 p.m.
