bookworm: Run a bookworm query

bookworm    R Documentation

Run a bookworm query

Description

Get a data.frame from a remote Bookworm server

Usage

bookworm(
  host = NULL,
  port = 80,
  database = NULL,
  method = "data",
  format = "tsv",
  counttype = c("WordCount"),
  protocol = "https",
  compare_limits = NULL,
  groups = NULL,
  search_limits = RJSONIO::emptyNamedList,
  query = list(),
  ...
)

Arguments

host

The domain where the Bookworm lives.

port

The port that the API lives on; usually 80.

database

The name of the bookworm to use at the host.

method

The Bookworm API method to use. On older servers, only "return_tsv" works reliably at the moment; on post-2018 Bookworms, use "data".

format

The response format: "tsv", "json", or "feather". On post-2018 Bookworms, use "tsv".

counttype

A character vector of summary statistics to calculate, such as "WordCount" or "TextCount".

protocol

"https" or "http". Overridden if included in host.

compare_limits

The limits for a comparison, in the same format as search_limits. If NULL, no comparison limits are passed and the default behavior applies; this is usually the best choice for non-experts.

groups

A list of groupings to be applied.

search_limits

The search constraints, expressed as a list (see below).

query

The Bookworm query, expressed as a list. Most of the arguments above (groups, counttype, etc.) update fields in the query, but the query can also be supplied directly. If the query conflicts with any of them (for example, it has a different groups element), the query object currently takes precedence.

...

Any additional named arguments will be passed to search_limits.
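
As a rough illustration of how extra named arguments could end up in search_limits, the sketch below merges them into an existing list using base R. fold_limits is a hypothetical helper written for this example, not a function in the package, and the overwrite-on-conflict behavior is an assumption; the real implementation may differ.

```r
# Hypothetical helper: merge extra named arguments into search_limits.
# This only mimics the documented behavior; it is not the package's code.
fold_limits <- function(search_limits = list(), ...) {
  extra <- list(...)
  # modifyList() adds new entries and overwrites entries that share a name
  # (the overwrite rule is an assumption made for this sketch).
  modifyList(search_limits, extra)
}

# Combine an explicit word constraint with an extra named argument.
limits <- fold_limits(list(word = "evolution"), date_year = 1900)
str(limits)
```

Under this assumption, a call such as bookworm(..., search_limits = list(word = "evolution"), date_year = 1900) would constrain the search on both fields.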

Details

This function handles dispatching a query to a Bookworm server, with some mild error correction, and retrieving the results. There are two ways to run a query: by specifying the query object directly, or by filling in the arguments one at a time. The latter is easier to understand, but the former is sometimes easier when dispatching calls programmatically.
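
To make the two styles concrete, the sketch below builds the same kind of request both ways as plain lists, without contacting a server. The host, database, and field names come from the Examples on this page; the word vector is made up for illustration, and the assumption that a query list can carry database and method itself has not been checked against the package. The commented-out lines show how each form might be handed to bookworm().

```r
# Style 1: fill in the arguments one at a time.
args_by_name <- list(
  host = "bookworm.htrc.illinois.edu",
  database = "Bookworm2016",
  method = "return_tsv",
  groups = "date_year",
  counttype = "WordsPerMillion",
  search_limits = list(word = "evolution")
)
# result <- do.call(bookworm, args_by_name)

# Style 2: build query objects programmatically, one per word.
words <- c("evolution", "selection")  # illustrative values only
queries <- lapply(words, function(w) {
  list(
    database = "Bookworm2016",
    method = "return_tsv",
    groups = list("date_year"),
    counttype = list("WordsPerMillion"),
    search_limits = list(word = list(w))
  )
})
names(queries) <- words
# results <- lapply(queries, function(q) {
#   bookworm(host = "bookworm.htrc.illinois.edu", query = q)
# })
```

The second style keeps each request as a single value, which is convenient when many similar queries are generated in a loop.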

Value

A data_frame consisting of the results of the call.

Author(s)

Benjamin Schmidt

Examples


library(dplyr)

# Number of texts per year.
totals <- bookworm(
  host = "bookworm.htrc.illinois.edu",
  groups = "date_year", counttype = "TotalTexts",
  database = "Bookworm2016", method = "return_tsv"
)

plot(
  totals[totals$date_year %in% 1700:2000, ],
  type = "l",
  main = "Total works in the Hathi Trust Public Domain corpus"
)

# Word and text counts per year, used to compute the average length of a work.
totals <- bookworm(
  host = "bookworm.htrc.illinois.edu",
  groups = "date_year", counttype = c("WordCount", "TextCount"),
  database = "Bookworm2016", method = "return_tsv"
)

totals <- totals[totals$date_year %in% 1700:2000, ]
plot(
  totals$date_year, totals$WordCount / totals$TextCount,
  type = "l",
  main = "Average length of works in the Hathi Trust Public Domain corpus"
)

# Relative frequency of a single word per year.
results <- bookworm(
  host = "bookworm.htrc.illinois.edu",
  search_limits = list(word = "evolution"),
  groups = "date_year", counttype = "WordsPerMillion",
  database = "Bookworm2016", method = "return_tsv"
)

plot(
  results[results$date_year %in% 1700:2000, ],
  main = "Usage of 'evolution' in the Hathi Trust Public Domain corpus"
)

bmschmidt/edinburgh documentation built on March 24, 2022, 1:24 a.m.