R/generator-query-type.R

Defines functions check_generator is_generator_module is_generator_query new_generator_query.query new_generator_query.list new_generator_query.prop new_generator_query.generator new_generator_query list_all_generators query_generate_pages

Documented in list_all_generators new_generator_query query_generate_pages

#' Generate pages that meet certain criteria, or which are related to a set of
#' known pages by certain properties
#'
#' Many of the endpoints on the Action API can be used as `generators`. Use
#' [list_all_generators()] to see a complete list. The main advantage of using a
#' generator is that you can chain it with calls to [query_page_properties()] to
#' find out specific information about the pages. This is not possible for
#' queries constructed using [query_list_pages()].
#'
#' There are two kinds of `generator`: list-generators and prop-generators. If
#' using a prop-generator, then you need to use a [query_by_()] function to tell
#' the API where to start from, as shown in the examples.
#'
#' To set additional parameters for a generator, prefix the parameter name with
#' "g". For instance, to limit the number of pages returned by the
#' `categorymembers` generator to 10, set the parameter `gcmlimit = 10`.
#'
#' @param .req An `httr2_request`, e.g. generated by [wiki_action_request()]
#' @param generator The generator module you wish to use. Most
#'   [list](https://www.mediawiki.org/wiki/API:Lists) and
#'   [property](https://www.mediawiki.org/wiki/API:Properties) modules can be
#'   used, though not all.
#' @param ... <[`dynamic-dots`][rlang::dyn-dots]> Additional parameters to the
#'   generator
#'
#' @return [query_generate_pages]: The modified request, which can be passed to
#'   [next_batch] or [retrieve_all] as appropriate.
#'
#'   [list_all_generators]: a [tibble][tibble::tbl_df] of all the available generator
#'   modules. The `name` column gives the name of the generator, while the
#'   `group` column indicates whether the generator is based on a list module
#'   or a property module. Generators based on property modules can only be
#'   added to a query if you have already used [query_by_] to specify which
#'   pages' properties should be generated.
#' @export
#'
#' @seealso [gracefully()]
#'
#' @examples
#' # Search for articles about seagulls
#' seagulls <- wiki_action_request() %>%
#'   query_generate_pages("search", gsrsearch = "seagull") %>%
#'   gracefully(next_batch)
#'
#' seagulls
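#'
#' # A hedged sketch of a prop-generator query. It assumes that
#' # `query_by_title()` is the relevant `query_by_` function and that `links`
#' # is available as a generator; check `list_all_generators()` to confirm.
#' # The "g" prefix turns the module's `pllimit` parameter into `gpllimit`.
#' einstein_links <- wiki_action_request() %>%
#'   query_by_title("Albert Einstein") %>%
#'   query_generate_pages("links", gpllimit = 10) %>%
#'   gracefully(next_batch)
#'
#' einstein_links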
query_generate_pages <- function(.req, generator, ...) {
  group <- check_generator(generator)
  # TODO: check_params
  if (group == "prop" && !is_prop_query(.req)) {
    rlang::abort(
      glue::glue("{generator} is based on a 'property' endpoint; use `query_by_` to specify the starting pages before adding the generator to the query"),
      class = "malformed_generator"
    )
  }
  new_generator_query(.req, generator, ...)
}

#' @rdname query_generate_pages
#' @export
list_all_generators <- function() {
  schema_query_modules %>%
    dplyr::filter(generator == TRUE) %>%
    dplyr::select(name, group)
}

#' Constructor for generator query type
#'
#' Construct a new query to a [generator
#' module](https://www.mediawiki.org/wiki/API:Query#Example_6:_Generators) of
#' the Action API. This low-level constructor only performs basic type-checking.
#' It is your responsibility to ensure that the chosen `generator` is an
#' existing API endpoint, and that you have composed the query correctly. For
#' a more user-friendly interface, use [query_generate_pages].
#'
#' @param .req A [`query/action_api/httr2_request`][wiki_action_request] object,
#'   or a generator query as returned by this function.
#' @param generator The generator to add to the query. If the generator is based
#'   on a [property module](https://www.mediawiki.org/wiki/API:Properties), then
#'   `.req` must be a subtype of
#'   [`prop/query/action_api/httr2_request`][new_prop_query]. If the generator
#'   is based on a [list module](https://www.mediawiki.org/wiki/API:Lists), then
#'   `.req` must subclass
#'   [`query/action_api/httr2_request`][wiki_action_request] directly.
#' @param ... <[`dynamic-dots`][rlang::dyn-dots]> Further parameters to the generator
#'
#' @keywords low_level_action_api
#'
#' @return The output type depends on the input. If `.req` is a
#'   [`query/action_api/httr2_request`][wiki_action_request], then the output
#'   will be a `generator/query/action_api/httr2_request`. If `.req` is a
#'   [`prop/query/action_api/httr2_request`][new_prop_query], then the returned
#'   object will be a subclass of the passed request, with "generator" as the
#'   first term in the class vector, i.e.
#'   `generator/(titles|pageids|revids)/prop/query/action_api/httr2_request`.
#' @export
#' @examples
#' # Build a generator query using a list module
#' # List all members of Category:Physics on English Wikipedia
#' physics <- wiki_action_request() %>%
#'   new_generator_query("categorymembers", gcmtitle = "Category:Physics")
#'
#' # Build a generator query on a property module
#' # Generate the interwiki links found on Albert Einstein's page on English
#' # Wikipedia
#' einstein_iwlinks <- wiki_action_request() %>%
#'   new_prop_query("titles", "Albert Einstein") %>%
#'   new_generator_query("iwlinks")
#'
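#' # A quick check of the return type described above. This is only a sketch:
#' # it inspects the "generator" class rather than the full class vector.
#' inherits(physics, "generator")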
new_generator_query <- function(.req, generator, ...) {
  UseMethod("new_generator_query")
}

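#' @export
new_generator_query.generator <- function(.req, generator, ...) {
  # `.req` already carries the "generator" class, so it is not prepended again
  req <- set_action(.req, "generator", generator, ...)
  req
}

#' @export
new_generator_query.prop <- function(.req, generator, ...) {
  # Prop queries also subclass "query": defer to the next method in the class
  # vector
  NextMethod()
}

#' @export
new_generator_query.list <- function(.req, generator, ...) {
  # A list query cannot be combined with a generator
  incompatible_query_error("generator", "list")
}

#' @export
new_generator_query.query <- function(.req, generator, ...) {
  req <- set_action(.req, "generator", generator, ...)
  # Mark the request as a generator query by prepending the subclass
  class(req) <- c("generator", class(req))
  req
}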

is_generator_query <- function(.req) {
  is_query_subtype(.req, "generator")
}

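# Returns TRUE if `module` can be used as a generator and FALSE otherwise. The
# group that the module belongs to (e.g. "prop") is attached as the `group`
# attribute of the result.
is_generator_module <- function(module) {
  result <- schema_query_modules %>%
    dplyr::filter(generator == TRUE) %>%
    dplyr::group_by(group) %>%
    dplyr::summarise(is_generator = module %in% name) %>%
    dplyr::filter(is_generator == TRUE)
  # If no group contains `module`, `result` has zero rows and is_true() is FALSE
  structure(
    rlang::is_true(result$is_generator),
    group = result$group
  )
}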

# Aborts if `module` is not a valid generator; otherwise invisibly returns the
# group that the module belongs to.
check_generator <- function(module) {
  result <- is_generator_module(module)
  if (!result) {
    rlang::abort(
      glue::glue("`{module}` cannot be used as a generator with the Action API, though it may be valid as a property or list query"),
      class = "unknown_module_error"
    )
  } else {
    group <- attr(result, "group", exact = TRUE)
    invisible(group)
  }
}

wikkitidy documentation built on April 4, 2025, 12:41 a.m.