query_hive: Query Hadoop cluster with Hive


View source: R/hive.R

Description

Runs a Hive query against the Hadoop cluster and returns the results.

Usage

query_hive(
  query,
  heap_size = 1024,
  use_nice = TRUE,
  use_ionice = TRUE,
  use_beeline = FALSE,
  debug = FALSE
)

Arguments

query

a Hive query

heap_size

Value to use for HADOOP_HEAPSIZE. The default is 1024 (alternatives: 2048 or 4096).

use_nice

Whether to use nice for less greedy CPU usage in a multi-user environment. The default is TRUE.

use_ionice

Whether to use ionice for less greedy I/O in a multi-user environment. The default is TRUE.

use_beeline

Whether to use beeline to connect with Hive instead of hive. The default is FALSE.

debug

Whether to print the query and any messages/info that could be useful for debugging. The default is FALSE.

Value

A data.frame containing the results of the query, or TRUE if the user has chosen to write straight to file.

Escaping

query_hive works by running the query you provide through the CLI via a system() call. As a result, single escapes for meaningful characters (such as quotes) within the query will not work: R interprets them only as escaping that character within R. Double escaping (\\) is therefore necessary, just as it is for regular expressions.
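The escaping rule above can be illustrated with a short sketch. The query, the uri_path column, and the RLIKE pattern are assumptions chosen for illustration only; the point is the difference between what you type in R and the string that R passes on.

```r
# Hypothetical query: the pattern is meant to leave R as \.php$ (a literal dot).
# Typing '\.php$' with a single backslash is a parse error in R
# ("unrecognized escape"), so the backslash is doubled in the source.
query <- "USE wmf; SELECT uri_path FROM webrequest WHERE uri_path RLIKE '\\.php$' LIMIT 10;"

# cat() writes the string's actual contents: one backslash before the dot.
cat(query, "\n")

# print() shows the R source form, with the doubled backslash.
print(query)

## Not run:
# results <- query_hive(query)
## End(Not run)
```

Checking a query with cat() before sending it is a quick way to confirm that the escapes survived R's own parsing.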

Handling our hadoop/hive setup

The webrequest table is documented on Wikitech, which also provides a set of example queries. If you want to manipulate the rows with Java before they reach you, Nuria has written a brief tutorial on loading UDFs that should help.

See Also

lubridate::ymd_hms() for converting the "dt" column in the webrequest table to a proper datetime, and mysql_read() and global_query() for querying our MySQL databases.
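As a minimal sketch of the "dt" conversion mentioned above: the data frame below is made-up sample data standing in for a query_hive() result, and the timestamp format is an assumption (ISO 8601-style strings).

```r
library(lubridate)

# Made-up stand-in for a query_hive() result; real output will differ.
results <- data.frame(
  dt = c("2021-02-06T23:59:58", "2021-02-07T00:00:01"),
  uri_host = c("en.wikipedia.org", "de.wikipedia.org"),
  stringsAsFactors = FALSE
)

# Convert the character timestamps to POSIXct (UTC by default).
results$dt <- ymd_hms(results$dt)

str(results)
```

Once converted, the column supports date arithmetic and ordering, for example difftime() between rows.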

Examples

## Not run: 
query_hive("USE wmf; DESCRIBE webrequest;")

## End(Not run)

wikimedia/wikimedia-discovery-wmf documentation built on Feb. 7, 2021, 12:19 a.m.