dep_hive_query: dep_hive_query


Description

This is the "old" Hive querying function. It is thoroughly deprecated, and is only waiting for Andrew to move the Hive server onto a dedicated, more powerful machine.

Usage

1

Arguments

query

a query, or the location of a .hql file containing a query.

file

a file name. If this is provided, the results of the query are written straight to that file and a boolean TRUE is returned. If it is not provided (it is NULL by default), the results of the query are returned as a data.frame.

dt

whether to return the results as a data.table or not.

...

other arguments to pass to read.delim.
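
As a rough illustration of the kind of option that could be forwarded through ..., the sketch below calls read.delim directly on a small tab-separated string. The column names and values are made up for the example, and the assumption that the query output is tab-separated comes only from read.delim's defaults, not from this function's source.

```r
# Hypothetical tab-separated text, standing in for the output of a query.
raw_output <- "uri_host\trequests\nen.wikipedia.org\t120\nde.wikipedia.org\t45\n"

# stringsAsFactors and colClasses are standard read.delim/read.table arguments;
# they are the sort of options that might be supplied through `...`.
results <- read.delim(text = raw_output,
                      stringsAsFactors = FALSE,
                      colClasses = c("character", "integer"))

str(results)
```

Setting colClasses explicitly avoids relying on read.delim's type guessing, which can be useful when a column of identifiers looks numeric.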

Details

the deprecated hive querying function

Value

a data.frame containing the results of the query, or a boolean TRUE if the user has chosen to write straight to file.

escaping

hive_query works by running the query you provide through the CLI via a system() call. As a result, single escapes for meaningful characters (such as quotes) within the query will not work: R will interpret them only as escaping that character /within R/. Double escaping (\\) is thus necessary, in the same way that it is for regular expressions.
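
The difference between single and double escaping can be seen in plain R, without touching Hive at all. The query text below is a made-up example, and the commented-out call at the end assumes only that the query is the first argument, as the argument list above suggests.

```r
# Single escape: R consumes the backslash while parsing the string literal,
# so nothing downstream of R ever sees it.
single_escaped <- "SELECT 'it\'s' AS example;"

# Double escape: R turns \\ into one literal backslash, which survives
# to be passed along with the rest of the query.
double_escaped <- "SELECT 'it\\'s' AS example;"

cat(single_escaped, "\n")  # SELECT 'it's' AS example;
cat(double_escaped, "\n")  # SELECT 'it\'s' AS example;

# The double-escaped string is exactly one character longer: the backslash.
nchar(double_escaped) - nchar(single_escaped)

# Hypothetical usage, not run here because it needs access to the cluster:
# results <- dep_hive_query(double_escaped)
```

Printing a query with cat() before running it is a quick way to check what will actually be sent, since print() shows R's escaped representation rather than the raw characters.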

handling our hadoop/hive setup

The webrequests table is documented on Wikitech, which also provides a set of example queries.

If you want to manipulate the rows with Java before they get to you, Nuria has written a brief tutorial on loading UDFs which should help. The example provided is a user agent parser, which lets you get the equivalent of ua_parse's output further upstream.

See Also

log_strptime for converting the "dt" column in the webrequests table to POSIXlt, and mysql_query and global_query for querying our MySQL databases.


wikimedia-research/WMUtils documentation built on May 4, 2019, 5:23 a.m.