log_sieve: log_sieve

Description

Prototype pageviews filter for the Wikimedia request logs

Usage

log_sieve(log_data)

Arguments

log_data

an input data.frame or data.table. Ideally this is the output of sampled_logs or hive_query, since log_sieve refers to some columns by name rather than by index.

Details

log_sieve contains the prototype filter for "pageviews", as applicable to the Wikimedia request logs. It consumes logs, tags the "actual" pageviews, and returns them. Along the way, X-Forwarded-For (XFF) values are also passed through to the ip_address field, replacing the IPs originally recorded there. It is implemented in R, so the full definition can be seen by printing log_sieve.
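The general shape of such a filter can be sketched in base R. This is only an illustration under stated assumptions, not the definition of log_sieve: the function name toy_sieve, the column names x_forwarded_for, mime_type and http_status, and the pageview rule itself are all hypothetical. Only ip_address is named in this page.

```r
# Hypothetical toy log. Every column name except ip_address is an assumption.
toy_logs <- data.frame(
  ip_address      = c("10.0.0.1", "10.0.0.2", "10.0.0.3"),
  x_forwarded_for = c("-", "203.0.113.7", "-"),
  mime_type       = c("text/html", "text/html", "image/png"),
  http_status     = c("200", "200", "200"),
  stringsAsFactors = FALSE
)

# Illustrative stand-in for a pageview filter (NOT the real log_sieve).
toy_sieve <- function(log_data) {
  # Where a usable XFF value exists, copy it over the recorded IP.
  has_xff <- !is.na(log_data$x_forwarded_for) & log_data$x_forwarded_for != "-"
  log_data$ip_address[has_xff] <- log_data$x_forwarded_for[has_xff]

  # Placeholder rule: treat successful HTML responses as pageviews.
  is_pageview <- log_data$mime_type == "text/html" &
    log_data$http_status == "200"

  # Return only the rows tagged as pageviews.
  log_data[is_pageview, , drop = FALSE]
}

toy_views <- toy_sieve(toy_logs)
```

Because columns are looked up by name here, as in log_sieve, an input whose columns are named differently would need renaming first. The actual rules can be read by printing log_sieve itself.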

log_data, the first argument

Value

a data.table containing those rows of log_data that are pageviews.

See Also

sampled_logs, to read from the sampled logs, or hive_query, to read from the HDFS-based logs.


wikimedia-research/WMUtils documentation built on May 4, 2019, 5:23 a.m.