sampled_logs: Retrieve data from the sampled logs

Description

sampled_logs reads in and parses data from the 1:1000 sampled RequestLogs on stat1002.

Usage

sampled_logs(file)

Arguments

file

either the full name of a sampled log file, or the year/month/day of the log file you want, provided as YYYYMMDD
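Since file accepts two different forms, it can help to check a date string before passing it in. The following is a minimal sketch in base R, not WMUtils code: is_yyyymmdd is a hypothetical helper name, and how sampled_logs itself tells a date from a file name is not documented here.

```r
# Hypothetical check (not part of WMUtils): TRUE when x is a single 8-digit
# string that parses as a real calendar date, FALSE otherwise.
is_yyyymmdd <- function(x) {
  grepl("^[0-9]{8}$", x) && !is.na(as.Date(x, format = "%Y%m%d"))
}

is_yyyymmdd("20140601")    # TRUE
is_yyyymmdd("20141301")    # FALSE: there is no month 13
is_yyyymmdd("2014-06-01")  # FALSE: not in YYYYMMDD form
```

The helper expects a single string; anything that fails the check would have to be a full file name for sampled_logs to accept it.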

Details

It does what it says on the tin: pass in a date (formatted as '20140601' or equivalent) and it will retrieve the sampled RequestLogs for that day. One caveat worth noting is that the daily dumps are not cut off at precisely the stroke of midnight. In the example above, you can expect the file to contain some logs from 20140602 and to be missing some from the 1st, which will be in the 20140531 file. To get all the traffic for a given day, you may need to read the files for the adjacent days as well.
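One way to allow for that fuzziness is to build the list of dates to request with a day of padding on either side. This is a rough sketch in base R under stated assumptions: padded_dates and its pad argument are hypothetical names, not part of WMUtils, and one day of padding is a guess rather than a documented bound on how far the dumps drift.

```r
# Hypothetical helper (not part of WMUtils): return the YYYYMMDD strings for
# the target day plus `pad` days on either side.
padded_dates <- function(day, pad = 1) {
  target <- as.Date(day, format = "%Y%m%d")
  if (is.na(target)) stop("day must be formatted as YYYYMMDD")
  format(seq(target - pad, target + pad, by = "day"), "%Y%m%d")
}

dates <- padded_dates("20140601")
# dates is now c("20140531", "20140601", "20140602")

# On a machine with access to the sampled logs, each date could then be read
# and the results combined, for example (assumes WMUtils and data.table are
# loaded; not run here):
# logs <- data.table::rbindlist(lapply(dates, sampled_logs))
```

The combined table would still need to be filtered on its timestamp column to keep only the day of interest.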

It does not return all the fields from the log file, merely the most useful ones - namely timestamp, ip_address, status_code, url, mime_type, referer, x_forwarded, user_agent, lang and x_analytics.

Value

a data.table containing the useful columns from the sampled logs of the day you asked for.

Author(s)

Oliver Keyes <okeyes@wikimedia.org>

See Also

log_strptime for handling the log timestamp format, parse_uuids for parsing out app UUIDs from URLs, log_sieve for filtering the sampled logs to "pageviews", and hive_query for querying the unsampled RequestLogs.


wikimedia-research/WMUtils documentation built on May 4, 2019, 5:23 a.m.