is_automata: identify non-webcrawler automated traffic

View source: R/filter.R

Description

Not all automated traffic is from a webcrawler - much is from people running HTTP libraries in a particularly stupid, selfish and lazy fashion (if you're reading this and you've ever had a service making requests with the user agent "Twisted PageGetter": this means you). is_automata identifies this class of traffic.

Usage

is_automata(user_agents)

Arguments

user_agents

a vector of user agents, which can be retrieved with read_sampled_log.

Value

a logical vector, the same length as the input, in which each element indicates whether the user agent at the corresponding index matched that of an automated service.
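
The input and output shapes can be illustrated with a small self-contained sketch. Note that is_automata_sketch is a hypothetical stand-in written for this example, not the package's implementation: only "Twisted PageGetter" comes from the description above, and the remaining patterns are assumptions about what HTTP-library user agents tend to look like.

```r
# Hypothetical stand-in for is_automata. The pattern list is an assumption
# made for illustration; it is not the package's actual rule set.
is_automata_sketch <- function(user_agents) {
  # "Twisted PageGetter" is named in the description; the other entries
  # are assumed examples of default HTTP-library user agents.
  patterns <- c("Twisted PageGetter", "^python-requests", "^Java/", "^curl/")
  # grepl returns one TRUE/FALSE per input element, so the result lines up
  # index-for-index with the input vector.
  grepl(paste(patterns, collapse = "|"), user_agents, ignore.case = TRUE)
}

# Sample user agents (made up for this example, not read from a log).
user_agents <- c(
  "Twisted PageGetter",
  "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
  "python-requests/2.21.0"
)

flags <- is_automata_sketch(user_agents)

# Keep only the user agents that were not flagged as automated.
human_like <- user_agents[!flags]
```

With the real function, the same subsetting idiom would presumably apply: pass the user agent vector to is_automata and use the returned logical vector to filter the rows it came from.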

See Also

read_sampled_log for retrieving user agents.


wikimedia-research/pageviews documentation built on May 4, 2019, 5:24 a.m.