c_project_extractor: project_extractor


Description

Extracts the language and project from Wikimedia URLs.

Usage

project_extractor(urls)

Arguments

urls

a vector of URLs

Details

project_extractor takes Wikimedia URLs and extracts the language and project from each one (for example, turning "https://en.wikipedia.org" into "en.wikipedia"). It handles both current and historic intermediary subdomains (zero, mobile, wap) and strips them consistently, so they never appear in the output.
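The extraction described above can be sketched in base R. This is a hypothetical `project_extractor_sketch`, not the WMUtils source. It assumes that the mobile intermediary appears as `m` or `mobile` alongside `zero` and `wap`, that the language and project are everything in the hostname except those intermediaries and the top-level domain, and that a hostname with fewer than three remaining labels counts as unparseable.

```r
# Hypothetical sketch, NOT the WMUtils implementation. The intermediary
# labels and the "fewer than three labels = Unknown" rule are assumptions.
project_extractor_sketch <- function(urls) {
  intermediaries <- c("m", "mobile", "zero", "wap")
  vapply(urls, function(url) {
    if (is.na(url)) return("Unknown")
    # Strip the scheme (e.g. "https://"), then anything after the hostname
    host <- sub("^[A-Za-z][A-Za-z0-9+.-]*://", "", url)
    host <- tolower(sub("[/:?#].*$", "", host))
    labels <- strsplit(host, ".", fixed = TRUE)[[1]]
    # Drop intermediary subdomains wherever they appear in the hostname
    labels <- labels[!labels %in% intermediaries]
    # Expect at least language + project + TLD, as in en.wikipedia.org
    if (length(labels) < 3) return("Unknown")
    # Drop the top-level domain and rejoin: "en.wikipedia"
    paste(labels[-length(labels)], collapse = ".")
  }, character(1), USE.NAMES = FALSE)
}

project_extractor_sketch(c("https://en.wikipedia.org",
                           "https://en.m.wikipedia.org/wiki/Main_Page",
                           "not a url"))
# returns c("en.wikipedia", "en.wikipedia", "Unknown")
```

Removing intermediaries by label, rather than matching one fixed hostname pattern, keeps the mobile and desktop forms of the same page mapped to the same value. Like the function it illustrates, the sketch does not verify that a host belongs to Wikimedia, so "https://www.example.com" yields "www.example".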

Value

A vector of language and project names, or "Unknown" for any URL that cannot be parsed. The function does not check that a URL belongs to Wikimedia: a parseable non-Wikimedia URL produces a meaningless result rather than "Unknown".

See Also

host_handler for extracting hostnames generically.


wikimedia-research/WMUtils documentation built on May 4, 2019, 5:23 a.m.