get_data: Get the data you need from qianxi.baidu.

Description

This function invokes query calls based on the parameters you provide.

Usage

get_data(
  admin_codes,
  category = c("inflow", "outflow", "internal_flow"),
  url = "https://huiyan.baidu.com/migration",
  nap_control = function() runif(1L, 0.5, 1),
  verbose = FALSE,
  storage = tibble()
)

Arguments

admin_codes

a vector of administrative codes, one for each city you want to query. See baidu_names for a full list of administrative codes.

category

one or more of the following values: "inflow", "outflow", and "internal_flow". Each of these values corresponds to one type of data on Baidu's website.

url

the master URL of the website to be scraped. Defaults to Baidu Huiyan.

nap_control

a function that controls the time interval between two consecutive query calls. Use it to avoid triggering anti-bot mechanisms.
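
For instance, a slower, more cautious pace can be requested with a custom function. The sketch below assumes (as the default suggests) that the value returned is the number of seconds to wait between calls:

slow_nap <- function() runif(1L, 2, 3)  # wait 2 to 3 seconds between calls
df_slow <- get_data("110000", "inflow", nap_control = slow_nap)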

verbose

whether or not to show additional information for each query call. Defaults to FALSE.

storage

an existing data frame to which the received data are appended. If you provide this argument, it also becomes the fallback value that the function returns, so even if a query call fails your existing data will not be overwritten by a NULL value.

Value

A nested tibble with two columns, data and meta. Each row stores the data received from one query call.
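
To work with the individual records, one option is to expand the nested column with tidyr. This is only a sketch; it assumes the data column is a list-column of per-query tibbles, as the nested structure implies:

library(tidyr)
df <- get_data("110000", "inflow")
flat <- unnest(df, cols = data)  # one row per record returned by the query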

Examples

# Query the inflow and outflow data for both Beijing ("110000") and Tianjin ("120000").
# The returned tibble has a total of four rows.
df1 <- get_data(c("110000", "120000"), c("inflow", "outflow"))

## Not run: 
# Use an existing object/tibble to store your returned data.
storage_space <- df1
storage_space <- get_data("110000", "internal_flow", storage = storage_space)

## End(Not run)
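
## Not run: 
# A sketch of incremental collection (the admin codes here are illustrative):
# reuse `storage` so that a failed query call does not discard the rows that
# have already been collected.
acc <- tibble::tibble()
for (code in c("110000", "120000", "310000")) {
  acc <- get_data(code, c("inflow", "outflow"), storage = acc, verbose = TRUE)
}

## End(Not run)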
