Pulling x1000 pages

Onepetro starts the counting of records at 0 not 1.

my_url <- make_search_url(query = "shale oil", 
                          how = "all", 
                          dc_type = "conference-paper",
                          rows = 1000)

get_papers_count(my_url)
# 2578, dc_type = NULL
# 2109, dc_type = "conference-paper"
#  381, dc_type = "journal-paper"
onepetro_page_to_dataframe(my_url)

We get 1000 rows of papers indicating only the number of rows in the query. In this case, paper "Discussion" is record #1000

The next query should start at 1000.

my_url <- make_search_url(query = "shale oil", 
                          how = "all", 
                          dc_type = "conference-paper",
                          start = 1000,
                          rows  = 1000)

get_papers_count(my_url)
# 2578, dc_type = NULL
# 2578, dc_type = "conference-paper"
#  381, dc_type = "journal-paper"
onepetro_page_to_dataframe(my_url)

And the next x1000 page should start at 2000 as well. The size of the maximum page is rows=1000 always. This should give the rest 109 rows.

my_url <- make_search_url(query = "shale oil", 
                          how = "all", 
                          dc_type = "conference-paper",
                          start = 2000,
                          rows  = 1000)

get_papers_count(my_url)
# 2578, dc_type = NULL
# 2578, dc_type = "conference-paper"
#  381, dc_type = "journal-paper"
onepetro_page_to_dataframe(my_url)

document type: journal-paper

my_url <- make_search_url(query = "shale oil", 
                          how = "all",
                          dc_type = "journal-paper")

get_papers_count(my_url)
# 2577, dc_type = NULL
# 2109, dc_type = "conference-paper"
#  381, dc_type = "journal-paper"
# 
# total = 2577
my_url <- make_search_url(query = "shale oil", 
                          how = "all",
                          dc_type = "journal-paper",
                          rows  = 1000)

get_papers_count(my_url)
# 381
onepetro_page_to_dataframe(my_url)


f0nzie/petro.One documentation built on May 29, 2019, 12:05 a.m.