Thousands of Papers to Dataframe"
In petro.One: Statistics and Text Mining for Oil and Gas Papers from OnePetro Metadata

The maximum number of rows that a OnePetro query can return is 1000. It means that the user could set up the query to return up to amximum of 1000 papers. Abover that number, the query to OnePetro will return error.

OnePetro has options to define the number of rows to display at 10, 50 and 100 rows. Additionally, through scripts like these, that number could be raised up to 1,000.

This article describes the process of reading multiple pages with thousand of papers to a unique dataframe.

Retrieve the most numerous paper by type

library(petro.One)

Perform a initial search

my_url <- make_search_url(query = "pressure transient analysis", 
                          how = "all")

get_papers_count(my_url)

What type of paper do we have?

papers_by_type(my_url)

For the tyme being we will retrieve only conference papers.

Collect first 1000 rows

# we use "conference-paper" only because other document types have
# different dataframe structure

my_url_1 <- make_search_url(query = "pressure transient analysis", 
                          how = "all", 
                          dc_type = "conference-paper",
                          start = 0,
                          rows  = 1000)

get_papers_count(my_url_1)

page_1 <- read_onepetro(my_url_1)
htm_1 <- "pta-01-conference.html"
xml2::write_html(page_1, file = htm_1)
onepetro_page_to_dataframe(htm_1)

Collect second set of 1000 rows

my_url_2 <- make_search_url(query = "pressure transient analysis", 
                          how = "all", 
                          dc_type = "conference-paper",
                          start = 1000,
                          rows  = 1000)

page_2 <- read_onepetro(my_url_2)
htm_2 <- "pta-02-conference.html"
xml2::write_html(page_2, file = htm_2)
onepetro_page_to_dataframe(htm_2)

Collect next set of 1000 rows

my_url_3 <- make_search_url(query = "pressure transient analysis", 
                          how = "all", 
                          dc_type = "conference-paper",
                          start = 2000,
                          rows  = 1000)

page_3 <- read_onepetro(my_url_3)
htm_3 <- "pta-03-conference.html"
xml2::write_html(page_3, file = htm_3)
onepetro_page_to_dataframe(htm_3)

Collect remaining set

my_url_4 <- make_search_url(query = "pressure transient analysis", 
                          how = "all", 
                          dc_type = "conference-paper",
                          start = 3000,
                          rows  = 100)

page_4 <- read_onepetro(my_url_4)
htm_4 <- "pta-04-conference.html"
xml2::write_html(page_4, file = htm_4)
onepetro_page_to_dataframe(htm_4)

Binding tables in one dataframe

p1 <- onepetro_page_to_dataframe(htm_1)
p2 <- onepetro_page_to_dataframe(htm_2)
p3 <- onepetro_page_to_dataframe(htm_3)
p4 <- onepetro_page_to_dataframe(htm_4)

papers <- rbind(p1, p2, p3, p4)
papers

Find which papers have the search word in the title

pattern <- "pressure transient analysis"
rows <- grep(pattern = pattern, papers$title_data, ignore.case = TRUE)
papers[rows, ]

# remove files that were created
files <- c(htm_1, htm_2, htm_3, htm_4)
file.remove(files)

Any scripts or data that you put into this service are public.

petro.One documentation built on May 2, 2019, 3:10 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

petro.One
Statistics and Text Mining for Oil and Gas Papers from OnePetro Metadata

Thousands of Papers to Dataframe"
In petro.One: Statistics and Text Mining for Oil and Gas Papers from OnePetro Metadata

Perform a initial search

What type of paper do we have?

Collect first 1000 rows

Collect second set of 1000 rows

Collect next set of 1000 rows

Collect remaining set

Binding tables in one dataframe

Find which papers have the search word in the title

Try the petro.One package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

petro.One Statistics and Text Mining for Oil and Gas Papers from OnePetro Metadata

Thousands of Papers to Dataframe" In petro.One: Statistics and Text Mining for Oil and Gas Papers from OnePetro Metadata

Perform a initial search

What type of paper do we have?

Collect first 1000 rows

Collect second set of 1000 rows

Collect next set of 1000 rows

Collect remaining set

Binding tables in one dataframe

Find which papers have the search word in the title

Try the petro.One package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

petro.One
Statistics and Text Mining for Oil and Gas Papers from OnePetro Metadata

Thousands of Papers to Dataframe"
In petro.One: Statistics and Text Mining for Oil and Gas Papers from OnePetro Metadata