scrapeR_in_batches: Batch Web Page Content Scraper

View source: R/scrapeR.R


Batch Web Page Content Scraper

Description

The scrapeR_in_batches function processes a dataframe in batches, scraping web content from URLs in a specified column and writing the scraped content to an output file.

Usage

scrapeR_in_batches(df, url_column, output_file)

Arguments

df

A dataframe containing the URLs to be scraped.

url_column

The name of the column in df that contains the URLs.

output_file

The path to the output file where the scraped content will be saved.

Details

This function divides the input dataframe into batches of a fixed size of 100 rows. For each batch, it extracts the combined text content of the web pages at the URLs in the specified column and appends the results to the output file. A built-in throttling pause between batches reduces the load on the servers being scraped.
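
As a rough illustration, the batch loop behaves like the sketch below. This is a minimal outline, not the package's actual source: the 100-row batch size follows the paragraph above, but the CSS selector, the two-column output layout, and the 2-second pause are assumptions made for the example.

  library(httr)
  library(rvest)

  batch_scrape_sketch <- function(df, url_column, output_file, batch_size = 100) {
    n <- nrow(df)
    for (start in seq(1, n, by = batch_size)) {
      end  <- min(start + batch_size - 1, n)
      urls <- df[[url_column]][start:end]

      ## Fetch each page and collapse its paragraph text
      ## (the "p" selector is an assumption for this sketch)
      content <- vapply(urls, function(u) {
        page <- read_html(u)
        paste(html_text(html_nodes(page, "p")), collapse = " ")
      }, character(1))

      ## Append the batch to the output file; write headers only on the first batch
      write.table(data.frame(url = urls, content = content),
                  file = output_file, sep = ",", append = start > 1,
                  row.names = FALSE, col.names = start == 1)

      Sys.sleep(2)  ## throttle between batches (pause length is assumed)
    }
  }

Appending one batch at a time keeps only the current batch's content in memory, which is why the function can handle dataframes far larger than a single in-memory result would allow.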

Value

There is no return value; the function's output is written directly to the specified file.

Note

Ensure that the httr and rvest packages are installed and loaded. Also, handle large datasets and output files with care to avoid memory issues.

Author(s)

Mathieu Dubeau, Ph.D.

References

Refer to the rvest and httr package documentation for the underlying web scraping methods.

See Also

GET, read_html, html_nodes, html_text, write.table

Examples


  ## A mock scraper that can stand in for network calls when testing offline
  mock_scrapeR <- function(url) {
    paste("Scraped content from", url)
  }

  ## A small dataframe of URLs to scrape
  df <- data.frame(url = c("http://site1.com", "http://site2.com"),
                   stringsAsFactors = FALSE)

  ## Not run: 
    scrapeR_in_batches(df, url_column = "url", output_file = "mock_output.csv")
  
## End(Not run)
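
After a run completes, the output file can be inspected with base R. This assumes a comma-separated, header-bearing file like the one produced in the sketch under Details; the actual column layout may differ.

  ## Not run: 
    results <- read.csv("mock_output.csv", stringsAsFactors = FALSE)
    head(results)
  
## End(Not run)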
