getTextValuesFromPage: Web Scraper Function


Source: R/search_functions.R

Description

Scrapes the text inside every element matching the XPath expression 'path' on the web page at 'url'. It serves as a helper for scraping sites used in bioinformatics work. As this package develops I will add higher-level functions that scrape commonly used sites, but in the meantime (and likely for a long time to come), this general-purpose helper should be handy. To scrape a single element, see 'getTextValueFromPage'.

Usage

getTextValuesFromPage(url, path)

Arguments

url

The URL of the web page to scrape.

path

An XPath expression selecting the elements whose text should be extracted.
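To make the behaviour described above concrete, here is a minimal sketch of what a function like this might do. It assumes the rvest package; MDScraperTools' own implementation may differ, and `scrape_text_values` is a hypothetical name used only for illustration. It runs offline by passing literal HTML instead of a live URL, which rvest's `read_html()` also accepts.

```r
# A minimal sketch, assuming the rvest package; this is NOT necessarily
# how MDScraperTools implements getTextValuesFromPage.
library(rvest)

# Hypothetical stand-in for getTextValuesFromPage(url, path):
# parse the page, select every node matching the XPath, return their text.
scrape_text_values <- function(url, path) {
  page  <- read_html(url)                     # accepts a URL or literal HTML
  nodes <- html_elements(page, xpath = path)  # all elements matching the XPath
  html_text(nodes, trim = TRUE)               # one character string per element
}

# Offline demo: literal HTML stands in for a live web page
html <- "<ul><li class='author'> Smith, J. </li><li class='author'>Lee, K.</li></ul>"
authors <- scrape_text_values(html, "//li[@class='author']")
print(authors)
```

Because the XPath can match several nodes, the result is a character vector with one entry per matching element, which is what distinguishes this helper from the single-element 'getTextValueFromPage'.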

Examples

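# This gets the paper authors associated with a structure on RCSB
library(MDScraperTools)
library(purrr)  # assumed source of partial(); pryr also provides one
url <- "http://www.rcsb.org/structure/6B4V"
author_path <- '//*[@id="header_deposition-authors"]'
# Only one element is wanted here, so the singular getTextValueFromPage is used
getAuthorNames <- partial(getTextValueFromPage, path = author_path)
# cand_structs is an existing data frame; substr() keeps characters 27-200,
# presumably skipping a fixed-length label before the author names
cand_structs$Authors <- substr(getAuthorNames(url), 27, 200)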

lacoperon/MDScraperTools documentation built on May 28, 2019, 12:59 p.m.