normalize_url: Create a data frame that contains information about p nodes...

View source: R/normalize_url.R

normalize_url    R Documentation

Create a data frame that contains information about the p nodes in every internal link of a given URL

Description

This function first resolves the given URL to the URL it redirects to, and uses that redirected URL for all subsequent steps. It then builds a list of internal links using the LinkExtractor function from the Rcrawler package. Finally, it scrapes information about every p node in each internal link and combines the results in a data frame. This information includes xpath, text, url, domain, and tag.
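The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation. The helper names `p_nodes_from_html` and `sketch_p_node_data` are hypothetical, the column names are taken from the description, and the use of xml2 and httr for parsing and fetching is an assumption. Only the offline helper is run here; the networked driver is defined but not called.

```r
library(xml2)

# Hypothetical helper: build one row per <p> node found in an HTML string.
p_nodes_from_html <- function(html, url) {
  doc   <- read_html(html)
  nodes <- xml_find_all(doc, "//p")
  n     <- length(nodes)
  # Assumption: "domain" means the host part of the page URL.
  host  <- sub("^[A-Za-z][A-Za-z0-9+.-]*://([^/:?#]+).*$", "\\1", url)
  data.frame(
    xpath  = xml_path(nodes),               # location of each node in the document
    text   = xml_text(nodes, trim = TRUE),  # node text with surrounding whitespace removed
    url    = rep(url, n),
    domain = rep(host, n),
    tag    = xml_name(nodes),               # always "p" for this XPath query
    stringsAsFactors = FALSE
  )
}

# Hypothetical driver for the networked steps (requires httr and Rcrawler).
sketch_p_node_data <- function(input.URL) {
  final.url <- httr::GET(input.URL)$url     # URL after following redirects
  links     <- Rcrawler::LinkExtractor(final.url)$InternalLinks
  rows <- lapply(links, function(link) {
    html <- httr::content(httr::GET(link), as = "text", encoding = "UTF-8")
    p_nodes_from_html(html, link)
  })
  do.call(rbind, rows)                      # combine per-page results
}

# Offline demonstration on a small HTML string.
demo <- p_nodes_from_html(
  "<html><body><p>First</p><div><p> Second </p></div></body></html>",
  "http://example.org/page.htm"
)
print(demo)
```

Separating the parsing helper from the fetching code keeps the p-node extraction testable without network access. A real crawl would also need error handling for pages that fail to load.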

Usage

normalize_url(input.URL)

Arguments

input.URL

A URL whose internal links are searched for p nodes

Value

A data frame in which each row describes one p node, including its xpath, text, url, domain, and tag

Examples

input.URL <- "HTTP://GMFD.ORG/GMFRA/GMFRAINDEX.HTM"
normalize_url(input.URL)

Nonprofit-Open-Data-Collective/webscraper documentation built on July 19, 2023, 6:09 p.m.