extract_urls_from_webpage: Extracts the URLs from a webpage.


Description

The function works by extracting the 'href' attribute from all 'a' nodes of the page. It is called internally by 'archiv.fromUrl', but it can also be useful on its own if you want to filter which links you archive.
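The extraction step described above can be sketched in a few lines of R. This is only an illustration, not the package's actual implementation: it assumes the xml2 package for parsing, and the helper name sketch_extract_urls and the inline HTML snippet are hypothetical.

```r
library(xml2)

# Hypothetical helper (not part of archivr): collect the href attribute
# of every <a> node, optionally dropping URLs that match `except`.
sketch_extract_urls <- function(page, except = NULL) {
  doc <- read_html(page)                      # accepts a URL, a path, or literal HTML
  hrefs <- xml_attr(xml_find_all(doc, "//a"), "href")
  hrefs <- hrefs[!is.na(hrefs)]               # <a> nodes without an href yield NA
  if (!is.null(except)) {
    hrefs <- hrefs[!grepl(except, hrefs)]     # exclude URLs matching the regex
  }
  hrefs
}

# Made-up page so the sketch runs without network access.
html <- '<html><body>
  <a href="https://example.org/a">A</a>
  <a href="https://validator.w3.org/check">Validator</a>
  <a name="anchor">no href here</a>
</body></html>'

result <- sketch_extract_urls(html, except = "validator\\.w3\\.org")
print(result)
```

Note that the dots in the 'except' pattern are escaped, since an unescaped '.' in a regular expression matches any character.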

Usage


Arguments

url

The URL of the webpage from which to extract URLs.

except

A regular expression matching URLs to exclude from the extraction.

Value

A vector of URLs.

Examples

urlList <- extract_urls_from_webpage(
  "https://www-cs-faculty.stanford.edu/~knuth/retd.html",
  except = "validator\\.w3\\.org"
)
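
Because the return value is a plain vector, it can be filtered further with base R before anything is archived. The following is a minimal sketch of that idea; the vector below is a made-up stand-in for a real result, and the name toArchive is hypothetical.

```r
# Stand-in for a result of extract_urls_from_webpage(); these URLs are made up.
urlList <- c(
  "https://example.org/paper.pdf",
  "mailto:someone@example.org",
  "#top",
  "https://example.org/paper.pdf",
  "http://example.com/data"
)

# Keep only absolute http(s) links and drop repeated entries.
toArchive <- unique(urlList[grepl("^https?://", urlList)])
print(toArchive)
```

The filtered vector can then be passed on to whichever archiving step you use.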

QualitativeDataRepository/archivr documentation built on Feb. 9, 2022, 8:32 p.m.