extract_urls_from_text: Get the urls from a text file, .pdf file, .docx file, or...

Description

'extract_urls_from_text' is called internally by 'archiv.fromText', but it is also useful on its own when you want to filter which links get archived. Text extraction relies on readtext::readtext() from the readtext package, so every file format that 'readtext' supports is supported here.
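The extract-then-exclude idea can be illustrated with a minimal base R sketch. It is not the archivr implementation: the helper name sketch_extract_urls, its URL pattern, and the sample text are all assumptions made for illustration, and it works on a character string rather than on a file read through readtext.

```r
# Hypothetical helper, not part of archivr: pull URLs out of a character
# string and drop those that match an exclusion pattern.
sketch_extract_urls <- function(text, except = NULL) {
  # Simplified pattern (an assumption): http or https scheme followed by
  # characters that are not whitespace, angle brackets, or double quotes.
  url_pattern <- "https?://[^[:space:]<>\"]+"
  urls <- unique(unlist(regmatches(text, gregexpr(url_pattern, text))))
  # Apply the exclusion regex only when one is supplied.
  if (!is.null(except)) {
    urls <- urls[!grepl(except, urls)]
  }
  urls
}

txt <- "See https://example.org/data and https://doi.org/10.1000/xyz for details"
sketch_extract_urls(txt, except = "doi\\.org/")
```

The exclusion pattern is applied to each extracted URL as a whole, so a pattern such as "doi\\.org/" removes any URL containing that fragment.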

Usage


Arguments

fp

A filepath or string.

except

A regular expression matching URLs to exclude from extraction.

Value

A list of URLs.

Examples

## Not run: 
urlList <- extract_urls_from_text("textdoc.docx", except="doi\\.org\\/")

## End(Not run)
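Because the function returns the URLs rather than archiving them, the result can be filtered further before anything is archived. The sketch below uses a hand-written stand-in for the returned value, so it runs without archivr or an input file. The sample URLs, the variable to_archive, and the assumption that the result flattens cleanly with unlist() are illustrative only.

```r
# Stand-in for the value returned by extract_urls_from_text(); the real call
# needs the archivr package and an input file.
urlList <- list(
  "https://example.org/report",
  "http://example.com/page",
  "https://doi.org/10.1000/xyz"
)

# Flatten to a character vector (assumes the result is a list of strings).
urls <- unlist(urlList)

# Keep only HTTPS links that do not point at doi.org.
to_archive <- urls[grepl("^https://", urls) & !grepl("doi\\.org/", urls)]
to_archive
```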

QualitativeDataRepository/archivr documentation built on Feb. 9, 2022, 8:32 p.m.