Given a url, or html session, find the absolute urls of relative and external links posted on the web page.
x | Either a character string giving the url of the website of interest, or a session as defined by rvest.
keep_regex | a regular expression; only found hrefs that match it are kept, see details.
omit_regex | a regular expression; found hrefs that match it are dropped, see details.
omit_bookmarks | Logical. Controls whether urls containing the "#" symbol are omitted from the returned urls; they are omitted by default, see details.
... | not currently used
There are a few options for filtering the set of returned links: keep_regex, omit_regex, and omit_bookmarks. The first two are regular expressions and are applied to the set of links in the order keep, then omit. That is, given a character vector links, the use of keep_regex and omit_regex is equivalent to the following two lines of code:
> links <- links[grepl(keep_regex, links)]
> links <- links[!grepl(omit_regex, links)]
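As a small illustration of the keep-then-omit order, the sketch below applies the same two steps to a made-up vector of links using base R only. The urls, the two patterns, and the name filtered are assumptions chosen for the example; nothing here calls get_hrefs itself.

```r
# Hypothetical links, invented for illustration only.
links <- c("https://example.com/about",
           "https://example.com/blog/post-1",
           "https://example.com/blog/post-2.pdf",
           "https://other.org/blog/")

# Hypothetical filters: keep blog pages, then drop pdf files.
keep_regex <- "/blog/"
omit_regex <- "\\.pdf$"

# Step 1: keep only the links that match keep_regex.
filtered <- links[grepl(keep_regex, links)]
# Step 2: from what remains, drop the links that match omit_regex.
filtered <- filtered[!grepl(omit_regex, filtered)]

print(filtered)
```

Because the keep step runs first, a link that matches both patterns is dropped by the omit step.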
Both keep_regex and omit_regex are optional. Consider running get_hrefs without filtering the results and inspecting the returned urls first. You can then either filter the returned urls post hoc or re-evaluate the get_hrefs call with the wanted filters.
By default, urls containing the '#' symbol are omitted. Set omit_bookmarks = FALSE to include urls with bookmarks in the returned object.
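The effect of omit_bookmarks on a vector of urls can be approximated in base R as below. This is only a sketch of the documented behaviour, not the package's implementation; the helper drop_bookmarks and the example urls are hypothetical.

```r
# Hypothetical helper mimicking the documented effect of omit_bookmarks.
drop_bookmarks <- function(links, omit_bookmarks = TRUE) {
  if (!omit_bookmarks) {
    return(links)
  }
  # fixed = TRUE treats "#" as a literal character, not a regular expression.
  links[!grepl("#", links, fixed = TRUE)]
}

# Made-up urls for illustration.
links <- c("https://example.com/docs",
           "https://example.com/docs#install",
           "https://example.com/#top")

print(drop_bookmarks(links))
print(drop_bookmarks(links, omit_bookmarks = FALSE))
```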
A sna_hrefs object, which is a data.frame with the following columns:
<chr> the found urls, modified to be absolute urls
<logical> indicates whether or not the url is relative to the domain of x
The returned object has additional attributes:
<session> the html session
If the url or session does not resolve, the returned data.frame will have the aforementioned columns, but will have no rows.
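Since an unresolved url yields a data.frame with no rows rather than an error, calling code may want to check for that case before going further. The sketch below uses a stand-in data.frame; the column names url and relative and the helper has_links are hypothetical, chosen only to mirror the column types listed above.

```r
# Stand-in for the result of an unresolved url: typed columns, zero rows.
# The column names `url` and `relative` are hypothetical.
hrefs <- data.frame(url = character(0),
                    relative = logical(0),
                    stringsAsFactors = FALSE)

# Hypothetical helper: TRUE only when at least one link was returned.
has_links <- function(x) {
  is.data.frame(x) && nrow(x) > 0L
}

if (!has_links(hrefs)) {
  message("No links returned; the url or session may not have resolved.")
}
```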
vignette(topic = "snaWeb", package = "snaWeb")