LinkNormalization: Link Normalization

Description

Normalizes one or more URLs and transforms them into a canonical form, resolving them against the URL of the page on which they were found.

Usage

LinkNormalization(links, current)

Arguments

links

character, one or more URLs to normalize.

current

character, the URL of the current page on which the links are located.

Value

A vector of normalized URLs.

Author(s)

Salim Khalil

Examples

# Load the package that provides LinkNormalization
library(Rcrawler)

# Normalize a set of links found on the page http://glofile.com
links <- c("http://www.twitter.com/share?url=http://glofile.com/page.html",
           "/finance/banks/page-2017.html",
           "./section/subscription.php",
           "//section/",
           "www.glofile.com/home/",
           "IndexEn.aspx",
           "glofile.com/sport/foot/page.html",
           "sub.glofile.com/index.php",
           "http://glofile.com/page.html#1",
           "?tags%5B%5D=votingrights&amp;sort=popular")

links <- LinkNormalization(links, "http://glofile.com")

# Print the normalized URLs
links
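
To illustrate what resolving a link against the current page URL can mean, here is a minimal base R sketch. It is not the Rcrawler implementation: resolve_link is a hypothetical helper written for this illustration, it covers only a few simple cases (absolute URLs, root-relative paths, plain relative paths, fragment removal), and the output of LinkNormalization may differ.

```r
# Hypothetical helper, NOT part of Rcrawler: resolve one link against a base URL.
resolve_link <- function(link, base) {
  # Drop any fragment identifier such as "#1"
  link <- sub("#.*$", "", link)

  # Absolute http(s) URLs need no resolution
  if (grepl("^https?://", link)) {
    return(link)
  }

  # Scheme and host of the base URL, e.g. "http://glofile.com"
  root <- sub("^(https?://[^/]+).*$", "\\1", base)

  # Root-relative paths attach directly to the host
  if (startsWith(link, "/")) {
    return(paste0(root, link))
  }

  # Other relative paths are appended to the directory of the base URL
  path <- sub("^https?://[^/]+", "", base)  # "" when the base has no path
  dir <- sub("[^/]*$", "", path)            # keep everything up to the last "/"
  if (dir == "") {
    dir <- "/"
  }
  link <- sub("^\\./", "", link)            # strip a leading "./"
  paste0(root, dir, link)
}

# Apply the helper to several links at once
examples <- c("/finance/banks/page-2017.html",
              "./section/subscription.php",
              "IndexEn.aspx",
              "http://glofile.com/page.html#1")
resolved <- vapply(examples, resolve_link, character(1),
                   base = "http://glofile.com", USE.NAMES = FALSE)
resolved
```

The sketch deliberately leaves out the harder inputs from the example above, such as scheme-less hosts ("www.glofile.com/home/") and query-only links, which need additional rules.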

Rcrawler documentation built on May 2, 2019, 3:42 a.m.