An R package to help detect linkrot, which is when links on a web page break because the pages they point to have been taken down or moved.
Very much a proof of concept. I wrote it to detect linkrot on my personal blog and it works for my needs. Feel free to contribute.
This package is only available on GitHub. Install from an R session with:
install.packages("remotes")
remotes::install_github("matt-dray/linkrot")
Pass a webpage URL to detect_rot() and get a tibble with each link on that page and what its response status code is (ideally we want 200).
Here’s a check on one of my older blog posts. The printout tells you the URL being checked, with one period printed for each link checked on that page.
library(linkrot)
page <- "https://www.rostrum.blog/2018/04/14/r-trek-exploring-stardates/"
rot_page <- detect_rot(page)
#> Checking <https://www.rostrum.blog/2018/04/14/r-trek-exploring-stardates/> ..............................
rot_page
#> # A tibble: 30 x 6
#> page link_url link_text response_code response_catego… response_success
#> <chr> <chr> <chr> <dbl> <chr> <lgl>
#> 1 https:… https://ww… R statis… 200 Success TRUE
#> 2 https:… https://en… Star Tre… 200 Success TRUE
#> 3 https:… http://www… Star Tre… 200 Success TRUE
#> 4 https:… https://gi… regex 200 Success TRUE
#> 5 https:… http://vit… tidy 200 Success TRUE
#> 6 https:… https://en… Wikipedia 200 Success TRUE
#> 7 https:… http://sel… Selector… 200 Success TRUE
#> 8 https:… https://cr… how-to v… 404 Client error FALSE
#> 9 https:… https://ww… htmlwidg… 200 Success TRUE
#> 10 https:… https://gi… ggsci 200 Success TRUE
#> # … with 20 more rows
Uh oh, at least one is broken: it has a response_code of 404.
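To pull out only the broken links, you could subset the result on its response_success column. The sketch below is a minimal illustration rather than part of the package: it uses a small made-up data frame as a stand-in for the output of detect_rot() so that it runs without a network connection, and it borrows only the column names that are fully visible in the printout above.

```r
# Stand-in for the tibble returned by detect_rot(). These rows are invented
# for illustration; only the column names come from the printout above.
rot_page <- data.frame(
  page = "https://example.com/post/",
  link_url = c(
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c"
  ),
  link_text = c("first", "second", "third"),
  response_code = c(200, 404, 200),
  response_success = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only the rows where the check did not succeed
broken <- rot_page[
  !rot_page$response_success,
  c("link_url", "link_text", "response_code")
]

broken
```

The same subsetting should work on the real tibble, since tibbles support base R row and column indexing.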
You could iterate over multiple pages with {purrr}:
pages <- c(
"https://www.rostrum.blog/2018/04/14/r-trek-exploring-stardates/",
"https://www.rostrum.blog/2018/04/27/two-dogs-in-toilet-elderly-lady-involved/",
"https://www.rostrum.blog/2018/05/19/pokeballs-in-super-smash-bros/"
)
library(purrr)
rot_pages <- set_names(map(pages, detect_rot), basename(pages))
#> Checking <https://www.rostrum.blog/2018/04/14/r-trek-exploring-stardates/> ..............................
#> Checking <https://www.rostrum.blog/2018/04/27/two-dogs-in-toilet-elderly-lady-involved/> ........................................
#> Checking <https://www.rostrum.blog/2018/05/19/pokeballs-in-super-smash-bros/> .....................
rot_pages
#> $`r-trek-exploring-stardates`
#> # A tibble: 30 x 6
#> page link_url link_text response_code response_catego… response_success
#> <chr> <chr> <chr> <dbl> <chr> <lgl>
#> 1 https:… https://ww… R statis… 200 Success TRUE
#> 2 https:… https://en… Star Tre… 200 Success TRUE
#> 3 https:… http://www… Star Tre… 200 Success TRUE
#> 4 https:… https://gi… regex 200 Success TRUE
#> 5 https:… http://vit… tidy 200 Success TRUE
#> 6 https:… https://en… Wikipedia 200 Success TRUE
#> 7 https:… http://sel… Selector… 200 Success TRUE
#> 8 https:… https://cr… how-to v… 404 Client error FALSE
#> 9 https:… https://ww… htmlwidg… 200 Success TRUE
#> 10 https:… https://gi… ggsci 200 Success TRUE
#> # … with 20 more rows
#>
#> $`two-dogs-in-toilet-elderly-lady-involved`
#> # A tibble: 40 x 6
#> page link_url link_text response_code response_catego… response_success
#> <chr> <chr> <chr> <dbl> <chr> <lgl>
#> 1 https:/… https://w… @mattdray 200 Success TRUE
#> 2 https:/… https://d… the Lond… 200 Success TRUE
#> 3 https:/… https://g… the sf p… 200 Success TRUE
#> 4 https:/… https://r… interact… 200 Success TRUE
#> 5 https:/… https://e… eastings… 200 Success TRUE
#> 6 https:/… https://e… latitude 200 Success TRUE
#> 7 https:/… https://e… longitude 200 Success TRUE
#> 8 https:/… https://r… leaflet 200 Success TRUE
#> 9 https:/… https://w… R 200 Success TRUE
#> 10 https:/… https://g… sf (‘sim… 200 Success TRUE
#> # … with 30 more rows
#>
#> $`pokeballs-in-super-smash-bros`
#> # A tibble: 21 x 6
#> page link_url link_text response_code response_catego… response_success
#> <chr> <chr> <chr> <dbl> <chr> <lgl>
#> 1 https:… https://en… Super Sm… 200 Success TRUE
#> 2 https:… https://en… Super Sm… 400 Client error FALSE
#> 3 https:… https://en… SSB Mele… 200 Success TRUE
#> 4 https:… https://en… SSB Braw… 200 Success TRUE
#> 5 https:… https://en… SSB ‘4’,… 200 Success TRUE
#> 6 https:… https://ww… a series… 200 Success TRUE
#> 7 https:… https://en… the Supe… 200 Success TRUE
#> 8 https:… https://en… Zelda 200 Success TRUE
#> 9 https:… https://en… EarthBou… 200 Success TRUE
#> 10 https:… https://en… the Poké… 400 Client error FALSE
#> # … with 11 more rows
Uh-oh, more broken links.
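With a named list of results like rot_pages, a quick per-page summary can be easier to scan than the full printout. Here is a rough sketch that counts broken links per page using base R. The make_result() helper and the example data are hypothetical stand-ins for what detect_rot() returns, and treating only code 200 as a success is a simplification made for this example.

```r
# Hypothetical helper that builds a stand-in for one detect_rot() result.
# Assumption: a link counts as successful only if its code is 200.
make_result <- function(codes) {
  data.frame(
    link_url = paste0("https://example.com/", seq_along(codes)),
    response_code = codes,
    response_success = codes == 200,
    stringsAsFactors = FALSE
  )
}

# Stand-in for the named list produced by the map() and set_names() call above
rot_pages <- list(
  "post-one" = make_result(c(200, 404, 200)),
  "post-two" = make_result(c(200, 200)),
  "post-three" = make_result(c(400, 200, 400))
)

# Count the unsuccessful links in each page's result
n_broken <- vapply(
  rot_pages,
  function(x) sum(!x$response_success),
  integer(1)
)

n_broken
```

Pages with a count above zero are the ones worth opening up and fixing first.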
Please note that the {linkrot} project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.