# govscot

An R package to responsibly scrape gov.scot

## Rationale

I am looking into how the impact of NRS (National Records of Scotland) statistics on Scottish Government policy might be measured. One approach is to search through content on gov.scot for mentions of certain phrases relevant to NRS, to see whether patterns arise over time or between different directorates and topics.

There are a number of ways this could be achieved. Here is a summary of my views on the pros and cons of each.

### Option 1 - Scrape Google search results

**Pros**

Google search results should all contain the strings you're searching for, so there would be less data to analyse.

**Cons**

You're relying on Google search results, which can change as Google's methodology changes. This approach also returns only the pages that mention the phrase (not the ones without mentions), so you don't have a denominator from which to calculate rates.

### Option 2 - Query an API

**Pros**

This would return comprehensive and reasonably structured data, which might avoid some of the messiness of scraping.

**Cons**

As far as I know, there is no API for content on gov.scot.

### Option 3 - Request an export of data from the content management system

**Pros**

This would be a comprehensive dataset to analyse.

**Cons**

This would only include web page text (I presume a zip file of all the supporting documents would be prohibitively large).

### Option 4 - Write a script (e.g. rvest or Scrapy)

**Pros**

This seems to be the only way to search through supporting documents (at least the machine-readable ones).

**Cons**

Time is needed to write the script. A rough sketch of what this could look like is given below.
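
To make this concrete, here is a minimal sketch of such a script in R, in keeping with the aim of scraping responsibly. It assumes the rvest and polite packages are available; the URL, the `p` selector, and the search phrase are illustrative placeholders, not anything this package actually implements.

```r
library(rvest)   # HTML parsing and extraction
library(polite)  # checks robots.txt and enforces a crawl delay

# Introduce ourselves to the site; bow() reads robots.txt and sets a
# default delay between requests. The URL is a placeholder.
session <- bow("https://www.gov.scot/publications/")

# Fetch and parse the page through the polite session
page <- scrape(session)

# Extract paragraph text ("p" is a placeholder selector; the real
# gov.scot markup would need inspecting)
paragraphs <- page %>%
  html_nodes("p") %>%
  html_text(trim = TRUE)

# Count paragraphs mentioning a phrase of interest (example phrase only)
sum(grepl("National Records of Scotland", paragraphs, ignore.case = TRUE))
```

A full crawler would also need to follow links across the site and handle supporting documents (e.g. PDFs), but the same pattern applies. The polite session is used here, rather than calling `read_html()` directly, because it honours robots.txt and rate limits automatically.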

### Option 5 - Use web crawling software (e.g. Screaming Frog)

**Pros**

Quicker to set up than writing a script.

**Cons**

These tools often cost money and may be time-consuming to run on a regular basis.

### Option 6 - Use Google Custom Search

**Pros**

I'm not sure what the pros are.

**Cons**

This could be technically challenging and there may be a cost involved. It might also be overkill. A sketch of what the query would involve is given below.
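
For what it's worth, the query itself would be a single HTTP GET against Google's Custom Search JSON API. Here is a rough sketch using httr and jsonlite; the API key and search engine ID are placeholders that would have to be obtained from Google, and there are usage quotas and potential charges.

```r
library(httr)
library(jsonlite)

# Placeholders: a real API key and custom search engine ID are needed
resp <- GET(
  "https://www.googleapis.com/customsearch/v1",
  query = list(
    key = "YOUR_API_KEY",
    cx  = "YOUR_SEARCH_ENGINE_ID",
    q   = "\"national records of scotland\" site:gov.scot"
  )
)

results <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))

# Estimated total number of matching pages
results$searchInformation$totalResults
```

Note that this shares the main drawback of Option 1: it only returns pages that contain the phrase, so there is still no denominator to calculate rates from.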


