UK REF Impact Case Studies

library(refimpact)

Introduction

This package is an API wrapper around the REF Impact Case Studies database API. Chances are that if you're looking at this package, you already know what this dataset is, and you probably know roughly what you're looking for.

If you have stumbled upon this package however, and you want to know more about the dataset, you can head here to find out more. If you are thinking of using this dataset as a toy dataset for learning, then you might find this dataset useful for text mining, amongst other things.

Core functions

The core function for this package is ref_get(), which takes an API method as the first argument, and some optional arguments depending on the method.

The API methods available are detailed below, but presented here for quick reference:

SearchCaseStudies

This is the core method of the API, and the most important for users of this package. The search method requires a compulsory argument to the ref_get() function: query. This argument takes a list of query parameters, which can be as simple as a single Case Study ID, which returns a single record. A query returning a single record is shown below to demonstrate the syntax and the returned data structure; more complex queries will be shown later in the vignette.

results <- ref_get("SearchCaseStudies", query=list(ID=941))
print(results)

You will note that the function returns a nested tibble - that is a tibble with other data frames inside it. This means that you can interrogate the tibble as per usual:

cat(results[[1, "CaseStudyId"]])
cat(results[[1, "Title"]])
cat(strtrim(results[[1, "ImpactSummary"]], width = 200), "<truncated>")
cat(strtrim(results[[1, "ImpactDetails"]], width = 200), "<truncated>")
cat(results[[1, "Institution"]])

You can also interrogate the nested fields the same way, and even subset them:

print(results[[1, "Country"]])
print(results[[1, "Institutions"]])
print(results[[1, "Institutions"]][,c("UKPRN", "InstitutionName")])

In the opinion of the package author, the nested tibble offers many advantages over other data representations - it is a relatively straight-forward exercise to transform the data into a set of wide or narrow tables if required.

Returning a single case study based on the ID is obviously a niche use-case, so there are some other ways to search the database. But before getting to those, it is worth pointing out that you can select multiple case studies in a single query:

results <- ref_get("SearchCaseStudies", query=list(ID=c(941, 942, 1014)))
print(results)

The ID parameter above is an exclusive parameter - if you provide one or more IDs then the function will print a warning to the console, and remove all parameters except for the IDs. This is based on the API's documented limitations.

The other parameters can all be combined for searching. Those parameters are:

Some examples are shown below.

results <- ref_get("SearchCaseStudies", query=list(UKPRN = 10007777))
dim(results)
results <- ref_get("SearchCaseStudies", query=list(UoA = 5))
dim(results)
results <- ref_get("SearchCaseStudies", query=list(tags = c(11280, 5085)))
dim(results)
results <- ref_get("SearchCaseStudies", query=list(phrase = "hello"))
dim(results)
results <- ref_get("SearchCaseStudies", query=list(UKPRN = 10007146,
                                                   UoA   = 3))
dim(results)

Unfortunately, the API method requires at least one search parameter, which makes it more difficult to download the entire dataset. A short script for this purpose is included at the end of this vignette.

Useful values for the UKPRN, UoA and tags parameters can be found by querying the other 4 API methods - the phrase parameter is the only parameter which can be used in isolation. Each of the 4 other API methods are outlined below.

ListInstitutions

This method lists all of the institutions which are included in the REF Impact Case Studies database, and the UKPRN column in the resuling tibble can be used as a query parameter

institutions <- ref_get("ListInstitutions")
print(institutions)

ListTagTypes and ListTagValues

These methods provide tags which can be used as search parameters in the SearchCaseStudies method. The ListTagTypes method returns the types of tags available:

tag_types <- ref_get("ListTagTypes")
print(tag_types)

These tag types can then be used as an argument to the ListTagValues method, to get all tags for each type:

tag_values_5 <- ref_get("ListTagValues", tag_type = 5)
print(tag_values_5)

This can take some time to iterate through, so the full table is bundled with this package. You can access it via ref_tags:

print(ref_tags)

ListUnitsOfAssessment

This method lists all of the units of assessment which the Impact Case Studies can be assessed against. The tibble also includes an ID column which can be used when querying the SearchCaseStudies method.

UoAs <- ref_get("ListUnitsOfAssessment")
print(UoAs)

Extracting the entire dataset

As alluded to above, the API cannot be searched without parameters, which means that downloading the entire dataset is not a simple task. The code below can be used to extract all records from the database.

uoa_table <- ref_get("ListUnitsOfAssessment")
uoa_list <- uoa_table$ID

ref_corpus <- vector(length = length(uoa_list), mode = "list")

for (i in seq_along(uoa_list)) {
  message("Retrieving data for UoA ", uoa_list[i])
  ref_corpus[[i]] <- ref_get("SearchCaseStudies", query = list(UoA = uoa_list[i]))
}

output <- do.call(rbind, ref_corpus)


Try the refimpact package in your browser

Any scripts or data that you put into this service are public.

refimpact documentation built on May 1, 2019, 8:47 p.m.