knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE )
library(ref2014)
There were 36 Units of Assessment (UoAs) in the REF.
The names of each of these subject areas are stored in the vector UoA
.
data(UoA) cat(paste('1.', UoA), sep = '\n')
This makes it easy to look up the ID of a particular unit of assessment.
which(UoA == 'Mathematical Sciences') grep('Math', UoA)
Or to select several related subject areas.
grep('Engineering', UoA, value = TRUE)
library(dplyr) outputs <- tidy_outputs() maths_results <- subset(results, `Unit of assessment number` == 10) maths_outputs <- subset(outputs, uoa_number == 10) %>% mutate(row_id = row_number())
Key features in the submission data include the titles and DOIs of the research outputs.
In the mathematical sciences data, there r round(100 * mean(!is.na(maths_outputs$doi)), 2)
% of entries contain DOIs and r round(100 * mean(!is.na(maths_outputs$issn)), 2)
have ISSNs.
## If any two entries share an ISSN, DOI or (standardised) volume title, ## then they MUST be collected into the same journal. # maths_long <- maths_outputs %>% # select(row_id, doi, volume_std, issn, isbn) %>% # tidyr::gather('identifier', 'value', -row_id) %>% # filter(!is.na(value)) # Important to avoid grouping by missingness # library(igraph) # connected_components <- maths_long %>% # select(-identifier) %>% # graph_from_data_frame %>% # clusters %>% membership %>% stack %>% # rename(journal_id = values, row_id = ind) # maths_outputs2 <- maths_outputs %>% # merge(connected_components, by = 'row_id', all.x = TRUE) maths_outputs2 <- cluster_outputs_by_journals(maths_outputs) warning('Current implementation is fragile wrt column names')
After grouping submissions into journals, we have reduced the dimensionality of the data from r nrow(maths_outputs)
papers to r n_distinct(maths_outputs2$journal_id)
journals.
There are r sum(is.na(maths_outputs2$journal_id))
submissions (r round(100 * mean(is.na(maths_outputs2$journal_id)), 2)
%) that could not be assigned to a journal as they were missing volume title, DOI, ISSN and ISBN.
Of these unassigned submissions, r with(maths_outputs2, sum(is.na(journal_id) & output_type == 'D'))
were classified as journal articles.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.