Comparing topics over time

knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
# The plotting chunks below need ggplot2 (a suggested package); skip them
# gracefully when it is not installed.
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
library(scopusflow)

A common bibliometric question is not how large a literature is, but how its internal emphasis shifts over time. Within deep-learning research, say, is the share of work that also concerns medical imaging growing faster than the share about computer vision? scopus_compare_topics() answers exactly this, and plot_scopus_comparison() shows the answer. The comparison itself contacts the API, so it is shown but not run; the plotting is reproduced offline from an object of the same shape.

What the comparison measures

For each year and each comparison term, the function counts the records matching the reference topic and that term, and expresses it as a percentage of the records matching the reference alone. A value of 30% for "computer vision" in 2020 means that 30% of the deep-learning records that year also mention computer vision. The reference is the denominator, so it sits at 100% by construction and is not drawn.

cmp <- scopus_compare_topics(
  reference_query  = "deep learning",
  comparison_terms = c("computer vision", "natural language processing",
                       "medical imaging", "drug discovery"),
  years            = 2013:2021,
  field            = "TITLE-ABS-KEY"
)

The shape of the result

The result is a tidy table with one row per topic and year. We build one here with the same columns so the rest of the article runs without a key. The reference set grows over the period, which the uncertainty band will reflect.

years <- 2013:2021
ref_n <- round(seq(400, 1600, length.out = length(years)))
mk <- function(from, to) round(seq(from, to, length.out = length(years)))
counts <- list(
  "computer vision" = mk(140, 720),
  "natural language processing" = mk(90, 540),
  "medical imaging" = mk(15, 260),
  "drug discovery" = mk(8, 170)
)
cmp <- tibble::tibble(
  query = "q",
  query_type = c(rep("reference", length(years)),
                 rep("comparison", length(counts) * length(years))),
  abridged_query = c(rep("deep learning", length(years)),
                     rep(names(counts), each = length(years))),
  year = rep(years, length(counts) + 1),
  n = c(ref_n, unlist(counts, use.names = FALSE)),
  reference_n = rep(ref_n, length(counts) + 1),
  comparison_percentage = 100 * c(ref_n, unlist(counts, use.names = FALSE)) /
    rep(ref_n, length(counts) + 1),
  average_comparison_percentage = c(rep(100, length(years)),
                                    rep(c(40, 33, 15, 9), each = length(years)))
)
class(cmp) <- c("scopus_comparison", class(cmp))
cmp

The comparison_percentage column is the per-year share, and average_comparison_percentage is the same ratio computed over the whole period, which is what orders the topics. A year in which the reference has no records has no defined share and is recorded as NA rather than as a misleading zero.

A first plot

plot_scopus_comparison(cmp)

The chart uses whole-number year breaks, a colour-blind-safe palette and, because there are only a few topics, labels the lines directly so the reader need not match colours to a legend. Each label carries the topic's total record count. The shaded band around each line is a Wilson stability range: it is wide in the early years, when the reference set is small and the share would move easily, and narrows as the literature grows. Because 'Scopus' returns exact counts rather than a sample, the band is illustrative rather than a confidence interval, a point the plot_scopus_comparison() help page sets out.

Drawing the eye to one topic

When one topic is the focus of a figure, highlight draws it in an accent colour and greys the rest, which keeps the context visible without letting it compete.

plot_scopus_comparison(cmp, highlight = "medical imaging")

Adjusting the labels

The count suffix on each label can be turned off, and the uncertainty band can be removed, when a cleaner look is wanted.

plot_scopus_comparison(cmp, pub_count_in_legend = FALSE, interval = FALSE)

The return value is an ordinary ggplot2 object, so any further adjustment, a different theme or a saved file, is one + or one ggplot2::ggsave() away.

Reading the result as a table

Sometimes the numbers matter more than the picture. Because the output is a tibble, the usual tools apply: here are the topics ranked by their average share.

comp <- cmp[cmp$query_type == "comparison", ]
unique(comp[, c("abridged_query", "average_comparison_percentage")])


Try the scopusflow package in your browser

Any scripts or data that you put into this service are public.

scopusflow documentation built on June 20, 2026, 5:06 p.m.