In mcallaghan/scimetrix: Some Scientometric and Topic Modelling tools

Reading data from WoS

library(dplyr)
library(scimetrix)
library(tm)
library(topicmodels)
library(ggplot2)
path <- system.file("results.txt", package = "scimetrix")

Use the readWoS function to read in a text file downloaded from Web of Science, then apply the mergeOECD function to add OECD subject categories.

papers <- readWoS(path) %>%
  mergeOECD() 

head(papers)
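To make the input format concrete, here is a minimal base R sketch of parsing a Web of Science tagged plain-text export, where each field starts with a two-letter tag, continuation lines are indented, and ER closes a record. It is not the implementation of readWoS: parse_wos_sketch is a hypothetical helper, the example lines and UT values are made up, and only four fields are kept.

```r
# Sketch only: parse_wos_sketch is a hypothetical helper, not part of scimetrix.
parse_wos_sketch <- function(lines, fields = c("UT", "PY", "TI", "AB")) {
  records <- list()
  current <- list()
  tag <- NULL
  for (ln in lines) {
    if (!nzchar(trimws(ln))) next                 # skip blank separator lines
    lead <- substr(ln, 1, 2)
    if (lead %in% c("FN", "VR", "EF")) {          # file header / end-of-file markers
      tag <- NULL
    } else if (lead == "ER") {                    # end of the current record
      records[[length(records) + 1]] <- current
      current <- list()
      tag <- NULL
    } else if (lead == "  ") {                    # indented line continues the last field
      if (!is.null(tag)) current[[tag]] <- paste(current[[tag]], trimws(ln))
    } else {                                      # a new two-letter field tag
      tag <- lead
      current[[tag]] <- trimws(substr(ln, 4, nchar(ln)))
    }
  }
  # One row per record; fields missing from a record become NA
  rows <- lapply(records, function(r) {
    vals <- vapply(fields, function(f) {
      if (is.null(r[[f]])) NA_character_ else r[[f]]
    }, character(1))
    as.data.frame(as.list(vals), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up input in the tagged format (illustrative values only)
example_lines <- c(
  "FN Clarivate Analytics Web of Science",
  "VR 1.0",
  "PT J",
  "TI A first title",
  "   continued on a second line",
  "AB First abstract.",
  "PY 2015",
  "UT WOS:000000000000001",
  "ER",
  "",
  "PT J",
  "TI A second title",
  "PY 2016",
  "UT WOS:000000000000002",
  "ER",
  "",
  "EF"
)
papers_sketch <- parse_wos_sketch(example_lines)
```

The result has one row per record, with columns named after the field tags (UT, PY, TI, AB), which is the kind of layout the later steps rely on when they refer to papers$UT and the AB field.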

The paperNumbers function plots the number of papers per year, broken down by a second variable (here the OECD category).

paperNumbers(papers,"OECD",bSize=6)

paperShares works the same way but plots shares instead of absolute numbers.

paperShares(papers,"OECD",bSize=6)
paperShares(papers,"OECD",bSize=6,pType="line")
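The difference between numbers and shares can be illustrated with base R alone. The sketch below assumes that a share is a category's fraction of all papers within a given year, which the text above does not spell out; the toy data frame is made up, and only its column names PY and OECD follow the data used here.

```r
# Made-up toy data: publication year (PY) and OECD category for seven papers
toy <- data.frame(
  PY   = c(2014, 2014, 2014, 2015, 2015, 2015, 2015),
  OECD = c("Natural Sciences", "Natural Sciences", "Social Sciences",
           "Natural Sciences", "Social Sciences", "Social Sciences",
           "Social Sciences"),
  stringsAsFactors = FALSE
)

# Absolute numbers of papers per year and category
counts <- table(toy$PY, toy$OECD)

# Shares within each year: every row of the table sums to 1
shares <- prop.table(counts, margin = 1)

print(counts)
print(round(shares, 2))
```

Shares make years with very different publication volumes comparable, at the cost of hiding overall growth, which is why both views are useful side by side.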

Preparing data for topic modelling

Turn a field of your data frame (by default AB, the abstract) into a corpus of documents.

corpus <- corporate(papers)

Turn this into a document-term matrix with a sparsity threshold of 0.5 (a very low value, chosen here for illustration).

dtm <- makeDTM(corpus,0.5,papers$UT,0.05,0)

The above process removes some documents; the UTs of the removed papers are returned as dtm$removed. In subsequent operations, we only want to use the documents that were not removed.

rem <- filter(papers,UT %in% dtm$removed)
papers_used <- subset(papers, !(UT %in% dtm$removed))
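Why documents disappear can be seen in a small base R sketch. It assumes the threshold is interpreted the way tm::removeSparseTerms interprets it (a term is kept only if the fraction of documents in which it does not occur is below the threshold); the internals of makeDTM and the meaning of its remaining arguments are not shown in the source, so this is an illustration rather than a description of that function. The matrix and the UT labels are made up.

```r
# Toy document-term counts: rows are documents (labelled by made-up UTs),
# columns are terms
m <- matrix(
  c(2, 0, 1,
    1, 0, 0,
    3, 1, 0,
    0, 0, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("UT1", "UT2", "UT3", "UT4"),
                  c("climate", "policy", "model"))
)

# Sparsity of a term: fraction of documents in which it does not occur
sparsity <- colMeans(m == 0)

# Keep only terms whose sparsity is below the threshold
threshold <- 0.5
kept <- m[, sparsity < threshold, drop = FALSE]

# Documents left without any remaining term cannot be modelled and are dropped
removed   <- rownames(kept)[rowSums(kept) == 0]
kept_docs <- kept[rowSums(kept) > 0, , drop = FALSE]

print(sparsity)
print(removed)
```

With such a strict threshold only terms that occur in more than half of the documents survive, so any document that contains none of them ends up empty. This is why the list of removed UTs has to be carried along and used to filter the original data frame.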

Re-create a corpus based on the words and documents that remain after the filtering steps above.

corpus_used <- refresh_corp(dtm$dtm)

Topic modelling

What is the optimal number of topics, up to a maximum of 10?

optimal_k(dtm$dtm, 10)
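The criterion used by optimal_k is not stated here, so the following is only a sketch of one common way to compare candidate values of k: fit a model for each k and compare perplexity. It swaps in LDA from topicmodels (rather than CTM) because it fits quickly, and it uses the AssociatedPress data shipped with topicmodels instead of the WoS data; the range 2:5 and the 100-document subset are arbitrary choices to keep the run short.

```r
library(topicmodels)

# Example data shipped with topicmodels; a small subset keeps the run short
data("AssociatedPress", package = "topicmodels")
dtm_small <- AssociatedPress[1:100, ]

# Fit one LDA model per candidate k and record its perplexity
ks <- 2:5
perp <- sapply(ks, function(k) {
  fit <- LDA(dtm_small, k = k, method = "VEM", control = list(seed = 2016))
  perplexity(fit)   # computed on the training data (lower is better)
})

# Candidate with the lowest perplexity
best_k <- ks[which.min(perp)]
print(data.frame(k = ks, perplexity = perp))
```

Perplexity on the training data tends to favour larger k, so for a real analysis it is preferable to compute it on held-out documents (via the newdata argument of perplexity) or to combine it with a qualitative check of the topics.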

Fit a correlated topic model (CTM) to the DTM with k topics (a smaller k means less computation time).

SEED <- 2016

system.time({
  CTM_3 <- CTM(dtm$dtm, k = 3, method = "VEM",
               control = list(seed = SEED))
})

Create a folder where we save a visualisation of the model and the model data.

visualise(CTM_3,corpus_used,dtm$dtm)

mcallaghan/scimetrix documentation built on May 22, 2019, 12:58 p.m.
