knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE )
library(RsemanticLibrarian)
This tutorial describes how to use the Semantic Librarian functions. Presntly, this package uses a corpus of abstracts from the APA database. A small sample of the required data files are automatically loaded with the package. These include:
article_df
a data frame containing abstract information (100 abstracts)article_vectors
a matrix containing the semantic vectors for each article. Each row contains the vector for each abstract. The rows for each abstract correspond to the row entries in article_df
author_list
a vector containing the first 100 author namesdictionary_words
a vector containing 100 words, much smaller than actual dictionaryWordVectors
a matrix containing the semantic vectors for each word in dictionary_words
.AuthorVectors
a matrix of semantic vectors for each authorThe entire data set can be downloaded from the github repository for the R Shiny Semantic Libararian app https://github.com/CrumpLab/SemanticLibrarian
Create some search terms:
search <- "president of psychology"
The get_search_terms()
function converts the words in the search string to lower case, removes special characters, and finds the words in the search string that are in the dictionary. Search terms must be in the dictionary, in order to use existing semantic vectors for searching the data base.
search <- get_search_terms(search,dictionary_words)
The get-search-article_similarities()
function returns the cosine similarity between the search terms and all of the abstracts in the data set. query_type
can be set to 1 for compound search, 2 for AND search, and 3 for OR search.
article_sims <- get_search_article_similarities(search_terms = search, d_words = dictionary_words, w_vectors = WordVectors, a_vectors = article_vectors, a_df = article_df, query_type = 1) knitr::kable(head(article_sims), format="html", escape=FALSE)
The get_mds_article_fits()
returns the most similar articles, up to the number specified by num_articles
. This function also conducts mult-dimensional scaling, to produce a 2-d renderable semantic space, with article positions in the space as x, y coordinates. Finally, the num_clusters
argument conducts k-means cluster analysis. The resulting data frame can be used for plotting, or to produce a table of the top articles.
MDS <- get_mds_article_fits(num_articles = 25, num_clusters = 3, year_range = c(1900,2000), input_df = article_sims, a_vectors=article_vectors )
Here is a simple plot of the abstract space using ggplot.
library(ggplot2) ggplot(MDS, aes(x=X,y=Y,color = cluster))+ geom_point()
Alternatively, plotly could be used to generate a plot. Here, hovering over the point displays the title of abstract.
library(plotly) ggp <- ggplot(MDS, aes(x= X, y= Y, color=as.factor(cluster), text=wrap_title))+ geom_hline(yintercept=0, color="grey")+ geom_vline(xintercept=0, color="grey")+ geom_point(aes(size=Similarity), alpha=.75)+ theme_void()+ theme(legend.position = "none") ax <- list( title = "", zeroline = TRUE, showline = FALSE, showticklabels = FALSE, showgrid = FALSE ) p <- ggplotly(ggp, tooltip="text", source = "article_SS_plot", hoverinfo="text") %>% layout(xaxis = ax, yaxis = ax,showlegend = FALSE) %>% #style(hoverinfo = 'title') %>% config(displayModeBar = F) %>% layout(xaxis=list(fixedrange=TRUE)) %>% layout(yaxis=list(fixedrange=TRUE)) p$elementId <- NULL p
The get_article_article_similarities()
function compares an existing abstract to all other abstracts in the data set, and returns a new data frame containing all of the abstracts with a column for Similarity between the target article and all other articles.
an_article <- article_df[1,]$title print(as.character(an_article)) sims <- get_article_article_similarities(an_article, article_vectors, article_df) # look at top 10 knitr::kable(head(sims), format="html", escape=FALSE)
All of the authors in the database are also coded as semantic vectors. Author vectors are the sum of the abstract vectors for each author. Use get_author_simimilarities()
to compute the similarity between one author and all other authors.
sims <- get_author_similarities(author_list[1]) # top 5 authors knitr::kable(head(sims), format="html", escape=FALSE)
Use the get_mds_author_fits()` function to create a dataframe with x,y coordinates for each author, based on multi-dimensional scaling
author_sims <- get_author_similarities(author_list[1]) author_MDS <- get_mds_author_fits(15,3,author_sims,AuthorVectors)
Using ggplot with text repel to view the author similarities in 2-D.
library(ggrepel) ggplot(author_MDS, aes(x=X, y=Y, color=as.factor(cluster), label=author, shape=selected_author))+ geom_hline(yintercept=0, color="grey")+ geom_vline(xintercept=0, color="grey")+ geom_point(aes(size=Similarity), alpha=.75)+ geom_text_repel(aes(size=Similarity),color="black", force=1.5)+ scale_size(range = c(4, 7))+ theme_void()+ theme(legend.position = "none")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.