‘semdistflow’ transforms any user-specified text into sequential bigrams (e.g., ‘The dog drinks the milk’ becomes dog-drink, drink-milk). The package then computes two measures of semantic distance for every running bigram in a language transcript. Users can choose how to structure their texts and tailor the output to their own constraints (e.g., omitting stopwords, lemmatizing tokens, setting the dimensionality of the word embeddings).
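To make the bigram idea concrete, here is a minimal base R sketch of pairing each token with the one that follows it. It is an illustration only, not the package's internal code: the `tokens` vector is a hypothetical, hand-cleaned version of the example sentence (stopwords dropped and words lemmatized by hand), and the column names `word1`, `word2`, and `pair` are chosen for this example.

```r
# Hypothetical, hand-cleaned tokens for "The dog drinks the milk"
# (stopwords removed and words lemmatized manually for illustration)
tokens <- c("dog", "drink", "milk")

# Pair each token with its successor to form running bigrams
bigrams <- data.frame(
  word1 = head(tokens, -1),  # every token except the last
  word2 = tail(tokens, -1)   # every token except the first
)

# Label each bigram as "word1-word2"
bigrams$pair <- paste(bigrams$word1, bigrams$word2, sep = "-")
bigrams
```

A text with n cleaned tokens yields n - 1 running bigrams, so the three tokens here give two pairs.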
Install semdistflow from GitHub by typing the following in your console or script (make sure you have devtools installed):
# install.packages("devtools")
devtools::install_github("Reilly-ConceptsCognitionLab/semdistflow")
This is a basic example which shows you how the cleanme function works:
library(semdistflow)
library(tidyverse)
doc_id <- "fox"
doc_text <- "The quick brown fox jumps over the lazy dog."
fox_text <- as.data.frame(cbind(doc_id, doc_text))
fox_text

fox_clean <- cleanme(fox_text)
fox_clean
Next, split the cleaned text into one token per row and lemmatize each token, then compute the semantic distance for each running bigram and plot the result:
fox_token <- fox_clean %>%
  group_by(doc_id, doc_text) %>%
  tidytext::unnest_tokens(word, doc_clean, drop = FALSE)
fox_token$lemma <- textstem::lemmatize_words(fox_token$word)
fox_token
fox_dist <- bigram_cos_sim(
  targetdf = fox_token,
  lookupdb = semdist15,
  colname1 = lemma,
  colname2 = word,
  flipped = TRUE
)
fox_dist
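For intuition about what a cosine-based distance between two words measures, the sketch below computes it for a pair of toy vectors in base R. Everything here is an assumption made for illustration: the helper names `cosine_similarity` and `cosine_distance` are hypothetical, the three-dimensional vectors are invented rather than taken from `semdist15`, and defining distance as 1 minus similarity is one common convention that may or may not match how the package derives its `flipped_cosine.dist` column.

```r
# Cosine similarity: dot product divided by the product of vector lengths
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# One common way to turn similarity into a distance (an assumption here,
# not necessarily the package's definition)
cosine_distance <- function(a, b) {
  1 - cosine_similarity(a, b)
}

# Invented toy "embeddings" for two words; real embeddings have many more dimensions
dog   <- c(0.9, 0.1, 0.3)
drink <- c(0.2, 0.8, 0.4)

cosine_distance(dog, drink)
```

Under this convention, vectors pointing in the same direction have a distance near 0 and orthogonal vectors have a distance of 1.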
ggplot(fox_dist, aes(x = as.numeric(row.names(fox_dist)), y = flipped_cosine.dist)) +
  geom_line(color = "#02401BD9", size = 1) +
  theme_classic() +
  xlab(NULL) +
  ylab(NULL) +
  geom_label(aes(label = pair), size = 3, data = fox_dist)