In bzuck-temple/TextDistanceBeta: semdistflow: semantic distance for natural language samples

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

semdistflow

Overview

semdistflow transforms any user-specified text into sequential bigrams (e.g., "The dog drinks the milk" becomes dog-drink, drink-milk). The package computes two measures of semantic distance for every running bigram in a language transcript. Users can control how their texts are structured and tailor the output to their own constraints (e.g., omitting stopwords, lemmatizing tokens, choosing the dimensionality of the word embeddings).
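
The idea of a "running bigram" can be sketched in a few lines of base R. This is only an illustration, not the package's own code: the token vector is typed in by hand and assumes stopword removal and lemmatization have already happened.

```r
# Toy illustration of running bigrams; not the package's own code.
# Assumes the text has already been cleaned and lemmatized by hand.
tokens <- c("dog", "drink", "milk")

# Pair each token with the one that follows it.
first  <- head(tokens, -1)   # every token except the last
second <- tail(tokens, -1)   # every token except the first
bigrams <- paste(first, second, sep = "-")

print(bigrams)
#> [1] "dog-drink"  "drink-milk"
```

A sample of n tokens yields n - 1 running bigrams, so each word (except the first and last) appears in two consecutive pairs.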

Installation

Install semdistflow from GitHub by running the following in your console or script (uncomment the first line if devtools is not yet installed):

# install.packages("devtools")
devtools::install_github("Reilly-ConceptsCognitionLab/semdistflow")

The main functions

readme() reads a .txt file into R, appends a document id based on its filename, and formats the text as a data frame.
cleanme() uses regular expressions to clean and format the text. Steps include omitting contractions, converting to lowercase, omitting numbers, and omitting stopwords, among others.
distme() computes two metrics of semantic distance for each running pair of words in the language sample you just cleaned. The results are output as a vector of word pairs.
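
To make the first step concrete, here is a minimal base R sketch of reading a text file into a one-row data frame with a filename-based id. The helper name read_txt_as_df() is hypothetical and readme() may work differently; the column names doc_id and doc_text are borrowed from the cleaning example in this README.

```r
# Hypothetical helper, for illustration only; readme() may work differently.
read_txt_as_df <- function(path) {
  # Collapse all lines of the file into a single string.
  doc_text <- paste(readLines(path, warn = FALSE), collapse = " ")
  # Use the filename without its extension as the document id.
  doc_id <- tools::file_path_sans_ext(basename(path))
  data.frame(doc_id = doc_id, doc_text = doc_text, stringsAsFactors = FALSE)
}

# Usage with a temporary file standing in for a real transcript.
tmp <- file.path(tempdir(), "fox.txt")
writeLines("The quick brown fox jumps over the lazy dog.", tmp)
fox_df <- read_txt_as_df(tmp)
print(fox_df)
```

Keeping the id and the text in separate columns lets later steps group results by document.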

Example of Cleaning Function

This is a basic example which shows you how the cleanme function works:

library(semdistflow)
library(tidyverse)
# Build a one-row data frame holding a document id and the raw text
doc_id <- "fox"
doc_text <- "The quick brown fox jumps over the lazy dog."
fox_text <- data.frame(doc_id, doc_text)
fox_text

fox_clean <- cleanme(fox_text)
fox_clean
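
For readers curious what regular-expression cleaning of this kind looks like, the following base R sketch applies a few of the steps named above (lowercasing, omitting numbers, omitting stopwords). It is a toy under stated assumptions: toy_clean() and toy_stopwords are hypothetical names, the stopword list is deliberately tiny, and the punctuation rule is an assumption rather than something taken from cleanme().

```r
# Toy cleaning steps in base R; cleanme() applies many more rules than this.
toy_stopwords <- c("the", "over", "a", "an", "of")  # hypothetical, tiny list

toy_clean <- function(text, stopwords = toy_stopwords) {
  text <- tolower(text)                  # convert to lowercase
  text <- gsub("[0-9]+", " ", text)      # omit numbers
  text <- gsub("[^a-z' ]", " ", text)    # omit punctuation (assumed step)
  words <- strsplit(text, " +")[[1]]     # split on runs of spaces
  # Drop empty strings and stopwords, then rebuild the sentence.
  words <- words[nzchar(words) & !(words %in% stopwords)]
  paste(words, collapse = " ")
}

toy_clean("The quick brown fox jumps over the 2 lazy dogs.")
#> [1] "quick brown fox jumps lazy dogs"
```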

Example of Semantic Distance Function

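This is a basic example which tokenizes and lemmatizes the cleaned text, computes the semantic distance for each running bigram with bigram_cos_sim(), and plots the result: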

# Tokenize the cleaned text into one word per row, then lemmatize each word
fox_token <- fox_clean %>%
  group_by(doc_id, doc_text) %>%
  tidytext::unnest_tokens(word, doc_clean, drop = FALSE)
fox_token$lemma <- textstem::lemmatize_words(fox_token$word)
fox_token

fox_dist <- bigram_cos_sim(
  targetdf = fox_token,
  lookupdb = semdist15,
  colname1 = lemma,
  colname2 = word,
  flipped = TRUE
)

fox_dist

# Plot the distance for each bigram in order of appearance
ggplot(fox_dist, aes(x = as.numeric(row.names(fox_dist)), y = flipped_cosine.dist)) +
  geom_line(color = "#02401BD9", size = 1) +
  theme_classic() +
  xlab(NULL) +
  ylab(NULL) +
  geom_label(aes(label = pair), size = 3, data = fox_dist)
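
As a rough illustration of what a cosine-based distance between consecutive words involves, the sketch below uses made-up three-dimensional vectors in place of real word embeddings. It shows plain cosine distance (1 minus cosine similarity) only; the package's two metrics, its lookup databases, and its flipped option may be defined differently, and the names toy_vectors, cosine_dist, and toy_dist are hypothetical.

```r
# Toy 3-dimensional "embeddings"; values are made up for illustration.
toy_vectors <- rbind(
  quick = c(1, 0, 0),
  brown = c(0, 1, 0),
  fox   = c(0, 1, 1)
)

# Cosine distance = 1 - cosine similarity.
cosine_dist <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Compare each word with the word that follows it.
words <- rownames(toy_vectors)
w1 <- head(words, -1)
w2 <- tail(words, -1)
toy_dist <- data.frame(
  pair = paste(w1, w2, sep = "-"),
  cosine_dist = mapply(
    function(a, b) cosine_dist(toy_vectors[a, ], toy_vectors[b, ]),
    w1, w2, USE.NAMES = FALSE
  ),
  stringsAsFactors = FALSE
)
print(toy_dist)
```

In this toy data, quick and brown point in orthogonal directions, so their distance is 1, while brown and fox share a dimension and sit closer together.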

bzuck-temple/TextDistanceBeta documentation built on Jan. 29, 2023, 6:37 p.m.
