coRPysprofiling is an open-source library designed to bring exploratory data analysis and visualization to the domain of natural language processing. Its functions provide elementary statistics and visualizations for a single text corpus, and compare multiple corpora with each other.
# install.packages("devtools")
devtools::install_github("UBC-MDS/coRPysprofiling-R")
Some specific functions include corpus_analysis, corpus_viz, corpora_compare, and corpora_best_match.
To our knowledge, while the wordcloud library generates word cloud
visualizations for a given corpus, there is no general-purpose library
for exploratory analysis and visualization of a text corpus in the R
ecosystem. There are several advanced libraries for comparing
similarities between different corpora: most notably, quanteda
provides similarity comparison between large corpora using word
embeddings. We believe that coRPysprofiling will provide useful
functionality for exploratory analysis and visualization and help bridge
the gap between elementary text analysis and more sophisticated
approaches utilizing word embeddings.
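To make "elementary text analysis" concrete, here is a minimal base-R sketch of the kind of statistics and corpus comparison discussed above. It does not call coRPysprofiling and is not the package's implementation: the helper names (tokenize, corpus_stats, cosine_similarity) are hypothetical, and using cosine similarity over raw word counts is an assumption made for illustration only.

```r
# Hypothetical helper: split text into lowercase word tokens (base R only).
# Assumes plain English text; any character outside a-z and ' is a separator.
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words[nzchar(words)]
}

# Hypothetical helper: a few elementary statistics for a single corpus.
corpus_stats <- function(text) {
  words <- tokenize(text)
  list(
    word_total = length(words),
    word_unique = length(unique(words)),
    avg_word_length = mean(nchar(words))
  )
}

# Hypothetical helper: cosine similarity between the word-count vectors
# of two corpora, built over their shared vocabulary.
cosine_similarity <- function(text_a, text_b) {
  words_a <- tokenize(text_a)
  words_b <- tokenize(text_b)
  vocab <- union(words_a, words_b)
  counts_a <- as.numeric(table(factor(words_a, levels = vocab)))
  counts_b <- as.numeric(table(factor(words_b, levels = vocab)))
  sum(counts_a * counts_b) / (sqrt(sum(counts_a^2)) * sqrt(sum(counts_b^2)))
}

stats <- corpus_stats("The quick brown fox jumps over the lazy dog")
similarity <- cosine_similarity("the cat sat", "the cat ran")
```

Count-based comparison like this only sees exact word overlap, which is the gap that embedding-based approaches aim to close.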
See vignette here: https://ubc-mds.github.io/coRPysprofiling-R/articles/coRPysprofiling.html
The help files can be viewed with:
?coRPysprofiling::corpus_analysis
?coRPysprofiling::corpus_viz
?coRPysprofiling::corpora_compare
?coRPysprofiling::corpora_best_match
We welcome and recognize all contributions. You can see a list of current contributors in the contributors tab.
Anita Li, Elanor Boyle-Stanley, Junghoo Kim, and Ivy Zhang