In UBC-MDS/coRPysprofiling-R: R Package for EDA and EDV on text

coRPysprofiling

coRPysprofiling is an open-source library designed to bring exploratory data analysis and visualization to the domain of natural language processing. Its functions provide elementary statistics and visualizations for a single text corpus, and compare multiple corpora with each other.

# install.packages("devtools")
devtools::install_github("UBC-MDS/coRPysprofiling-R")

Some specific functions include:

corpus_analysis: corpus_analysis will generate a statistical report about the characteristics of a single corpus (e.g., unique word count, average word/sentence length, top words used, topic analysis).
corpus_viz: corpus_viz will generate relevant visualizations of a single corpus (e.g., word cloud, histograms for average word/sentence length, top words used).
corpora_compare: Given two or more corpora, corpora_compare will find the similarity (e.g., Euclidean distance or cosine similarity) between each pair of corpora.
corpora_best_match: Given a reference document and two or more corpora, corpora_best_match will rank the corpora in order of relevance to the reference document.
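
To make the ideas behind these functions concrete, here is a minimal base-R sketch of elementary corpus statistics and of ranking corpora by cosine similarity to a reference document. It is not the package's implementation or API: the helper names (tokenize_words, corpus_stats, cosine_similarity, rank_by_similarity) are hypothetical, and the similarity is computed on plain word counts rather than on word embeddings, purely to keep the example self-contained.

```r
# Hypothetical helpers for illustration only; not the coRPysprofiling API.

# Split text into lowercase word tokens (letters, digits and apostrophes).
tokenize_words <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z0-9']+"))
  tokens[nzchar(tokens)]
}

# Elementary statistics for a single corpus given as one string.
corpus_stats <- function(text, n_top = 3) {
  tokens <- tokenize_words(text)
  counts <- sort(table(tokens), decreasing = TRUE)
  list(
    word_total      = length(tokens),
    word_unique     = length(unique(tokens)),
    avg_word_length = mean(nchar(tokens)),
    top_words       = names(counts)[seq_len(min(n_top, length(counts)))]
  )
}

# Cosine similarity between the word-count vectors of two texts.
# Assumption: a bag-of-words representation stands in for embeddings.
cosine_similarity <- function(text_a, text_b) {
  tokens_a <- tokenize_words(text_a)
  tokens_b <- tokenize_words(text_b)
  vocab <- union(tokens_a, tokens_b)
  vec_a <- as.numeric(table(factor(tokens_a, levels = vocab)))
  vec_b <- as.numeric(table(factor(tokens_b, levels = vocab)))
  sum(vec_a * vec_b) / (sqrt(sum(vec_a^2)) * sqrt(sum(vec_b^2)))
}

# Rank corpora from most to least similar to a reference document.
rank_by_similarity <- function(reference, corpora) {
  scores <- vapply(
    corpora,
    function(corpus) cosine_similarity(reference, corpus),
    numeric(1)
  )
  sort(scores, decreasing = TRUE)
}

# Example input: a named character vector, one element per corpus.
corpora <- c(
  cats = "The cat sat on the mat. The cat slept.",
  dogs = "The dog chased the ball across the park."
)

stats   <- corpus_stats(corpora[["cats"]])
ranking <- rank_by_similarity("A cat on a mat", corpora)
```

Here stats holds the summary for the first corpus and ranking is a named numeric vector with the best-matching corpus first. Word counts are the simplest representation that makes cosine similarity meaningful; an embedding-based approach would replace only the vector construction step.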

To our knowledge, while the wordcloud library generates word cloud visualizations for a given corpus, there is no general-purpose library for exploratory analysis and visualization of a text corpus in the R ecosystem. There are several advanced libraries for comparing similarities between different corpora: most notably, quanteda provides similarity comparison between large corpora using word embeddings. We believe that coRPysprofiling will provide useful functionality for exploratory analysis and visualization, and help bridge the gap between elementary text analysis and more sophisticated approaches that use word embeddings.

Dependencies

dplyr
ggplot2
ggwordcloud
stringr
stringi
here
stopwords
tokenizers
word2vec
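
Assuming the packages listed above are the ones coRPysprofiling relies on at run time, the following sketch checks which of them are already installed locally. It uses only base R; the variable names are illustrative.

```r
# Package names copied from the list above.
deps <- c(
  "dplyr", "ggplot2", "ggwordcloud", "stringr", "stringi",
  "here", "stopwords", "tokenizers", "word2vec"
)

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed, if any.
not_installed <- deps[!installed]
```

Anything left in not_installed can be passed to install.packages() before installing coRPysprofiling itself.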

See the vignette here: https://ubc-mds.github.io/coRPysprofiling-R/articles/coRPysprofiling.html

The help files can be viewed with:

?coRPysprofiling::corpus_analysis
?coRPysprofiling::corpus_viz
?coRPysprofiling::corpora_compare
?coRPysprofiling::corpora_best_match

We welcome and recognize all contributions. You can see a list of current contributors in the contributors tab.

Development Team

Anita Li, Elanor Boyle-Stanley, Junghoo Kim, and Ivy Zhang

UBC-MDS/coRPysprofiling-R documentation built on March 30, 2021, 12:02 p.m.
