
coRPysprofiling


Summary

coRPysprofiling is an open-source library designed to bring exploratory data analysis and visualization to the domain of natural language processing. The package provides functions that compute elementary statistics and visualizations for a single text corpus, as well as functions that compare multiple corpora with each other.
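To give a feel for what "elementary statistics" for a single corpus can mean, here is a minimal base R sketch. The helper corpus_stats() is hypothetical and written only for illustration; it is not part of the coRPysprofiling API and does not claim to reproduce what corpus_analysis computes. It assumes a simple tokenizer that lowercases the text and splits on anything that is not an ASCII letter, so accented or non-Latin characters are treated as separators.

```r
# Hypothetical helper for illustration only; NOT the coRPysprofiling API.
# Assumes ASCII-only tokenization: any run of non a-z characters is a separator.
corpus_stats <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
  tokens <- tokens[nzchar(tokens)]  # drop empty strings left by leading separators
  list(
    word_count = length(tokens),                 # total number of tokens
    distinct_words = length(unique(tokens)),     # vocabulary size
    mean_word_length = mean(nchar(tokens)),      # average characters per token
    top_words = head(sort(table(tokens), decreasing = TRUE), 3)  # most frequent tokens
  )
}

stats <- corpus_stats("How many species of animals are there in Russia? How many?")
stats$word_count      # 11 tokens
stats$distinct_words  # 9 distinct words
```

The real package may tokenize differently (for example, keeping punctuation or using a dedicated tokenizer), so treat these numbers as an illustration of the idea rather than of the package's output.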

Installation

# install.packages("devtools")
devtools::install_github("UBC-MDS/coRPysprofiling-R")

Features

Some specific functions include corpus_analysis, corpus_viz, corpora_compare, and corpora_best_match.

Relevance to the R Ecosystem

To our knowledge, while the wordcloud package generates word cloud visualizations for a given corpus, there is no general-purpose package for exploratory analysis and visualization of a text corpus in the R ecosystem. Several advanced packages compare the similarity of different corpora; most notably, quanteda provides similarity comparisons between large corpora using word embeddings. We believe that coRPysprofiling will provide useful functionality for exploratory analysis and visualization, and help bridge the gap between elementary text analysis and more sophisticated approaches that use word embeddings.
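The idea of comparing corpora, and of ranking candidate corpora against a reference, can be sketched in base R without any embeddings. This is a swapped-in, simpler technique: cosine similarity between bag-of-words count vectors, not the word-embedding approach mentioned for quanteda, and not necessarily how corpora_compare or corpora_best_match work internally. The helpers tokenize() and cosine_similarity() are hypothetical and exist only for this illustration.

```r
# Hypothetical helpers for illustration only; NOT the coRPysprofiling API.
# Uses bag-of-words counts (no word embeddings) and ASCII-only tokenization.
tokenize <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
  tokens[nzchar(tokens)]
}

# Cosine similarity between the word-count vectors of two texts
cosine_similarity <- function(text_a, text_b) {
  a <- tokenize(text_a)
  b <- tokenize(text_b)
  vocab <- union(a, b)
  # factor() with a shared vocabulary keeps zero counts, so both vectors align
  va <- as.numeric(table(factor(a, levels = vocab)))
  vb <- as.numeric(table(factor(b, levels = vocab)))
  norm_a <- sqrt(sum(va^2))
  norm_b <- sqrt(sum(vb^2))
  if (norm_a == 0 || norm_b == 0) return(0)  # guard against empty texts
  sum(va * vb) / (norm_a * norm_b)
}

# Rank candidate corpora by similarity to a reference corpus
reference <- "the cat sat on the mat"
candidates <- c(
  pets = "a cat and a dog sat on the mat",
  finance = "interest rates rose sharply this quarter"
)
scores <- vapply(candidates, cosine_similarity, numeric(1), text_b = reference)
ranked <- sort(scores, decreasing = TRUE)
ranked
```

Bag-of-words similarity only rewards exact word overlap, which is why "finance" scores 0 here; embedding-based methods can also credit related but different words, which is the gap the paragraph above describes.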

Dependencies

Usage

See the vignette: https://ubc-mds.github.io/coRPysprofiling-R/articles/coRPysprofiling.html

Documentation

The help files can be viewed with:

?coRPysprofiling::corpus_analysis
?coRPysprofiling::corpus_viz
?coRPysprofiling::corpora_compare
?coRPysprofiling::corpora_best_match

Contributors

We welcome and recognize all contributions. You can see a list of current contributors in the contributors tab.

Development Team

Anita Li, Elanor Boyle-Stanley, Junghoo Kim, and Ivy Zhang



UBC-MDS/coRPysprofiling-R documentation built on March 30, 2021, 12:02 p.m.