BibToCorpus: Convert a bibliographic database into a text corpus
In KDViz: Knowledge Domain Visualization


Builds a text corpus from a bibliographic database. A control list and helper options (stop-word removal, a custom removal list, and a word-replacement file) let you speed up corpus composition.

BibToCorpus(bibData, bibUnits = "Keywords", controlList, stopWords = TRUE, wordsToRemove, replaceWords)

`bibData`	a data frame containing the records of a bibliographic database.
`bibUnits`	a string naming the bibliographic unit to analyze, e.g. "Title", "Keywords", "Abstract". This string must match a column name in the `bibData` data frame.
`controlList`	a character vector naming the transformations to apply while composing the corpus. Available options: `stripWhitespace`, which collapses runs of whitespace into a single space, and `removeNumbers`, which removes numbers from the corpus texts.
`stopWords`	logical. If `TRUE`, a list of stop words will be removed from the composed corpus.
`wordsToRemove`	a vector of words to remove from the composed corpus.
`replaceWords`	path to a `TXT` file with two tab-separated columns. The first column holds the final word to keep in the corpus and the second holds the word to be replaced. Example (one pair per line):
	clustering	cluster_analysis
	clustering	cluster
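As an illustration of the `replaceWords` file layout described above, the following minimal sketch writes the two example pairs to a temporary tab-separated file using base R only. The object names `replacements` and `replaceFile` are hypothetical, and the absence of a header row is an assumption, since the documentation does not say whether one is expected.

```r
# Sketch only: build a two-column, tab-separated replacement file.
# Column order follows the argument description: final word first,
# word to be replaced second. No header row is written (assumption).
replacements <- data.frame(
  final    = c("clustering", "clustering"),
  original = c("cluster_analysis", "cluster"),
  stringsAsFactors = FALSE
)

replaceFile <- tempfile(fileext = ".txt")
write.table(replacements, file = replaceFile, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to check that each line holds one tab-separated pair.
fileLines <- readLines(replaceFile)
print(fileLines)
```

The resulting path in `replaceFile` could then be supplied as the `replaceWords` argument, in the same way the packaged `KDReplaceWords.txt` path is used in the example.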

A list of English stop words is provided with the package. For stop-word lists in many other languages, see https://sites.google.com/site/kevinbouge/stopwords-lists, made available by Kevin Bouge (kevin.bouge@gmail.com).
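For a language other than English, a downloaded stop-word list has to be read into R before it can be used. The sketch below is a base R illustration under stated assumptions: the file is plain text with one word per line, and the file itself is simulated here with a temporary file, so `stopFile` and `extraStopWords` are hypothetical names rather than part of the package.

```r
# Sketch only: simulate a downloaded stop-word file (one word per line).
stopFile <- tempfile(fileext = ".txt")
writeLines(c("el", "la", "de", "  que ", "", "la"), stopFile)

# Read the list, trim stray whitespace, then drop empty lines and duplicates.
extraStopWords <- trimws(readLines(stopFile, encoding = "UTF-8"))
extraStopWords <- unique(extraStopWords[nzchar(extraStopWords)])

print(extraStopWords)
```

Because `stopWords` is a logical flag, one plausible route (not confirmed by the package documentation) is to pass such a vector through `wordsToRemove`.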

An object inheriting from VCorpus and Corpus.
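`VCorpus` and `Corpus` are classes from the `tm` package, so the returned object can be handled with the usual `tm` tools. The sketch below does not call `BibToCorpus`. It builds a small corpus directly with `tm` and applies the `tm` transformations that share their names with the `controlList` options. Whether `BibToCorpus` applies exactly these functions internally is an assumption, and the sample texts in `docs` are made up for illustration.

```r
library(tm)

# Made-up sample texts standing in for one bibliographic unit.
docs <- c("cluster   analysis 2015", "text    mining 42")

# Build a volatile corpus, then apply the two transformations by name.
corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, stripWhitespace)  # collapse runs of whitespace
corpus <- tm_map(corpus, removeNumbers)    # drop digits from the texts

# Class checks matching the Value description.
isVCorpus <- inherits(corpus, "VCorpus")
isCorpus  <- inherits(corpus, "Corpus")
nDocs     <- length(corpus)

# Extract the text of the first document for inspection.
firstText <- content(corpus[[1]])
```

The same `inherits()` and `length()` checks should apply to the object returned by `BibToCorpus`, given the classes stated above.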

Andres Palacios anfpalacioscl@unal.edu.co

ArticleSearch can be useful for creating a bibliographic information dataframe if starting from scratch.

data("KDVizData")
wordsToReplace <- system.file("extdata", "KDReplaceWords.txt", package = "KDViz")
wordsToRemove <- c("analysis", "data", "text", "review", "topic", "theory", "system", "protein")

myCorpus <- BibToCorpus(bibData = KDVizData, bibUnits = "Keywords",
  controlList = c("stripWhitespace", "removeNumbers"), stopWords = TRUE,
  wordsToRemove = wordsToRemove, replaceWords = wordsToReplace)

Processing Corpus from bibliometric data...

Collapsing multiple whitespace characters to a single one...
Removing stop words...
Removing words from custom list...
Removing numbers...
24 words to replace:
 4.2% of words replaced
 8.3% of words replaced
 12.5% of words replaced
 16.7% of words replaced
 20.8% of words replaced
 25% of words replaced
 29.2% of words replaced
 33.3% of words replaced
 37.5% of words replaced
 41.7% of words replaced
 45.8% of words replaced
 50% of words replaced
 54.2% of words replaced
 58.3% of words replaced
 62.5% of words replaced
 66.7% of words replaced
 70.8% of words replaced
 75% of words replaced
 79.2% of words replaced
 83.3% of words replaced
 87.5% of words replaced
 91.7% of words replaced
 95.8% of words replaced
 100% of words replaced
Corpus process finished