Builds a text corpus from a bibliographic database. A control list and helper options make the corpus composition process faster.
BibToCorpus(bibData, bibUnits = "Keywords", controlList, stopWords = TRUE,
            wordsToRemove, replaceWords)
bibData
    a data frame containing information about a bibliographic database.
bibUnits
    a string naming the bibliographic unit to analyze, e.g. "Title", "Keywords", "Abstract". It must match a column name in the bibData data frame.
controlList
    a vector indicating the transformations and processes to perform during corpus composition. Available options:
stopWords
    logical. If
wordsToRemove
    a vector of words to remove from the composed corpus.
replaceWords
    a
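To make the cleaning arguments more concrete, the following is a minimal base R sketch of the kind of processing that removeNumbers, stripWhitespace and wordsToRemove describe. It is not the package's implementation: clean_keywords and its words_to_remove parameter are hypothetical names introduced only for illustration, and the order of the steps is an assumption.

```r
# Hypothetical helper, not part of KDViz: a base R illustration only.
clean_keywords <- function(x, words_to_remove = character(0)) {
  # Remove digits (comparable in spirit to a "removeNumbers" step)
  x <- gsub("[[:digit:]]+", "", x)
  # Remove each listed word, matching whole words only
  if (length(words_to_remove) > 0) {
    pattern <- paste0("\\b(", paste(words_to_remove, collapse = "|"), ")\\b")
    x <- gsub(pattern, "", x, perl = TRUE)
  }
  # Collapse runs of whitespace (comparable to "stripWhitespace"), then trim
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

keywords <- c("Text  mining 2016", "protein data analysis", "network theory")
cleaned <- clean_keywords(
  keywords,
  words_to_remove = c("analysis", "data", "theory", "protein")
)
print(cleaned)
```

An entry made up entirely of removed words becomes an empty string, so it may be worth checking for empty documents after cleaning.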
A list of English stop words is provided with the package. For stop word lists in many other languages, see https://sites.google.com/site/kevinbouge/stopwords-lists, made available by Kevin Bouge (kevin.bouge@gmail.com).
An object inheriting from VCorpus and Corpus.
Andres Palacios anfpalacioscl@unal.edu.co
ArticleSearch can be useful for creating a data frame of bibliographic information when starting from scratch.
library(KDViz)
data("KDVizData")
# Path to the replacement-word file shipped with the package
wordsToReplace <- system.file("extdata", "KDReplaceWords.txt", package = "KDViz")
# Custom words to drop from the corpus
wordsToRemove <- c("analysis", "data", "text", "review", "topic", "theory", "system", "protein")
myCorpus <- BibToCorpus(bibData = KDVizData, bibUnits = "Keywords",
                        controlList = c("stripWhitespace", "removeNumbers"), stopWords = TRUE,
                        wordsToRemove = wordsToRemove, replaceWords = wordsToReplace)
Processing Corpus from bibliometric data...
Collapsing multiple whitespace characters to a single one...
Removing stop words...
Removing words from custom list...
Removing numbers...
24 words to replace:
4.2% of words replaced
8.3% of words replaced
12.5% of words replaced
16.7% of words replaced
20.8% of words replaced
25% of words replaced
29.2% of words replaced
33.3% of words replaced
37.5% of words replaced
41.7% of words replaced
45.8% of words replaced
50% of words replaced
54.2% of words replaced
58.3% of words replaced
62.5% of words replaced
66.7% of words replaced
70.8% of words replaced
75% of words replaced
79.2% of words replaced
83.3% of words replaced
87.5% of words replaced
91.7% of words replaced
95.8% of words replaced
100% of words replaced
Corpus process finished
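The progress lines in the output above are consistent with one message per replacement word, each reporting the share of the 24 words handled so far, rounded to one decimal place. The sketch below reproduces those figures under that assumption. It is inferred from the printed output, not taken from the package source, and n_words and progress are illustrative names.

```r
# Assumption: one progress message per replacement word, i / n * 100
# rounded to one decimal place (inferred from the printed output).
n_words <- 24  # from the "24 words to replace:" line
progress <- round(100 * seq_len(n_words) / n_words, 1)
progress_lines <- paste0(progress, "% of words replaced")
cat(progress_lines, sep = "\n")
```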