View source: R/keyword_clean.R
keyword_clean | R Documentation |
Carry out several keyword cleaning processes automatically and return a tidy table with document ID and keywords.
keyword_clean( df, id = "id", keyword = "keyword", sep = ";", rmParentheses = TRUE, rmNumber = TRUE, lemmatize = FALSE, lemmatize_dict = NULL )
df |
A data.frame containing at least two columns with document ID and keyword strings with separators. |
id |
Quoted character string giving the name of the document ID column. Default uses "id". |
keyword |
Quoted character string giving the name of the keyword column. Default uses "keyword". |
sep |
Separator(s) of keywords. Default uses ";". |
rmParentheses |
Remove the contents in the parentheses (including the parentheses) or not. Default uses TRUE. |
rmNumber |
Remove pure number sequences or not. Default uses TRUE. |
lemmatize |
Lemmatize the keywords or not. Lemmatization is performed by the 'lemmatize_strings' function in the 'textstem' package. Default uses FALSE. |
lemmatize_dict |
A dictionary of base terms and lemmas to use for replacement.
Only used when the lemmatize parameter is TRUE. |
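As an illustration of what "a dictionary of base terms and lemmas" can look like, here is a minimal base R sketch. It assumes a two-column table with the word form first and its lemma second; the column names token and lemma, the sample entries, and the helper lemmatize_with_dict are all hypothetical and are not part of akc or textstem.

```r
# Hypothetical dictionary: first column is the word form, second is its lemma.
lemma_dict <- data.frame(
  token = c("networks", "models", "analyses"),
  lemma = c("network", "model", "analysis"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of akc): replace every word found in the
# dictionary with its lemma and leave all other words untouched.
lemmatize_with_dict <- function(x, dict) {
  vapply(strsplit(x, " ", fixed = TRUE), function(words) {
    hit <- match(words, dict$token)          # NA where the word is not listed
    found <- !is.na(hit)
    words[found] <- dict$lemma[hit[found]]
    paste(words, collapse = " ")
  }, character(1))
}

lemmatized <- lemmatize_with_dict(
  c("neural networks", "topic models", "meta analyses"),
  lemma_dict
)
```

A table of this shape is what would be passed as lemmatize_dict together with lemmatize = TRUE; the helper above only mimics the lookup and makes no claim about how textstem performs it.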
The entire cleaning processes include:
1. Split the text at the separators;
2. Remove the contents in the parentheses (including the parentheses);
3. Remove white space from the start and end of each string and reduce repeated white space inside a string;
4. Remove all empty strings and pure number sequences;
5. Convert all letters to lower case;
6. Lemmatization.
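Some of these steps can be switched off or on through the parameters: rmParentheses controls step 2, rmNumber controls the removal of pure number sequences in step 4, and lemmatize controls step 6.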
The default setting does not use lemmatization; it is suggested to use keyword_merge to merge the keywords afterward.
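Steps 1 to 5 above can be sketched in base R as follows. This is only an approximation under stated assumptions, not the package's implementation: the function clean_keywords_sketch and the toy data docs are made up for illustration, the separator is treated as a fixed string, and lemmatization (step 6) is left out.

```r
# Toy input in the shape described above: one ID column, one keyword column.
docs <- data.frame(
  id = c(1, 2),
  keyword = c("Text Mining (TM); 2020 ;  Topic   Models",
              "Bibliometrics;;text mining"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of akc).
clean_keywords_sketch <- function(df, sep = ";") {
  # 1. Split each keyword string at the separator, one row per keyword.
  parts <- strsplit(df$keyword, sep, fixed = TRUE)
  out <- data.frame(
    id = rep(df$id, lengths(parts)),
    keyword = unlist(parts),
    stringsAsFactors = FALSE
  )
  # 2. Remove parenthesised content, parentheses included.
  out$keyword <- gsub("\\([^)]*\\)", "", out$keyword)
  # 3. Collapse repeated white space, then trim both ends.
  out$keyword <- trimws(gsub("\\s+", " ", out$keyword))
  # 4. Drop empty strings and pure number sequences.
  keep <- out$keyword != "" & !grepl("^[0-9]+$", out$keyword)
  out <- out[keep, , drop = FALSE]
  # 5. Convert to lower case.
  out$keyword <- tolower(out$keyword)
  rownames(out) <- NULL
  out
}

cleaned <- clean_keywords_sketch(docs)
```

With the toy data, document 1 keeps "text mining" and "topic models", and document 2 keeps "bibliometrics" and "text mining"; the entry "2020" and the empty keyword are dropped.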
A tbl with two columns, namely document ID and cleaned keywords.
keyword_merge
library(akc)
bibli_data_table
bibli_data_table %>%
  keyword_clean(id = "id", keyword = "keyword")