This tutorial gives an example of how to use the akc package to carry out automatic knowledge classification based on raw text. First, load the packages we need.
library(akc)
library(dplyr)
In the dataset, we have the ID, title, keyword and abstract of documents. We are going to use the keyword as the dictionary to extract keywords from the abstract.
bibli_data_table
keyword_clean splits the keywords and removes pure numbers and any content in parentheses. All letters are converted to lower case. For details, see the help page with "?keyword_clean".
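To make the cleaning steps described above concrete, here is a minimal base-R sketch of the same idea. It is only an illustration and not how akc implements keyword_clean; the function name clean_keywords_sketch, the toy input and the ";" separator are assumptions made for this example.

```r
# Illustrative sketch only; NOT akc's implementation of keyword_clean().
# `clean_keywords_sketch` and `toy_keywords` are hypothetical names.
# Assumption: keywords in one field are separated by ";".
clean_keywords_sketch <- function(x, sep = ";") {
  x <- gsub("\\([^)]*\\)", "", x)                   # drop content in parentheses
  parts <- unlist(strsplit(x, sep, fixed = TRUE))   # split into single keywords
  parts <- tolower(trimws(parts))                   # lower case, trim spaces
  parts <- parts[!grepl("^[0-9]+$", parts)]         # drop pure numbers
  parts[nzchar(parts)]                              # drop empty strings
}

toy_keywords <- "Machine Learning (ML); 2019; Text Mining"
clean_keywords_sketch(toy_keywords)
```

For real data, rely on keyword_clean itself, which works on the whole table rather than on a single string.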
After cleaning, we'll use these keywords to establish a dictionary.
bibli_data_table %>% keyword_clean() %>% pull(keyword) %>% make_dict() -> my_dict
Use keyword_extract to extract keywords from the abstract. Here, we also exclude stop words via the "stopword" parameter.
# get stop words from `tidytext` package
tidytext::stop_words %>% pull(word) %>% unique() -> my_stopword
bibli_data_table %>% keyword_extract(id = "id", text = "abstract", dict = my_dict, stopword = my_stopword) -> extracted_keywords
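The idea of dictionary-based extraction with stop words can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the algorithm used by keyword_extract: the function extract_by_dict_sketch, its n_max argument and the toy data are all hypothetical, and the sketch simply keeps every word sequence of up to n_max words that appears in the dictionary and is not a stop word.

```r
# Illustrative sketch only; NOT akc's implementation of keyword_extract().
# All names below are hypothetical.
extract_by_dict_sketch <- function(text, dict, stopword = character(), n_max = 2) {
  # split the text into lower-case word tokens
  tokens <- unlist(strsplit(tolower(text), "[^a-z0-9]+"))
  tokens <- tokens[nzchar(tokens)]
  found <- character()
  for (n in seq_len(n_max)) {
    if (length(tokens) < n) break
    starts <- seq_len(length(tokens) - n + 1)
    # build all runs of n consecutive words
    ngrams <- vapply(
      starts,
      function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
      character(1)
    )
    found <- c(found, ngrams)
  }
  # keep candidates that are in the dictionary and are not stop words
  unique(found[found %in% dict & !(found %in% stopword)])
}

toy_dict <- c("text mining", "network", "the")
toy_stop <- c("the", "of")
extract_by_dict_sketch("The network of text mining studies", toy_dict, toy_stop)
```

In this toy case "the" is in the dictionary but is dropped because it is also a stop word.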
Keyword merging has to consider many factors, such as stemming and lemmatizing. Here I'll provide an easy implementation; for advanced usage, see "?keyword_merge".
extracted_keywords %>% keyword_merge() -> merged_keywords
The grouping step constructs a keyword co-occurrence network and uses community detection to group the keywords automatically. You can use "top" or "min_freq" to control how many keywords are included in the network. "top" is the number of most frequent keywords to include. "min_freq" is the minimum number of times a keyword must occur to be included. The defaults are top = 200 and min_freq = 1.
merged_keywords %>% keyword_group() -> grouped_keywords
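The difference between the two filters can be seen on a toy frequency table. The base-R sketch below is only an illustration of the two criteria taken separately; the data and variable names are hypothetical, and it does not show how keyword_group combines them internally.

```r
# Illustrative sketch only; hypothetical toy data, one element per occurrence.
toy_occurrences <- c("a", "a", "a", "b", "b", "c", "d", "d", "d", "d")

# frequency of each keyword as a named integer vector, most frequent first
tab <- table(toy_occurrences)
freq <- sort(setNames(as.integer(tab), names(tab)), decreasing = TRUE)

# "top"-style filter: keep the 2 most frequent keywords
top_keywords <- names(freq)[seq_len(2)]

# "min_freq"-style filter: keep keywords that occur at least 2 times
min_freq_keywords <- names(freq[freq >= 2])

top_keywords
min_freq_keywords
```

A count-based cut-off such as "top" fixes the size of the network, while a threshold such as "min_freq" lets the size vary with the data.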
To get the result as a table, use:
grouped_keywords %>% as_tibble()
If you only want the top keywords to be displayed, keyword_table provides a more formal table:
grouped_keywords %>% keyword_table()
In this implementation, only two groups are found. You can specify the number of top keywords with the "top" parameter.
Currently, keyword_vis, keyword_network and keyword_cloud can all be used to draw plots for the network, but in different forms. Let's try to draw a word cloud first:
grouped_keywords %>% keyword_cloud()
To get the word cloud of one group, use:
grouped_keywords %>% keyword_cloud(group_no = 1)
If you want to draw a network, use keyword_network:
grouped_keywords %>% keyword_network()
In the plot, "N=106" means that there are 106 keywords in the group altogether, though only the top 10 by frequency are shown in the graph. To visualize only the second group and display 20 nodes, try:
grouped_keywords %>% keyword_network(group_no = 2, max_nodes = 20)
Have fun playing with akc!