knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The premise of phrasenets is very simple: to help you build "phrase nets." These are easy to build; in fact, if you know your way around, you probably won't need this package at all.

What are "phrase nets"?

The idea is almost too simple: a phrase net connects words according to "connectors" of your choosing.

The default connectors are to, in, at, and, and of.
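To make the idea concrete, here is a minimal base R sketch of how "word connector word" patterns could be turned into counted edges. It is not the package's implementation: `sketch_phrase_net` and its output columns (`from`, `to`, `n`) are hypothetical names chosen for illustration, and the matching is deliberately naive (lowercase ASCII words only, non-overlapping matches).

```r
# Hypothetical helper, NOT part of phrasenets: a naive phrase-net extractor.
sketch_phrase_net <- function(text, connectors = c("to", "in", "at", "and", "of")) {
  text <- tolower(text)

  # For each connector, find every "<word> <connector> <word>" match.
  edges <- lapply(connectors, function(con) {
    pattern <- paste0("\\b([a-z]+) ", con, " ([a-z]+)\\b")
    hits <- unlist(regmatches(text, gregexpr(pattern, text, perl = TRUE)))
    if (length(hits) == 0) return(NULL)
    parts <- strsplit(hits, " ", fixed = TRUE)
    data.frame(
      from = vapply(parts, `[`, character(1), 1),
      to = vapply(parts, `[`, character(1), 3),
      stringsAsFactors = FALSE
    )
  })
  edges <- do.call(rbind, edges)

  # No matches at all: return an empty edge table.
  if (is.null(edges)) {
    return(data.frame(from = character(0), to = character(0), n = integer(0)))
  }

  # Count how many times each (from, to) pair was found.
  aggregate(n ~ from + to, data = cbind(edges, n = 1L), FUN = sum)
}

sketch_phrase_net(c(
  "supply and demand",
  "Supply and demand in europe",
  "oil to gas"
))
```

Each row of the result is one edge between two words, together with the number of times that pair was joined by any of the connectors.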

Examples

The phrase_net function simply returns a list of edges and the number of times each was found in the body of text.

library(phrasenets)

data(reuters)

phrase_net(reuters, text = text) %>% 
  head() %>% 
  knitr::kable()

The package also comes with a convenience function, filter_net, to filter out any edges that contain specific words. Below we remove edges containing "the" and "a", then use plot_sigmajs to plot the network with sigmajs.

reuters %>% 
  phrase_net(text = text) %>% 
  filter_net(c("a", "the")) %>% 
  dplyr::filter(occurences > 5) %>% 
  plot_sigmajs()
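For intuition, the filtering step can be pictured as dropping every edge in which either word is on an exclusion list. The sketch below works on a hand-made edge table; the column names (`from`, `to`, `occurences`) and the helper `drop_edges_with` are assumptions made for illustration, not the documented structure of phrase_net's output or the source of filter_net.

```r
# Toy edge table; column names are assumed, not taken from the package docs.
edges <- data.frame(
  from = c("supply", "the", "oil", "price"),
  to = c("demand", "market", "a", "crude"),
  occurences = c(12L, 30L, 8L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for filter_net(): keep only edges where neither
# endpoint is one of the unwanted words.
drop_edges_with <- function(edges, words) {
  keep <- !(edges$from %in% words | edges$to %in% words)
  edges[keep, , drop = FALSE]
}

filtered <- drop_edges_with(edges, c("a", "the"))
filtered
```

Only the "supply"/"demand" and "price"/"crude" edges survive, since the other two touch "the" or "a".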

The dataset provided (reuters) contains articles on 10 different commodities. Let's plot their respective phrase nets.

library(dplyr)
library(purrr)
library(ggraph)
library(tidygraph)

# create a graph for each commodity
subgraphs <- reuters %>% 
  group_split(category) %>% 
  map(phrase_net, text = text) %>% 
  map(filter_net, c("a", "the")) %>% 
  map(filter, occurences > 1) %>% 
  map(as_tbl_graph) %>% 
  map(function(x){
    mutate(x, size = centrality_degree())
  })

plot_it <- function(g, commodity){
  ggraph(g, layout = 'kk') + 
    geom_edge_fan(show.legend = FALSE) + 
    geom_node_point(aes(size = size, colour = size), show.legend = FALSE) + 
    theme_graph(
      background = "#f9f7f1"
    ) +
    labs(caption = tools::toTitleCase(commodity))
} 

# group_split() returns groups in sorted key order, so sort the labels to match
commodities <- sort(unique(reuters$category))

map2(subgraphs, commodities, plot_it) %>% 
  patchwork::wrap_plots(ncol = 2) &
  theme(
    panel.background = element_rect(fill = "#f9f7f1"),
    plot.background = element_rect(fill = "#f9f7f1")
  )


news-r/phrasenets documentation built on Nov. 4, 2019, 9:40 p.m.