textplot_correlation_lines: Document/Term Correlation Plot

View source: R/textplot_corlines.R

Description

Plots the strongest correlations among terms.
Terms are drawn as nodes and the correlations between them as lines (edges) connecting the nodes. The width of each edge is proportional to the strength of the correlation. This uses the plot function for graphNEL objects (from the Rgraphviz package).

Usage

textplot_correlation_lines(x, ...)

## Default S3 method:
textplot_correlation_lines(
  x,
  terms = colnames(x),
  threshold = 0.05,
  top_n,
  attrs = textplot_correlation_lines_attrs(),
  terms_highlight,
  label = FALSE,
  cex.label = 1,
  col.highlight = "red",
  lwd = 1,
  ...
)

Arguments

x

a document-term matrix of class dgCMatrix

...

other arguments passed on to plot

terms

a character vector with terms present in the columns of x indicating terms to focus on

threshold

a threshold to show only correlations between the terms with absolute values above this threshold. Defaults to 0.05.

top_n

an integer; show only the top_n correlations with the highest absolute value. E.g. set it to 20 to show only the 20 strongest correlations.

attrs

a list of attributes with graph visualisation elements passed on to the plot function of an object of class graphNEL. Defaults to textplot_correlation_lines_attrs().

terms_highlight

a character vector of terms to highlight, or a named numeric vector (names are the terms) with values in the 0-1 range indicating by how much (as a proportion) to increase the node font size. See the examples.

label

logical indicating whether to label each edge with the value of the correlation between the two nodes. Defaults to FALSE.

cex.label

character expansion factor (cex) for the correlation labels. Defaults to 1.

col.highlight

color to use for highlighted terms specified in terms_highlight. Defaults to red.

lwd

numeric value - graphical parameter used to increase the edge thickness which indicates the correlation strength. Defaults to 1.
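
As a rough illustration of how threshold and top_n interact, the sketch below applies the same two filters to a small, made-up dense matrix using base R only. It is not the package's implementation: the toy counts, the use of cor() and the pairs data frame are assumptions made for this example.

```r
## Toy dense document-term matrix: 6 documents, 4 terms (made-up counts)
m <- matrix(c(2, 0, 1, 0,
              1, 0, 2, 0,
              0, 3, 0, 1,
              0, 1, 0, 2,
              3, 0, 2, 1,
              0, 2, 1, 0),
            ncol = 4, byrow = TRUE,
            dimnames = list(NULL, c("prijs", "privacy", "locatie", "kamer")))

## Pairwise term correlations; keep each pair once (upper triangle)
r     <- cor(m)
idx   <- which(upper.tri(r), arr.ind = TRUE)
pairs <- data.frame(term1 = rownames(r)[idx[, "row"]],
                    term2 = colnames(r)[idx[, "col"]],
                    cor   = r[idx],
                    stringsAsFactors = FALSE)

## threshold: keep pairs whose absolute correlation exceeds it
threshold <- 0.05
pairs <- pairs[abs(pairs$cor) > threshold, ]

## top_n: keep the top_n pairs with the highest absolute correlation
top_n <- 3
pairs <- head(pairs[order(abs(pairs$cor), decreasing = TRUE), ], top_n)
pairs
```

Both filters work on the absolute value, so strong negative correlations are kept alongside strong positive ones.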

Value

the plot, returned invisibly

Examples


## Construct document-term matrix
library(graph)
library(Rgraphviz)
library(udpipe)
data(brussels_reviews_anno, package = 'udpipe')
exclude <- c(32337682L, 27210436L, 26820445L, 37658826L, 33661134L, 48756422L,
  23454554L, 30461127L, 23292176L, 32850277L, 30566303L, 21595142L,
  20441279L, 38097066L, 28651065L, 29011387L, 37316020L, 22135291L,
  40169379L, 38627667L, 29470172L, 24071827L, 40478869L, 36825304L,
  21597085L, 21427658L, 7890178L, 32322472L, 39874379L, 32581310L,
  43865675L, 31586937L, 32454912L, 34861703L, 31403168L, 35997324L,
  29002317L, 33546304L, 47677695L)
dtm <- brussels_reviews_anno
dtm <- subset(dtm, !doc_id %in% exclude)
dtm <- subset(dtm, xpos %in% c("NN") & language == "nl" & !is.na(lemma))
dtm <- document_term_frequencies(dtm, document = "doc_id", term = "lemma")
dtm <- document_term_matrix(dtm)
dtm <- dtm_remove_lowfreq(dtm, minfreq = 5)
dtm <- dtm_remove_tfidf(dtm, top = 500)

## Plot top 25 correlations, having at least a correlation of 0.01
textplot_correlation_lines(dtm, top_n = 25, threshold = 0.01)

## Plot top 25 correlations, labelled with the correlation value, using thicker edges
textplot_correlation_lines(dtm, top_n = 25, label = TRUE, lwd = 5)

## Plot top 25 correlations and highlight some terms
textplot_correlation_lines(dtm, top_n = 25, label = TRUE, lwd = 5,
                           terms_highlight = c("prijs", "privacy"),
                           main = "Top correlations in topic xyz")

## Plot top 25 correlations, highlight some terms and increase their font size
textplot_correlation_lines(dtm, top_n = 25, label = TRUE, lwd = 5,
                           terms_highlight = c(prijs = 0.8, privacy = 0.1),
                           col.highlight = "red")

## Plot correlations between specific terms
w <- dtm_colsums(dtm)
w <- head(sort(w, decreasing = TRUE), 100)
textplot_correlation_lines(dtm, terms = names(w), top_n = 20, label = TRUE)

## Customise the graph attributes (node shape, edge color) before plotting
attrs <- textplot_correlation_lines_attrs()
attrs$node$shape <- "rectangle"
attrs$edge$color <- "steelblue"
textplot_correlation_lines(dtm, top_n = 20, label = TRUE,
                           attrs = attrs)



textplot documentation built on July 18, 2022, 1:05 a.m.