plot_upset: Plot 'term_count' Object as Upset Plot

Description

Enables exploration of overlapping term categories, which is useful for tasks such as improving category discrimination (see also tag_co_occurrence). The upset plot is designed for exploring overlapping sets in situations where Euler/Venn diagrams fail to scale. This function wraps upset to show the degree to which the categories in a term_count object overlap. This may help in collapsing codes or in seeing how constructs are combined within the same text. The upset plot method is complex and requires careful study to yield meaningful interpretations, but the time invested can pay dividends in scalable insights.
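The core idea behind the upset plot can be sketched in base R: each text is reduced to the exact combination of categories it was tagged with, and texts are then counted per combination. The counts below are made up for illustration, and the sketch assumes a text belongs to a category whenever its count is greater than zero; it is not how termco or UpSetR compute the plot internally.

```r
# Hypothetical per-text tag counts for four categories (illustrative data,
# not actual term_count() output).
counts <- data.frame(
    `if` = c(1, 0, 2, 1, 0, 0, 1),
    ans  = c(1, 1, 0, 1, 0, 0, 0),
    or   = c(0, 1, 0, 0, 1, 0, 0),
    buts = c(0, 0, 0, 0, 1, 0, 0),
    check.names = FALSE
)

# Assumption: a text belongs to a category if it was tagged at least once.
membership <- counts > 0

# Label each text by the exact combination of categories it contains.
combo <- apply(membership, 1, function(row) {
    paste(colnames(membership)[row], collapse = "&")
})

# Drop untagged texts, then count texts per exclusive combination,
# largest first.
combo <- combo[combo != ""]
intersections <- sort(table(combo), decreasing = TRUE)
print(intersections)
```

Each count roughly corresponds to one bar in the top pane of an upset plot, with its combination shown as connected dots in the matrix below the bar.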

Usage

plot_upset(x, text_funs = NULL, ...)

Arguments

x

A term_count object.

text_funs

An optional named list of functions for creating additional columns in the data set (the list names are used to name the new variables). These columns can be used to add attribute plots to the upset output: they are created via text_funs but must then be referenced through ... using upset syntax. Note that n.tags and n.words are computed automatically, without the need to pass a function here.

...

Other arguments passed to upset.
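The text_funs behaviour described above can be pictured as follows: each named function receives the text vector and returns one value per text, which is stored in a column named after the list element. The helper add_text_columns and the n.commas function are hypothetical illustrations, not termco code; in plot_upset the resulting columns are then referenced by name in upset arguments such as attribute.plots.

```r
# Hypothetical helper (not part of termco): apply each named function in
# `text_funs` to the text and store the result in a column of that name.
add_text_columns <- function(text, text_funs = NULL) {
    out <- data.frame(text = text, stringsAsFactors = FALSE)
    for (nm in names(text_funs)) {
        out[[nm]] <- text_funs[[nm]](text)
    }
    out
}

texts <- c("If so, and then", "Or but", "")

extra <- add_text_columns(
    texts,
    text_funs = list(
        # Mirrors the n.chars function used in the Examples.
        n.chars = function(x) nchar(x),
        # Hypothetical measure: number of commas per text.
        n.commas = function(x) lengths(regmatches(x, gregexpr(",", x, fixed = TRUE)))
    )
)
print(extra)
```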

Value

Returns an upset plot.

Note

Because upset has many arguments, termco uses ... to pass arguments through plot_upset to upset; this makes plot_upset easier to maintain as upset changes its API. It also means the plot_upset signature says little about how the function operates. Use ?UpSetR::upset for a full list of the parameters that can be passed to termco::plot_upset. For example, sets enables more or fewer terms to be viewed, order.by specifies how the intersections between categories are arranged (the default is the number of tags), and nintersects sets how many intersections (the top bar plot) can be viewed at one time; the default is 25. mb.ratio controls the share of space given to the top and lower panes (a two-element numeric vector). By default, plot_upset attempts to auto-scale this based on the number of tags being displayed.
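The effect of order.by and nintersects on the top bar plot can be sketched with a plain named vector of intersection sizes. The sizes are hypothetical, and the sketch assumes "freq" means descending intersection size and "degree" means ascending number of participating categories (UpSetR's default directions for its decreasing argument; check ?UpSetR::upset). It only imitates the ordering logic and is not UpSetR's implementation.

```r
# Hypothetical intersection sizes keyed by category combination.
sizes <- c(
    "if" = 12, "if&ans" = 7, "or" = 9,
    "ans&or&buts" = 2, "buts" = 7, "if&or" = 3
)

# Degree = number of categories taking part in each intersection.
degree <- lengths(strsplit(names(sizes), "&", fixed = TRUE))

# Roughly order.by = "freq": largest intersections first.
by_freq <- sizes[order(sizes, decreasing = TRUE)]

# Roughly order.by = "degree": fewest categories first, ties broken by size.
by_degree <- sizes[order(degree, -sizes)]

# Roughly nintersects = 3: keep only the first three intersections.
top <- head(by_freq, 3)
print(top)
```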

References

Conway, J. R., Lex, A., & Gehlenborg, N. (2017). UpSetR: An R package for the visualization of intersecting sets and their properties. Bioinformatics. doi:10.1093/bioinformatics/btx364

http://caleydo.org/tools/upset

See Also

upset, tag_co_occurrence

Examples

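library(termco)   # term_count(), plot_upset(), frequent_terms(), as_term_list()
library(dplyr)    # %>%
library(UpSetR)   # intersects(), scatter_plot(), histogram() used in the queries below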

term_list <- list(
    `if` = c('if'),
    ans = c('an'),
    or = c('or'),
    buts = c('but')
)

out <- presidential_debates_2012 %>%
    with(term_count(dialogue, TRUE, term_list))

plot_upset(out)

## Not run: 
plot_upset(out, order.by = c("freq", "degree"))
plot_upset(out, order.by = "degree")
plot_upset(out, order.by = "degree", decreasing = FALSE)

## Adjust top pane/lower pane spacing
plot_upset(out, mb.ratio = c(0.45, 0.55))
plot_upset(out, mb.ratio = c(0.85, 0.15))

plot_upset(out,
    queries = list(
        list(query = intersects, params = list("or"), color = "orange", active = TRUE),
        list(query = intersects, params = list("if", 'ans'), color = "#0099CC", active = TRUE)
    )
)


## Attributes plotting with built in text var measures
plot_upset(out,
    queries = list(
        list(query = intersects, params = list("or"), color = "orange", active = TRUE),
        list(query = intersects, params = list("if", 'ans'), color = "#0099CC", active = TRUE)
    ),
    attribute.plots = list(
        gridrows = 45,
        plots = list(
            list(
                plot = scatter_plot,
                x = "n.words",
                y = "n.tags",
                queries = TRUE
            )
        ),
        ncols = 1
    ),
    query.legend = "bottom"
)

## Attributes plotting:
## Compute your own text var measures
plot_upset(
    out,
    text_funs = list(n.chars = function(x) nchar(x)),

    main.bar.color = "gray60",
    sets.bar.color = "gray60",
    matrix.color = 'grey60',

    queries = list(
        list(query = intersects, params = list("or"), color = "orange", active = TRUE),
        list(query = intersects, params = list("if", 'ans'), color = "#0099CC", active = TRUE)
    ),
    attribute.plots = list(
        gridrows = 45,
        plots = list(
            list(
                plot = scatter_plot,
                x = "n.words",
                y = "n.tags",
                queries = TRUE
            ),
            list(
                plot = scatter_plot,
                x = "n.words",
                y = "n.chars",
                queries = TRUE
            ),
            list(
                plot = histogram,
                x = "n.words",
                queries = TRUE
            )
        ),
        ncols = 3
    ),
    query.legend = "bottom"
)

## More examples of computing your own text var measures
plot_upset(
    out,
    text_funs = list(
        sentiment = function(z){sentimentr::sentiment_by(z)$ave_sentiment}
    ),
    queries = list(
        list(query = intersects, params = list("or"), color = "orange", active = TRUE),
        list(query = intersects, params = list("if", 'ans'), color = "#0099CC", active = TRUE),
        list(query = intersects, params = list("buts", 'ans'), color = "#32CD32", active = TRUE)
    ),
    attribute.plots = list(
        gridrows = 45,
        plots = list(
            list(
                plot = scatter_plot,
                y = "sentiment",
                x = "n.tags.unique",
                queries = TRUE
            ),
            list(
                plot = scatter_plot,
                y = "sentiment",
                x = "n.tags",
                queries = TRUE
            ),
            list(
                plot = histogram,
                x = "sentiment",
                queries = TRUE
            )
        ),
        ncols = 3
    ),
    query.legend = "bottom"
)

plot_upset(
    out,
    text_funs = list(
        sentiment = function(z){sentimentr::sentiment_by(z)$ave_sentiment}
    ),
    queries = list(
        list(query = intersects, params = list("or"), color = "orange", active = TRUE),
        list(query = intersects, params = list("if", 'ans'), color = "#0099CC", active = TRUE),
        list(query = intersects, params = list("buts", 'ans'), color = "#32CD32", active = TRUE)
    ),
    boxplot.summary = c("sentiment")
)

## Demonstration of the auto scaling of the plot region
regs2 <- as_term_list(frequent_terms(presidential_debates_2012[["dialogue"]])[[1]])

model2 <- with(presidential_debates_2012,
    term_count(dialogue, TRUE, regs2)
)

plot_upset(model2)

regs3 <- as_term_list(frequent_terms(presidential_debates_2012[["dialogue"]], 60)[[1]])

model3 <- with(presidential_debates_2012,
    term_count(dialogue, TRUE, regs3)
)

plot_upset(model3)
plot_upset(model3, order.by = c("freq", "degree"), nintersects = 80)

## End(Not run)

trinker/termco documentation built on Jan. 7, 2022, 3:32 a.m.