Logical queries

knitr::opts_chunk$set(
    echo = TRUE, 
    tidy.opts = list(width.cutoff = 65),
    tidy = FALSE)

set.seed(12314159)

imageDirectory <- "./images/logic"
dataDirectory <- "data"

One of the principal strengths of linked plots is the ease with which one can form complex logical queries on the data.

library(loon)

The cars of the 1974 Motor Trends magazine

Begin with a classic data set in R -- mtcars.
For the sake of illustration, some enrichment of the variables and their values will be made:

data(mtcars, package = "datasets")

mtcars$country <- c("Japan", "Japan", "Japan", "USA", "USA", "USA", "USA", 
                    "Germany", "Germany", "Germany", "Germany", "Germany", 
                    "Germany", "Germany", "USA", "USA", "USA", "Italy", 
                    "Japan", "Japan", "Japan", "USA", "USA", "USA", "USA", 
                    "Italy", "Germany", "UK", "USA", "Italy", "italy", "Sweden")
mtcars$continent <- c("Asia", "Asia", "Asia", "North America", "North America", 
                      "North America", "North America", "Europe", "Europe", 
                      "Europe", "Europe", "Europe", "Europe", "Europe",  
                      "North America", "North America", "North America", 
                      "Europe", "Asia", "Asia", "Asia", "North America", 
                      "North America", "North America", "North America", 
                      "Europe", "Europe", "Europe", "North America", 
                      "Europe", "Europe", "Europe" )
mtcars$company <- c("Mazda", "Mazda", "Nissan", "AMC", "AMC", "Chrysler", 
                    "Chrysler", "Mercedes", "Mercedes", "Mercedes", "Mercedes",
                    "Mercedes", "Mercedes", "Mercedes", "GM", "Ford", 
                    "Chrysler", "Fiat", "Honda", "Toyota", "Toyota", "Chrysler", 
                    "AMC", "GM", "GM", "Fiat", "Porsche", "Lotus", "Ford", 
                    "Ferrari", "Maserati", "Volvo")

mtcars$Engine <- factor(c("V-shaped", "Straight")[mtcars$vs +1], 
                        levels = c("V-shaped", "Straight"))
mtcars$Transmission <- factor(c("automatic", "manual")[mtcars$am +1], 
                              levels = c("automatic", "manual"))

mtcars$vs <- NULL  # These are redundant now
mtcars$am <- NULL  # 

For this illustration, it will be convenient to separate categorical from continuous data.

varTypes <- split(names(mtcars), 
                  sapply(mtcars, 
                         FUN = function(x){
                             if(is.factor(x)|is.character(x)){ 
                                 "categorical"
                                  } else {"numeric"} } ))

varTypes is a list with two named components: categorical and numeric.

Some interactive plots

To explore the data, several interactive plots will likely have been constructed. Typically, these will have been constructed one at a time and assigned to the same linking group (perhaps via the inspector).

Below, histograms/barplots are constructed for each categorical variable and assigned to that variable name now prefixed by h_ for "histogram".

for (varName in varTypes$categorical) {
    assign(paste0("h_", varName),
           l_hist(mtcars[ , varName], showFactors = TRUE,  
                  xlabel = varName, title = varName, 
                  linkingGroup = "Motor Trend"))
}

These are not evaluated in this vignette. Note that all are in the same linkingGroup.

Other linked plots might exist as well -- for example, a scatterplot of gear (the number of forward gears) versus disp (the engine displacement in cubic inches).

p <- with(mtcars, l_plot(disp, cyl, 
                         xlabel = "engine displacement", ylabel = "number of cylinders",
                         title = "1974 Motor Trend cars", 
                         linkingGroup = "Motor Trend",
                         size = 10, showScales = TRUE,
                         itemLabel = rownames(mtcars), showItemLabels = TRUE
                         ))

Note that - each car's name appears as the itemLabel for that point in the plot (to be revealed as a "tooltip" style pop up), and that - the plot p is in the same linking group as the histograms.

Through a combination of selection, inversion, deactivation, and reactivation, logical queries may be made interactively on the data.

For simplicity, the basic logical operators are illustrated below using only the histograms. More generally, these apply to any interactive loon graphic.

Interactive logical operations

Five logical conditions/operations illustrated here are the basic ones:

  1. A is TRUE
  2. Negation: (NOT A) is TRUE
  3. Inclusive OR: (A OR B) is TRUE (one or the other or both),
  4. Conjunction: (A AND B) are both TRUE
  5. Exclusive OR: (A XOR B) meaning (A is TRUE) or (B is TRUE) but (A AND B) is FALSE

Each of these corresponds to a sequence of actions on the plots and/or inspector. Whatever is highlighted in the end corresponds to the result.

Again, for simplicity all operations are illustrated by interacting with values of categorical variates in the various histograms. Any of the logical elements could also have been that satisfying numerical constraints by undertaking the corresponding actions on a scatterplot (or histogram of continuous values).

Each logical operator is illustrated in turn:

  1. A ($= A$)

on the plot select A,

on a plot select A,

from the inspector click invert

on a plot select A,

on the same (or a different but linked) plot <SHIFT>- select B

lots of solutions, here is one that always works

on a plot select A,

from the inspector, invert then deactivate (only A remains),

from a plot of the remaining select B,

from the inspector reactivate all

following steps in 4, select A AND B,

from the inspector invert then deactivate (only $\neg({A \land B})$ remains)

following steps in 3, select A OR B,

from the inspector reactivate (only A XOR B is highlighted)

Other logical conditions (including numerical ones such as disp > 300 on the scatterplot p) are constructed as a combination of the above (as in exclusive or).

These can be quite complex and it may help, after some number of steps, to mark intermediary results by colour (or also glyph in scatterplots).

Note that because of possibly missing data, not all linked plots may share the same set of observations.

Missing data and linking keys

The mtcars data is an example of a complete data set. Had there been missing values, then these would not appear in loon plots that require them.

For example, suppose data has four variables A, B, C, and D, and

data <- data.frame(A = sample(c(rnorm(10), NA), 10, replace = FALSE),
                   B = sample(c(rnorm(10), NA), 10, replace = FALSE),
                   C = sample(c("firebrick", "steelblue", NA), 10, replace = TRUE),
                   D = sample(c(1:10, NA), 10, replace = FALSE))
p_test <- l_plot(x = data$A, y = data$B, color = data$C, linkingGroup = "test missing")
h_test <- l_hist(x = data$D, color = data$C, linkingGroup = "test missing")

Then

Note that it is generally not a good idea to use C for any simple display characteristic like color if indeed C has missing values since this will remove non-missing x and y values from the plot. Not all values of x and y would then be accessible from the plot for logical queries,

Using logical operations on the original data to change plot properties (e.g. select values) can be challenging when data values are missing in the plot (since what is missing depends on what was missing at the time of its construction).

For example,

p_test["selected"] <- (data$A > 0)

may not work!

There are two general approaches to logical queries when data contains NAs.

  1. Using complete data

If, like mtcars, the data being used contains no NAs then conducting logical queries on the plot will be identical to conducting them on the data.

If the data is not complete (contains one or more NA), it can be made complete by removing all observations (rows) that contain an NA. E.g. replacing data by c_data <- na.omit(data).

Logical queries can then be made

a. directly on the plots, either

  - interactively as described in the previous sections, or, 
  - programmatically as in `p_test["x"] > 0` in place of `data$A > 0`.

  or

b. directly on the data and applied to the plots

  To help manage this, the `linkingKey` *of each plot* can be used.

  - the default value for each plot is a character vector with entries

    from `"0"` to `"n-1"` where `n = `nrow(data)`.

    These are easily turned into the row numbers for the original data.

    E.g. in `p_test` the row numbers of `data` that correspond to the points is

    `1 + as.numeric(p_test["linkingKey"])`

    Logical values for the rows of `data` can then select points in `p` as follows

    ```r
    LogVal <- data$A > data$B
    p["selected"] <- logVal[1 + as.numeric(p_test["linkingKey"])]
    ```

    Similarly for `h_test`.  E.g., compare `p_test["linkingKey"]` and `h_test["linkingKey]"`.

 - **Note**: the user can always provide their own character vector `linkingKey` for their plots.

   - E.g., `linkingKey = rownames(data)`

   If so, then more care may be needed to use these to identify rows in a logical vector.

loon's linking model

Loon's linking model has the following three parts

Observations in different plots (in the same linking group) are linked (in that their linked states change together) if and only if they have the same linking key.

Points appearing in different plots (in the same linkingGroup) which matched on the value of their linkingKey will share the same value for their linked states.



Try the loon package in your browser

Any scripts or data that you put into this service are public.

loon documentation built on June 14, 2021, 9:07 a.m.