knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
suppressPackageStartupMessages({
  library(ggplot2)
  library(lobstr)
  library(dplyr)
  library(purrr)
  library(exhibitionist)
  library(lofi)
})

Introduction

In this vignette, lofi is used to pack each row of the iris data into a single 32-bit integer.

Steps:

  1. Create a pack_spec for one row
  2. pack()/unpack() a single row to check that the round trip works
  3. Use purrr::transpose() with purrr::map_int() to pack every row, and purrr::map() to unpack

Create a pack spec

The iris dataset gives the measurements in cm of the variables sepal length and width, and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The first rows of the data are shown below:

knitr::kable(head(iris, 3), caption = "First rows of iris data")

The pack_spec for the iris data is defined below. It is stored as a plain list with one entry per column:

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Can perfectly pack 'iris' into 27 bits per row.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pack_spec <- list(
  Sepal.Length = list(type = 'integer', nbits = 7, mult = 10, signed = FALSE),
  Sepal.Width  = list(type = 'integer', nbits = 6, mult = 10, signed = FALSE),
  Petal.Length = list(type = 'integer', nbits = 7, mult = 10, signed = FALSE),
  Petal.Width  = list(type = 'integer', nbits = 5, mult = 10, signed = FALSE),
  Species      = list(type = 'choice' , nbits = 2,
                      options = c('setosa', 'versicolor', 'virginica'))
)
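
The bit widths in the spec can be sanity-checked against the data. The following is a minimal base R sketch (it does not call lofi) that assumes each measurement is stored as `round(value * 10)`, matching `mult = 10`, and that a choice field stores a zero-based index into `options`. The helper `bits_needed()` is a hypothetical name introduced only for this illustration.

```r
# Base R only. Assumption: numeric columns are stored as round(value * 10).
numeric_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Every measurement has at most one decimal place, so scaling by 10 is lossless.
one_decimal <- sapply(iris[numeric_cols], function(x) {
  all(abs(x * 10 - round(x * 10)) < 1e-8)
})

# Largest scaled value in each column.
scaled_max <- sapply(iris[numeric_cols], function(x) max(round(x * 10)))

# Hypothetical helper: bits required to hold the integers 0..max_value.
bits_needed <- function(max_value) floor(log2(max_value)) + 1

# Assumption: Species is stored as a zero-based index, so the maximum is 2.
nbits <- c(bits_needed(scaled_max),
           Species = bits_needed(nlevels(iris$Species) - 1))

nbits       # 7 6 7 5 2
sum(nbits)  # 27
```

The result matches the `nbits` values in the spec, and the total of 27 bits fits in one 32-bit integer with 5 bits to spare.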

Pack/unpack a single row

Now take the first row of iris and pack() it:

lofi::pack(iris[1, ], pack_spec)
bits <- as.integer(lofi:::int32_to_bits(54052616L))

chars_df <- tibble(
  char = bits,
  x    = seq_along(char)
)

annotation_df <- tribble(
  ~start, ~end, ~text, ~segment, ~label            , ~segment_colour, ~segment_size, ~text_y,
       1,    5,  TRUE,     TRUE, "unused"      , 'blue'            ,             2,   -0.75,
       6,   12,  TRUE,     TRUE, "Sepal.Length", 'darkred'         ,             2,   -0.75,
      13,   18,  TRUE,     TRUE, "Sepal.Width ", 'blue'            ,             2,   -0.75,
      19,   25,  TRUE,     TRUE, "Petal.Length", 'darkred'         ,             2,   -0.75,
      26,   30,  TRUE,     TRUE, "Petal.Width ", 'blue'            ,             2,   -0.75,
      31,   32,  TRUE,     TRUE, "Species"     , 'darkred'         ,             2,   -0.75
)


png("../man/figures/iris-bits.png", width = 800, height = 100)
  plot_chars(chars_df, annotation_df, base_size = 5) + 
    ggplot2::ylim(-2, 1.5)
dev.off()

So the first row of iris has now been packed into the integer: 54052616.
If this integer is viewed as the 32 bits which make it up, the different lofi data representations can be identified:
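
The same fields can also be read off without a plot. The sketch below uses only base R (`intToBits()` and `strtoi()`) and assumes the layout given in the annotation table: 5 unused high bits, followed by the fields in spec order, most significant bit first. The variable names are illustrative only.

```r
# Base R only. intToBits() returns the least significant bit first,
# so reverse it to read the most significant bit first.
bits <- rev(as.integer(intToBits(54052616L)))

# Assumed layout: unused bits, then the fields in pack_spec order.
widths <- c(unused = 5, Sepal.Length = 7, Sepal.Width = 6,
            Petal.Length = 7, Petal.Width = 5, Species = 2)

# Label every bit with the field it belongs to, then join each field's bits.
field_id    <- factor(rep(names(widths), times = widths), levels = names(widths))
bit_strings <- tapply(bits, field_id, paste, collapse = "")

# Interpret each bit string as an unsigned base-2 integer.
decoded <- setNames(strtoi(as.vector(bit_strings), base = 2L), names(widths))
decoded
# unused = 0, Sepal.Length = 51, Sepal.Width = 35,
# Petal.Length = 14, Petal.Width = 2, Species = 0
```

Dividing the four measurements by 10 gives 5.1, 3.5, 1.4 and 0.2, and a Species value of 0 corresponds to the first option, 'setosa'.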

If the integer is now unpack()ed, we get back the original data.

lofi::unpack(54052616L, pack_spec)
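
The value 54052616 can be reproduced by hand, which helps when checking a spec. This is a base R sketch of the arithmetic, not lofi's implementation. It assumes fields are appended in spec order with the first field in the highest bits, and that the choice field stores a zero-based index into `options`.

```r
# Base R only: rebuild the packed integer for iris[1, ] with plain arithmetic.
spec_bits <- c(Sepal.Length = 7, Sepal.Width = 6, Petal.Length = 7,
               Petal.Width = 5, Species = 2)
species_options <- c("setosa", "versicolor", "virginica")

row <- iris[1, ]
field_values <- c(
  Sepal.Length = round(row$Sepal.Length * 10),  # 51
  Sepal.Width  = round(row$Sepal.Width  * 10),  # 35
  Petal.Length = round(row$Petal.Length * 10),  # 14
  Petal.Width  = round(row$Petal.Width  * 10),  # 2
  Species      = match(as.character(row$Species), species_options) - 1  # 0
)

# Shift the accumulated value left by the width of the next field
# (multiply by 2^nbits), then add that field's value.
packed <- 0
for (name in names(spec_bits)) {
  packed <- packed * 2^spec_bits[[name]] + field_values[[name]]
}
packed  # 54052616
```

Doubles are used for the accumulator here because they represent every integer in this range exactly.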

Pack/unpack every row

pack() and unpack() may be mapped over the rows of a data.frame, so that every row is encoded as, and decoded from, a single integer value.

In the following example, each row of the iris data is encoded as a single 32-bit integer value.

The packed lofi representation of iris is ~12x smaller than the original data.frame.
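
A rough back-of-the-envelope check of the saving, as a base R sketch. It assumes the usual R storage sizes (8 bytes per double, 4 bytes per integer or factor code) and uses `integer(nrow(iris))` as a stand-in for the packed vector, so it runs without lofi. The measured ratio can differ from the payload ratio because vector headers and attributes (column names, row names, factor levels) are counted too.

```r
# Base R only. Payload per row in the data.frame:
# four doubles (8 bytes each) plus one integer factor code (4 bytes).
bytes_per_row_df     <- 4 * 8 + 4
bytes_per_row_packed <- 4          # one 32-bit integer per row

payload_ratio <- bytes_per_row_df / bytes_per_row_packed
payload_ratio  # 9

# Measured sizes, including object overhead. The integer vector is only a
# stand-in with the same length and type as the packed result.
stand_in       <- integer(nrow(iris))
measured_ratio <- as.numeric(object.size(iris)) / as.numeric(object.size(stand_in))
measured_ratio
```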

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Pack the entire data.frame one row at a time using 'transpose' + 'map'
# `lofi` does not handle factors, so convert 'Species' explicitly to a character
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
iris_packed <- iris %>%
  mutate(Species = as.character(Species)) %>% 
  transpose() %>%
  map_int(pack, pack_spec)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# 'iris' is now encoded as a vector of ints
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
head(iris_packed, 21)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Size ratio: how many times smaller the packed representation is
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
as.numeric(lobstr::obj_size(iris) / lobstr::obj_size(iris_packed)) 

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# And can unpack the integers into the original data.frame representation
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
iris_packed %>%
  map(unpack, pack_spec) %>%
  bind_rows() %>%
  head()


coolbutuseless/lofi documentation built on Nov. 4, 2019, 9:13 a.m.