Long-form tables often contain repeated elements. Using some new approaches for non-standard evaluation and tidy data techniques makes is easy to "separate and normalize" these tables. Some examples.
The in-built quakes data set in R has a small number observations at the same location.
sum(duplicated(quakes[, c("long", "lat")])) library(dplyr) library(nf) nf(quakes, -long, -lat)
The function nf
allows the user to specify which attributes (columns) to treat as data
, normalizing that table to unique cases and recording an index from the original struture.
Why is this useful?
Say we have a series of animal tracks, with longitude, latitude, date-time, id of the animal and other observations of the animal (rather than its track).
tr <- data_frame(id = c(rep(1, 10), rep(2, 15)), long = 1:25, lat = rnorm(25) + id * 3, trip = c(rep(1, 10), rep(2, 6), rep(3, 9)), date = Sys.time() + seq(25) + sample(5, 25, replace = TRUE)) library(ggplot2) ggplot(tr) + aes(x = long, y = lat, group = trip, col = factor(id)) + geom_line() + aes(x = long, y = lat, label = format(date, "%M:%S")) + geom_text()
We know these data are grouped in a hierarchical way, not only is there a grouping based on id
there's also a trip for a given id, and we can have any number of vertex attributes - for spatial locations like longitude, latitude, time, height, depth, temperature.
The nf
function allows us to specify which attributes are part of the geometry, and which of the topology, although in this case there's still a one-to-one.
nf(tr, -long, -lat, -date) ggplot(tr) + aes(x = long, y = lat, group = trip, col = factor(trip)) + geom_line()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.