# library(rooc)
# source("../R/Doc.R")
# source("../R/Globals.R")
pulse <- read_pulse()
Change layout of values in a table
Source: data import cheat sheet : tidyr
Two main functions manipulate the layout of a table: pivot_longer transforms the table from wide to long format, and pivot_wider does the opposite, converting the table from long to wide format.
pivot_longer: wide to long format
This function allows collapsing 'similar' variables into one variable while guaranteeing the data set's consistency. For example, take the variables pulse1 and pulse2 in the following subset of the pulse data set:
pulses <- pulse %>%
  select(name, pulse1, pulse2) %>%
  head(3)
pulses
We can transform the table so that all pulse values sit under a single variable, say level, while a second variable, say pulse, records which measurement (pulse1 or pulse2) each value came from:
dfLong <- pulses %>%
  pivot_longer(c(pulse1, pulse2), names_to = "pulse", values_to = "level")
dfLong
This is called the long version of the original (wide) table and contains the same information.
Alternatively, you can achieve the same result using ! (the negation operator) to select every column except name:
dfLong <- pulses %>%
  pivot_longer(!name, names_to = "pulse", values_to = "level")
dfLong
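The two calls above can be tried without the pulse data set. The sketch below uses a small invented tibble called toy (the names and values are hypothetical, not taken from the pulse data) and compares the explicit column selection with the negated one.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in for the first rows of the pulse data set (values invented).
toy <- tibble(
  name   = c("Ann", "Bob", "Cas"),
  pulse1 = c(70, 82, 65),
  pulse2 = c(96, 110, 68)
)

# Explicit selection: column names go to `pulse`, cell values go to `level`.
toy_long <- pivot_longer(toy, c(pulse1, pulse2),
                         names_to = "pulse", values_to = "level")

# Negated selection: pivot every column except `name`.
toy_long2 <- pivot_longer(toy, !name,
                          names_to = "pulse", values_to = "level")

dim(toy)       # wide: one row per person
dim(toy_long)  # long: one row per (person, measurement) pair
all.equal(toy_long, toy_long2)
```

The long table has twice as many rows as the wide one because each person contributes one row per pivoted column.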
Can you find other variables in the pulse data set that can be transformed to long format?
msg <- "Yes, for example smokes and alcohol, both are categorical with values {yes,no}."
qa(msg)
Let's reshape the pulse dataset with variables drug = {smokes,alcohol} and use = {yes,no}.
pulse %>% pivot_longer(c(smokes,alcohol), names_to = "drug", values_to = "use")
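One reason to reshape like this is that a long table can be summarised per category in a single step. The following is a minimal sketch on an invented tibble (toy and its values are hypothetical), and the use of dplyr's count() is an illustration added here, not part of the original exercise.

```r
library(tidyr)
library(dplyr)

# Hypothetical mini version of the pulse data (values invented for illustration).
toy <- tibble(
  name    = c("Ann", "Bob", "Cas"),
  smokes  = c("no", "yes", "no"),
  alcohol = c("yes", "yes", "no")
)

# Same reshaping as above: column names go to `drug`, yes/no values go to `use`.
toy_drugs <- toy %>%
  pivot_longer(c(smokes, alcohol), names_to = "drug", values_to = "use")

# One row per (drug, use) combination, with the number of people in column `n`.
drug_counts <- count(toy_drugs, drug, use)
drug_counts
```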
pivot_wider: long to wide format
dfWide <- dfLong %>%
  pivot_wider(names_from = "pulse", values_from = "level")
dfWide
Below, the pulses tibble has been transformed into pulses2. What can you say about this transformation?
pulses2 <- pulses %>%
  pivot_longer(!name, names_to = "pulse", values_to = "level") %>%
  pivot_wider(names_from = "pulse", values_from = "level")
msg <- "pulses and pulses2 are identical"
qa(msg)
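The round trip can also be checked programmatically rather than by eye. This is a minimal sketch on an invented tibble (toy and its values are hypothetical); all.equal() is used here as one reasonable way to compare the two tables.

```r
library(tidyr)
library(dplyr)

# Hypothetical stand-in for the pulses tibble (values invented).
toy <- tibble(
  name   = c("Ann", "Bob", "Cas"),
  pulse1 = c(70, 82, 65),
  pulse2 = c(96, 110, 68)
)

# Wide -> long -> wide: the second step undoes the first.
toy2 <- toy %>%
  pivot_longer(!name, names_to = "pulse", values_to = "level") %>%
  pivot_wider(names_from = "pulse", values_from = "level")

# Compare column names, types and values of the original and the round-tripped table.
all.equal(toy, toy2)
```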
Often you will need to reshape your data for ggplot visualisations. For example, we would like to compare the pulse1 and pulse2 variables with a boxplot. You may plot pulse1 and pulse2 in separate plots, one next to the other:
# Make sure gridExtra is installed with install.packages("gridExtra").
require(gridExtra)

bp1 <- ggplot(pulse %>% drop_na()) + # remove missing
  aes(y = pulse1) +
  geom_boxplot()

bp2 <- ggplot(pulse %>% drop_na()) + # remove missing
  aes(y = pulse2) +
  geom_boxplot()

grid.arrange(bp1, bp2, ncol = 2)
As you can see, the y-scale of each plot is set independently, which can make the comparison misleading. To resolve this, we need a single plot whose aesthetic mapping has x as the categorical variable pulse and y as the pulse values (level), using the long format shown above.
ggplot(
  pulse %>%
    drop_na() %>% # remove missing
    pivot_longer(c(pulse1, pulse2), names_to = "pulse", values_to = "level")
) +
  aes(x = pulse, y = level) +
  geom_boxplot()