library(tidyverse)
library(stringr)
library(knitr)
library(tmasc)
opts_chunk$set(warning = FALSE)
theme_set(theme_bw() + theme(panel.grid = element_blank()))
erowid is a data set of written reports of psychedelic experiences from Erowid.org's Experience Vaults.
"The Erowid Experience Vaults are an attempt to catalog the wide variety of experiences people have with psychoactive plants and chemicals as well as experiences with endogenous (non-drug) mystical experiences, drug testing, police interactions, deep experiences of connection to music, etc." (Erowid Experience Vault)
The erowid data set in tmasc contains written reports of some 24 thousand of these experiences in a tidy tibble, ready for text mining, visualization, and statistical analysis in R (or other software).
library(tmasc)
data(erowid)
head(erowid)
One of the columns in the data frame is a list-column of data frames containing more detailed information about the substance(s) in each report, and can be used for more complex analyses. For the example here, we will drop this column for clarity. We'll also drop all the rows (70) where no report text exists.
erowid <- select(erowid, -dosechart) %>% filter(!is.na(text))
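A list-column of data frames, like the one dropped here, can be expanded with tidyr::unnest() when the per-substance details are needed. The following is a minimal sketch on a toy tibble, not on the real data: the inner column names (amount, method) and all values are hypothetical stand-ins, since the actual layout of dosechart should be checked with str() on the tmasc data.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy stand-in for the report table: one row per report, with a
# list-column holding a small data frame of per-substance details.
# The inner column names and values are hypothetical.
toy <- tibble(
  id = 1:2,
  substance = c("A", "A & B"),
  dosechart = list(
    tibble(amount = "1 tab", method = "oral"),
    tibble(amount = c("2 g", "1 hit"), method = c("oral", "smoked"))
  )
)

# unnest() expands the list-column so that each inner row becomes
# its own row, repeating the report-level columns as needed.
toy_long <- unnest(toy, cols = dosechart)
toy_long
```

Keeping the list-column until it is needed avoids duplicating the report text once per substance.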
Many, but not all, of the reports contain a timestamp of the year of the experience.
erowid %>%
  ggplot(aes(x = year)) +
  geom_histogram(binwidth = 1, col = "white") +
  scale_x_continuous(breaks = seq(1960, 2010, 10)) +
  scale_y_continuous(expand = c(0, 0))
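Because some reports have no year, it can help to quantify how many do before reading the histogram. Here is a small sketch using a toy year column with made-up values; with the real data, the same summarise() call would be applied to erowid directly.

```r
library(dplyr)
library(tibble)

# Toy stand-in: NA marks a report without a recorded year (values are made up)
toy <- tibble(year = c(2001, NA, 2005, 2005, NA))

# Count the reports and the share that carry a year
year_summary <- toy %>%
  summarise(
    n_reports = n(),
    n_with_year = sum(!is.na(year)),
    share_with_year = mean(!is.na(year))
  )
year_summary
```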
erowid %>%
  group_by(gender, age) %>%
  ggplot(aes(x = age)) +
  geom_histogram(binwidth = 1, col = "white", fill = "black") +
  scale_y_continuous(expand = c(0, 0)) +
  facet_wrap("gender")
The number of distinct substances (or combinations) in the data is given by length(unique(erowid$substance)).
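Single substances and combinations can also be counted separately, using the "&" separator that marks a combination. A minimal sketch on toy labels follows; the values below are illustrative only and are not taken from the data set.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy substance labels; "&" marks a combination (values are illustrative)
toy <- tibble(
  substance = c("LSD", "LSD", "Cannabis", "LSD & Cannabis", "MDMA & LSD")
)

# Flag combinations, then count reports and distinct labels per group
toy_counts <- toy %>%
  mutate(combination = str_detect(substance, "&")) %>%
  group_by(combination) %>%
  summarise(
    n_reports = n(),
    n_labels = n_distinct(substance)
  )
toy_counts
```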
erowid %>%
  filter(!str_detect(substance, "&")) %>%
  group_by(substance) %>%
  count() %>%
  filter(n > 100) %>%
  ggplot(aes(x = reorder(substance, n), y = n)) +
  geom_segment(aes(yend = 0, xend = reorder(substance, n))) +
  geom_point() +
  coord_flip() +
  scale_y_continuous(limits = c(0, 800), expand = c(0, 0)) +
  labs(title = "Counts of substances", x = "", y = "", subtitle = "Substances with fewer than 100 reports excluded")
Let's also display a sample of the combinations:
erowid %>%
  filter(str_detect(substance, "&")) %>%
  group_by(substance) %>%
  count() %>%
  filter(n > 15) %>%
  ggplot(aes(x = reorder(substance, n), y = n)) +
  geom_segment(aes(yend = 0, xend = reorder(substance, n))) +
  geom_point() +
  coord_flip() +
  scale_y_continuous(limits = c(0, 250), expand = c(0, 0)) +
  labs(title = "Counts of substance combinations", x = "", y = "", subtitle = "Combinations with fewer than 15 reports excluded")
The text column holds the reports themselves. For example, let's show the shortest report whose rating is "Highly Recommended":
erowid %>%
  filter(rating == "Highly Recommended") %>%
  mutate(n = nchar(text)) %>%
  arrange(n) %>%
  .[1, ] %>%
  select(text) %>%
  as.character() %>%
  cat()
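The same lookup can be written more compactly with dplyr::slice_min(), which is available in dplyr 1.0.0 and later. This is only an alternative sketch, shown on a toy tibble with made-up texts; the rating value "Recommended" is a hypothetical second level used for contrast.

```r
library(dplyr)
library(tibble)

# Toy stand-in with the two columns the lookup needs (texts are made up)
toy <- tibble(
  rating = c("Highly Recommended", "Recommended", "Highly Recommended"),
  text = c("A long and detailed report.", "Short.", "Brief note.")
)

# Keep the highly recommended reports, take the one with the fewest
# characters, and extract its text as a plain character vector.
shortest <- toy %>%
  filter(rating == "Highly Recommended") %>%
  slice_min(nchar(text), n = 1, with_ties = FALSE) %>%
  pull(text)

cat(shortest)
```

Setting with_ties = FALSE guarantees a single row even if several reports share the minimum length.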
http://chemicalyouth.org/visualising-erowid/