knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4, fig.align = "center" )
library(knitr) library(stringr) library(rGenius) library(lubridate) library(tidyverse)
Genius is a website that allows users to provide annotations and interpretation of song lyrics, news stories, sources, poetry, and other documents.
This R package, rGenius
, wraps the Genius API, facilitating the extraction of some rather interesting data. This vignette is intended to give a short, but nevertheless thorough, overview of how to use this software.
Before we proceed any further, it should be noted that while the Genius API is free to use, it does require that you first obtain an API Key. To do so (as noted in this projects README.md), you can go to this link: genius.com/api-clients/new, and fill out the form. You only need to provide the name of your app and an app website (the app website can even be https://example.com). After saving, you can get the access token by clicking on "Generate Access Token".
If you'd like follow along, this tutorial assumes that you have saved your access token (a character string)
to an objected entitled access_token
. That is,
access_token <- "YOUR ACCESS TOKEN HERE"
Alternatively, you can create a simple access_token.txt
file that contains your access token
and place it somewhere safe. Below you'll find an example:
access_token <- read_lines("../access_token.txt")[1]
With those technicalities out of the way, let's jump in.
To begin, we can examine the most elemental, but perhaps most important, function exposed by rGenius
: get_song()
. This function allows us to harvest information from the Genius API about a song
by it's ID number.
get_song()
Parameters:
song_id
: ID of the song from Genius API
access_token
: access_token from the Genius API
Return:
data.frame
complete with the following columns:Below we'll explore how we can learn song IDs for music we're interested in. For now, we can simply pick a number we like -- how about 12?
song_12_data <- get_song(song_id=12, access_token=access_token)
Let's explore what we've got here.
(class(song_12_data)) (dim(song_12_data))
As promised, we've obtained a DataFrame, with one row (Song ID #12). Let's print what it contains:
song_12_data %>% ## Make the `'featured_artists'` column print-friendly. mutate(featured_artists=paste(featured_artists, collapse = "")) %>% ## Transpose the DataFrame gather("Column Name", "Value") %>% kable()
The table above shows the transpose of the song_12_data
.
It has be transposed for one, and only one, reason: so that
it fits on the page! Leaving that inconvenience aside, there
is something that should be explained here.
Namely, you'll notice that 'featured_artists'
is actually a list()
.
This is for programming convenience, allowing this element in the dataframe
to contain many artists. The alternative here would have been to collapse
this information into a single character string which, while easier on the
eyes requires some extra work to manipulate programmatically.
get_songs()
If you have a lot of song IDs you'd like to harvest,
doing this one-by-one with get_song()
can be tedious.
Thus, we built get_songs()
, which can accept multiple song IDs
in the form of a vect
Parameters:
song_ids
: song IDs from the Genius API
access_token
: access_token
from the Genius API
verbose
: show progress bar if TRUE
songs_12_to_14 <- get_songs(c(12, 13, 14), access_token=access_token, verbose=TRUE)
(dim(songs_12_to_14))
Here we can see that the number of columns we've obtained is the same as above,
but we now have three rows: one for each song ID padded to get_songs()
. Apart from this,
this data frame is essentially identical to what we saw above.
search_song()
So a major point of frustration above is that we had to know the song IDs.
For this reason, we build search_song()
, which allows us to search for song
names in a human-friendly way.
Parameters:
song_title
: Name of the song
access_token
: access_token from the Genius API
n_per_page
: Number of returned result page
verbose
: show progress bar if TRUE
Return: This returns all of the same columns as get_song()
and get_songs()
.
Let's suppose we want to search for five song songs with "Dance" in their title. We can run this search as follows:
query_df <- search_song( song_title="Dance", ## query access_token=access_token, n_per_page=5, ## ask for five search results for our query. verbose=TRUE )
In the interest of formatting (i.e., page width), we can select a few salient columns from this result and print them:
query_df %>% ## Subset of key columns. select(id, title, artist, date, views) %>% kable()
get_artist()
rGenius
also exposes functions to collect data about artists directly, such
as aliases they use (if any), their ID in the Genius database as well as their user
names on social media platforms.
Parameters:
artist_id
: ID from API Genius of the artist.
access_token
: access_token from the Genius API.
Return:
'id'
: the artist's ID on the Genius API'alternate_name'
: the artist's name'facebook_name'
: the artist's name on facebook, if provided on the API'instagram_name'
: the artist's name on instagram, if provided on the API'twitter_name'
: the artist's name on twitter, if provided on the APIget_artist(1, access_token=access_token) %>% select(id:twitter_name) %>% kable()
get_song_from_artists()
Finally, you may wish to search for song from a given artist.
rGenius
makes this straightforward to do with get_song_from_artists()
.
Parameters:
artist_name
: Artist name
access_token
: access_token from the Genius API
n_per_page
: Number of returned result page
verbose
: show progress bar if TRUE
Return: the same columns as get_song()
and get_songs()
.
get_song_from_artists( artist_name="drake", access_token=access_token, n_per_page=5, ## ask for five songs by Drake. verbose=TRUE) %>% ## Again, take a subset of key columns. select(id, title, artist, date, views) %>% ## Pretty Print the Result knitr::kable()
Above we've explored the basics of how to use rGenius
.
Here we'll take a bigger step and use this package to build
a visualization of Katy Perry album popularity over time.
To begin, we can run a search for some songs by Katy Perry.
katy <- get_song_from_artists( artist_name="katy%20perry", access_token=access_token, n_per_page=50, verbose=TRUE) %>% ## `get_song_from_artists()` faithfully returns us the results ## from the Genius API. We'll have to filter The data to ensure ## we only get songs where the *main* artist was Katy Pery. filter(artist=="Katy Perry")
Next, we can clean up the data types to meet our plotting needs and shorten some one of the album names so they fit neatly on one our graph.
katy <- katy %>% mutate( date = as.Date(date), album = str_trim(as.character(album)) )
In our last step before plotting, let's say that we're only interested in two of her albums (our collective favorites): Prism and Witness.
katy_p_favs <- c("Prism", "Witness") katy <- katy %>% filter(album %in% katy_p_favs) %>% mutate(album = as.factor(album))
Now, at last, let's build our plot and take a look:
katy %>% mutate(views=views/1e6) %>% ggplot(aes(x=date, y=views, color=album)) + geom_line() + facet_grid(~album, scales="free_x") + scale_y_continuous(labels = scales::comma) + ## Standardize Date Format. ## See: https://stackoverflow.com/a/48510303 scale_x_date(date_labels = "%b-%Y") + labs(x="\nDate", y="Views (Millions)") + ggtitle("Our Favorite Katy Perry Albums", subtitle="Rise and Fall") + theme_minimal() + theme(legend.position="none")
In this vignette we've explored the rGenius
package for R, which wraps the Genius API. Specifically, we have walked through most of the major
use cases and have even used this package to generate a visualization for a contemporary pop artist.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.