The goal of musicMetadata
is to ease the processing of metadata frequently encountered in "music" datasets such as those obtained through the Spotify Web API, Chartmetric, Musicbrainz, or Discogs.
Classify clear-text label names as frequently encountered in digital music data sets (e.g., "Interscope"), into their parent labels (at this stage, the three major music labels "Universal", "Warner", and "Sony"). All remaining (unclassified) ones are likely independent music labels.
The classification algorithm relies on a list of regular expressions that identifies each of the three major music labels.
The algorithm has been developed on the basis of Aguiar, Waldfogel, and Waldfogel (2021). The expressions have further been updated enriched using an external data set with label names and their parent-label classifications, supplied by Chartmetric.
You can install the released version of musiclabels from GitHub with:
install.packages("devtools")
devtools::install_github("hannesdatta/musicMetadata")
This is a basic example which shows you how to solve a common problem:
library(musicMetadata)
# Classify single labels
classify_label('Interscope')
# Classify vector of labels
labels <- c('300 Entertainment/Atlantic', 'Bad Boy Records', 'Virgin Records Ltd')
data.frame(label=labels, parent_label = classify_labels(labels, concatenate = T))
Thanks to Robbert Oudelaar for debugging and code improvements, and to Chartmetric for providing validation data and feedback.
Aguiar, Luis, Joel Waldfogel, and Sarah Waldfogel (2021), "Playlisting favorites: Measuring platform bias in the music industry," International Journal of Industrial Organization (78): 102765. https://doi.org/10.1016/j.ijindorg.2021.102765
For a general guide on how to contribute to this repository/package, see https://tilburgsciencehub.com/learn/git-collaborate/.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.