source('vignette_header.R')
Welcome to "Shaping humdrum data"!
This article explains various steps you can, and often need, to take to prepare humdrum datasets for analysis, using r hm
.
One of the great strengths of the humdrum syntax is its flexibility---there are lots of ways you can structure your data to conveniently represent musical information. However, when it comes time analyze data, we typically want to "reshape" our data into a particular format that is ideal for analysis. The key idea is this: given our data and our particular research question, we must determine/decide what constitutes a single data observation. We want each data observation, or data point, to correspond to one row in our humdrum table. Depending on how your humdrum data files are organized, this might not be the case when you first load your data.
In order to shape our data, there are three steps we might need to do to our data:
Consider the following small humdrum file, which is bundled with r hm
.
example <- readHumdrum(humdrumRroot, 'examples/Reshaping_example.hum') example
This file contains (at least) seven different pieces of information!
There are two voices, alto and soprano; Each voice has its rhythm and pitch information encoded in its **kern
spine, plus lyrical information in the next **silbe
spine.
In addition, the **harm
spine indicates the harmony accompanying both the vocal parts.
Now, depending on what sorts of analyses we want to perform, we need to decide what combinations of these seven information streams consitute a "data observation."
The first step is often to simply remove data we don't need.
In this article, we'll show you the most common, basic, ways you might filter your data.
For more details about other r hm
filtering functionality, check out
the data filtering article.
For example, if we are studying tonality, we might simply want to ignore the lyric data. These easiest way to do this is to index out spines we don't want, either using numeric indices
example[[ , c(1,3,5)]]
or (probably better) by exclusive interpretation:
example[[ , c('**kern', '**harm')]]
In this file, the **kern
spines in our example file include rhythmic data (**recip
) and pitch data.
If we are "just" studying tonality, we could extract the pitch information from the Token
field, and save it to a new field.
For example, we can use the kern()
function to extract the pitch information from Token
, and put it in a new field, which we'll call Pitch
.
example |> mutate(Pitch = kern(Token)) -> example example
Having now isolated the pitch information, we can analyze it.
(Of course, the rhythm information is still present, in the original Token
field.)
Of course, there are many other filtering options you might want to do, depending on your research goals.
Perhaps you only want to study pieces/passages in a flat keys, or only works from a particular time period, etc.
A common example would be limiting our rhythmic analyses to music in 4/4 time, and perhaps ignoring pickup notes.
The TimeSignature
field indicates the time signature, and the Bar
field can be used as an indicator of pickups at the beginning of pieces, as they are marked Bar == 0
.
So for example (using the Bach chorale data):
chorales <- readHumdrum(humdrumRroot, 'HumdrumData/BachChorales/.*krn') chorales |> filter(Bar > 0 & TimeSignature == 'M4/4') -> chorales chorales
Humdrum data often packs multiple pieces of information into compact, concise, readable tokens.
The classic example, or course, is **kern
itself which often includes rhythm, pitch, phrasing, beaming, and pitch ornamentation information!
These tokens are great for reading/writing, but not for analyzing, so we typically want to separate the information we do want.
As we've seen, the **kern
spines in our example file include rhythmic data (**recip
) and pitch data.
In some cases, we might want to access both pieces of information, but separately.
We can separate them by applying different functions to the Token
field, and saving the output to new fields.
For example,
example <- readHumdrum(humdrumRroot, 'examples/Reshaping_example.hum') example |> mutate(Rhythm = recip(Token), Pitch = kern(Token)) -> example example
Hey, the printout kind of looks the same...but if you look at the bottom you'll see that there are now separte Pitch
and Rhythm
fields.
Since both fields are automatically selected by mutate()
, we are seeing them both.
But we could, for example, select one or the other field,
example |> select(Pitch) example |> select(Rhythm)
or run commands using them. For example, we could tabulate pitches wherever the rhythmic duration is a quarter note:
example |> filter(Rhythm == '4') |> count(Pitch)
This is only possible because we separated the information which was originally "pasted" together in the **kern
tokens.
What if actually want to move the rhythm and pitch information to separate spines---i.e., into their rows in the humdrum table?
Use rend()
:
example |> rend(Rhythm, Pitch)
The next step might be to align/combine information that is currently separated.
In many r hm
datasets, we have multiple pieces of information spread across multiple spines, or in some cases, across spine paths or stops.
If, given our research question, we need to think of multiple pieces of information as describing a single data point, we'll need to reshape the data.
For example, in our example
file the **silbe
(lyric) spines associate each syllable with exactly one note in the adjacent **kern
spines.
Currently, by default, each **kern
token and each **silbe
token are in their own, separate row of the humdrum table.
We want them in the same row.
In the humdrum syntax view, this means moving the **silbe
data into the same location as the **kern
tokens.
To take separate spines---like **kern
and **silbe
---and paste them together, we use cleave()
; as in "to cleave together."
Simply run cleave, and tell it which spines to cleave together.
In our example
file, the 1st and 3rd spines are **kern
while the 2nd and 4th spines are **silbe
, so we can tell cleave
to cleave together 1:2
and 3:4
:
In our example, we want to align the notes in the **kern
spines with the syllables in the **silbe
spine.
We can do this directly using cleave()
: use the fold
argument to indicate which spine to fold, and the onto
argument to indicate which spine to move it onto.
example <- readHumdrum(humdrumRroot, 'examples/Reshaping_example.hum') example |> cleave(1:2, 3:4)
It worked! The 1st and 3rd spines have dissappeared, with their content now in the 1st and 3rd spines.
But notice that the cleave put some data into a new field, which it has called Spine2|4
.
We can improve this name using the newFields
argument:
example |> cleave(1:2, 3:4, newFields = 'Silbe') -> example example
But, wait, where did the **kern
and **sible
data go exactly?
Well, since the **kern
spines were the first spines we indicated to cleave()
, that data stays in the original Token
field.
The other spines---which in this case are the **sible
data---are put into the new field(s).
See:
example |> select(Token) example |> select(Silbe)
Notice that the fifth spine, which wasn't part of the cleave, is also left untouched in the Token
field.
In some datasets, there might be different numbers of **kern
/**silbe
spines in different files within the dataset.
In these cases, it will be much smarter to pass cleave()
the names of the exclusive intepretations we want to cleave.
We should get the same result we got manually before:
example <- readHumdrum(humdrumRroot, 'examples/Reshaping_example.hum') example |> cleave(c('kern', 'silbe'))
What if we want to study the note combinations that are formed between the two **kern
spines?
As they are, this would be hard because the tokens from each spine are all in their own rows.
However, we could cleave()
together the two kern spines!
(We'll also isolate just the **kern
spines for this.)
example <- readHumdrum(humdrumRroot, 'examples/Reshaping_example.hum') example |> index2( , '**kern') |> kern(simple = TRUE) |> cleave(c(1, 2)) |> count()
What about that **harm
spine in our data.
Unlike the **silbe
spines, the **harm
spine is not "paired up" with the **kern
spines.
Rather, the **harm
actually describes what is happening in the entire record of data.
The chords indicated in this spine are associated with the pitches in both, or either, of the **kern
spines.
Luckily, cleave()
will handle this:
example <- readHumdrum(humdrumRroot, 'examples/Reshaping_example.hum') example |> cleave(c('kern', 'harm'))
Data from the **harm
spine is copied twice into a new field "on top of" both **kern
spines!
Though spines are the most common structure in humdrum data that you might need to "cleave" together, we can also cleave other structures. Of course, it depends on what questions you are trying to ask of your data!
Consider this example, with multi-stop chords in a **kern
spine:
example_stops <- readHumdrum(humdrumRroot, 'examples/Reshaping_example2_stops.hum') example_stops
By default, r hm
treats each token (note) in each stop as a separate data observation, with its own row in the humdrum table.
If we are studying harmony, we might want to align those stops "on top" of each other, in different fields.
But, we can cleave the stops together, putting them each into their own field!
example_stops |> cleave(Stop = 1:3)
The first stop (Stop == 1
) is left in the Token
field, but we get new Stop2
and Stop3
fields holding the other stops!
When working with spine paths, it is often less obvious how we should interpret different paths in terms of data observations, but if you want to cleave them, you can.
Don't forget that Path
is numbered starting from 0
.
example_paths <- readHumdrum(humdrumRroot, 'examples/Reshaping_example3_paths.hum') example_paths |> cleave(Path = 0:1)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.