knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

I started to write this vignette a while ago, before I knew object-oriented programming (OOP) in R. So this might be interesting for you if you don't know OOP but want to learn more about all the internals of pyramidi. If you just want to see some use cases or if you know well R6, the other vignettes might be a better place to start.

Load libraries

First load some libraries:

pyramidi::install_miditapyr(envname = "r-reticulate")
library(pyramidi)
library(dplyr)
library(tidyr)
library(purrr)
library(ggplot2)
library(zeallot)

Extract midi into dataframe

We'll extract the information of a midi file into dataframe. We'll use the package internal midi file:

midi_file_str <- system.file("extdata", "test_midi_file.mid", package = "pyramidi")

midifile <- mido$MidiFile(midi_file_str)
ticks_per_beat <- midifile$ticks_per_beat

Now we can load the information of the midifile into a dataframe:

dfc = miditapyr$frame_midi(midifile)
head(dfc, 20)

This dataframe contains the columns of the track index i_track, meta (whether the midi event is a note event), and msg containing named lists of further midi event information.

The MidiFile() function of mido also yields the ticks_per_beat of the file:

ticks_per_beat

The miditapyr$unnest_midi() function transforms the msg column of the dataframe to a wide format, where every new column name corresponds to the names in the lists in msg (like tidyr::unnest_wider()):

df <- miditapyr$unnest_midi(dfc) %>% as_tibble()
head(df, 20)

Except the name column this seems to be the same as

dfc %>% unnest_wider(msg)

Translate midi time information

In the midi format, time is treated as relative increments between events (measured in ticks). In order to derive the total time passed, you can use the function tab_measures():

dfm <- tab_measures(df, ticks_per_beat, c("m", "b")) %>%
  # create a variable `track` with the track name (in order to have it in the plot below)
  mutate(track = ifelse(purrr::map_chr(name, typeof) != "character", 
                        list(NA_character_), 
                        name)) %>%
  unnest(cols = track) %>% 
  fill(track)

dfm

This function adds further columns:

Further processing of the midi events

You can split the dataframe in two by whether the events are meta or not:

dfm %>% 
    miditapyr$split_df() %->% c(df_meta, df_notes)
df_meta %>% as_tibble()
df_notes %>% as_tibble()

Pivot note dataframe to wide

Each note in the midi file is characterized by a note_on and a note_off event. In order to generate a piano roll plot with ggplot2, we need to tidyr::pivot_wider() those events. This can be done with the function pivot_wide_notes():

df_not_notes <- 
  df_notes %>% 
  dplyr::filter(!stringr::str_detect(type, "^note_o[nf]f?$")) 

df_notes_wide <-
  df_notes %>% 
  dplyr::filter(stringr::str_detect(type, "^note_o[nf]f?$")) %>%
  # tab_measures(df_meta, df_notes, ticks_per_beat) %>%
  pivot_wide_notes() %>%
  left_join(pyramidi::midi_defs)
df_notes_wide

In the new format, the data has half the number of rows. The columns m, b, t, ticks, time and velocity are each replaced by two columns with the suffix _note_on and _note_off.

Plot midi information in piano roll plot

Now we have the midi data in the right format for the piano roll plot:

df_notes_wide %>%
  ggplot() +
  geom_segment(
    aes(
      x = m_note_on,
      y = note_name,
      xend = m_note_off,
      yend = note_name,
      color = velocity_note_on
    )
  ) +
  # each midi track is printed into its own facet:
  facet_wrap( ~ track,
              ncol = 1,
              scales = "free_y") + 
  guides(color=guide_colorbar(title="Note velocity")) +
  labs(
    title = "Piano roll of the note events in the midi file",
    subtitle = "Only notes played are shown."
  ) +
  xlab("Measures") +
  scale_x_continuous(breaks = seq(0, 16, 4),
                     minor_breaks = 0:16) +
  scale_colour_gradient() +
  theme_minimal()

Manipulation of the midi data

The new format also allows to easily manipulate the midi data. For instance, let's put the volume (called velocity in midi) of the first beat in every bar to the maximum (127), and to half of its original value otherwise:

df_notes_wide_mod <- df_notes_wide %>% 
  mutate(
    velocity_note_on = ifelse(
      # As it's a 4/4 beat, the first beat of each bar is a multiple of 4:
      b_note_on %% 4 == 0, 
      127, 
      velocity_note_on / 2
    )
)

Let's compare the modified value to the original one:

df_notes_wide %>% 
  select(b_note_on, velocity_note_on) %>% 
  bind_cols(
    new = df_notes_wide_mod$velocity_note_on
  )

With an ifelse() statement, we modified the volume of the midi notes, depending on if they're the first beat in the measure or not.

Other possible manipulations could be for instance:

Pivot note data frame back to long format

We can transform the wide midi data back to the long format:

df_notes_long <- pivot_long_notes(df_notes_wide)

Join non note events

We can now add the non note events:

df_midi_out <- merge_midi_frames(df_meta, df_notes_long, df_not_notes)

df_midi_out

The time value in midi format is given by the number of ticks passed between events.

Write midi dataframe back to a midi file

Now we can transform the data back to a dataframe of the same format as the one we got with miditapyr$frame_midi():

dfc2 <-
  df_midi_out %>%
  # When reticulate converts R dataframes to pandas, there are complications
  # with character columns containing missing values.
  # repair_reticulate_conversion = TRUE, repairs that in the miditapyr python
  # code:
  miditapyr$nest_midi(repair_reticulate_conversion = TRUE)
as_tibble(dfc2)

And we can save it back to a midi file:

miditapyr$write_midi(dfc2, ticks_per_beat, "test.mid")


urswilke/pyramidi documentation built on March 7, 2024, 3:48 p.m.