withinHumdrum
These functions are the primary means of working with humdrumR data. They allow us to perform arbitrary (free-form) manipulation of the data fields held within a humdrumR data object, with convenient functionality for ignoring null data, lagging data, grouping data, windowing, and more. The with() and within() functions, which come from base R, are the core functions. However, the dplyr "verbs" mutate(), summarize(), and reframe() can be used as well; they are equivalent to using with()/within() with particular arguments.
Usage:

## S3 method for class 'humdrumR'
with(
data,
...,
dataTypes = "D",
recycle = "no",
alignLeft = TRUE,
expandPaths = FALSE,
drop = TRUE,
.by = NULL,
variables = list()
)
## S3 method for class 'humdrumR'
within(
data,
...,
dataTypes = "D",
alignLeft = TRUE,
expandPaths = FALSE,
recycle = "pad",
.by = NULL,
variables = list()
)
## S3 method for class 'humdrumR'
mutate(
.data,
...,
dataTypes = "D",
recycle = "ifscalar",
alignLeft = TRUE,
expandPaths = FALSE,
.by = NULL
)
## S3 method for class 'humdrumR'
summarise(
.data,
...,
dataTypes = "D",
expandPaths = FALSE,
drop = FALSE,
.by = NULL
)
## S3 method for class 'humdrumR'
reframe(
.data,
...,
dataTypes = "D",
alignLeft = TRUE,
expandPaths = FALSE,
recycle = "pad",
.by = NULL
)
## S3 method for class 'humdrumR'
ggplot(data = NULL, mapping = aes(), ..., dataTypes = "D")
Arguments:

data, .data: A humdrumR data object.

...: Any number of expressions to evaluate. These expressions can reference the data's fields() by name. If the expressions are named, the names are used to name the new fields (or the column names of the data.table returned by with(..., drop = FALSE) or summarize(..., drop = FALSE)).

dataTypes: Which types of humdrum records to include. Defaults to "D". Must be a single character string containing any combination of the characters "G", "L", "I", "M", "D", or "d" (see below).

recycle: How should results be "recycled" (or padded) relative to the input length? Defaults to "no" for with(), "pad" for within() and reframe(), and "ifscalar" for mutate(). Must be a single character string: one of "no", "yes", "pad", "ifscalar", "ifeven", "never", or "summarize" (see below).

alignLeft: Should output that is shorter than the input be aligned to the left? Defaults to TRUE. Must be a singleton logical value.

expandPaths: Should spine paths be expanded before evaluating the expressions? Defaults to FALSE. Must be a singleton logical value.

drop: Whether to return a simplified data structure. Defaults to TRUE for with() and FALSE for summarize(). Must be a singleton logical value. This argument is conceptually similar to the drop argument of base R matrix indexing.

.by: Optional grouping fields; an alternative to using group_by(). Defaults to NULL. If not NULL, the data is grouped by these fields before the expressions are evaluated.

variables: A named list of values to interpolate into the expressions (see "Explicit variable interpolation" below). Defaults to an empty list(). Must be a named list.
Details:

These functions are the primary means of working with humdrumR data. They all allow you to write code that accesses and manipulates the raw fields() in your data. The main differences between them are what they do with the results of your code: with() and summarize() return results in normal, "raw" R formats, removed from the humdrumR data; in contrast, within(), mutate(), and reframe() always insert the results of your code into new fields() within your humdrum data. The other distinctions between these functions have to do with how they recycle/pad results (see below).
The with(), within(), mutate(), summarize(), and reframe() methods for humdrumR data all perform "non-standard evaluation" of any expressions you provide them as arguments. Basically, when you use a function like with(...) or mutate(...), the expressions you write inside the function call aren't evaluated right then and there; instead, R takes those expressions into the "environment" of your humdrum table, where all your fields are "visible" to the expressions. This means you can write code (expressions) that refers to your fields(), like Token or Spine. For example:

with(humData, ifelse(Spine > 2, kern(Token), recip(Token)))

Since all the fields in a humdrum table are the same length, the expressions you write can be, and generally should be, vectorized.
By default, with(), within(), etc. don't use the whole humdrum table, but instead only evaluate their expressions using rows containing non-null data tokens (Type == "D"). This means that interpretations, comments, barlines, and null data tokens are automatically ignored for you! This behavior is controlled by the dataTypes argument: you can choose to work with the other token types by providing a character string containing any combination of the characters G (global comments), L (local comments), I (interpretations), M (barlines), D (non-null data), or d (null data). For example, dataTypes = 'MDd' will evaluate your expressions on barline tokens (=), non-null data, and null data. See the ditto() manual for an example application of using dataTypes = 'Dd'. Keep in mind that humdrumR dynamically updates which tokens are considered "null" ("d") based on which fields are selected.
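For instance, here is a minimal sketch (assuming a humdrumR dataset like the humData object read in the Examples below) that tabulates all record types by including every token type:

humData |> with(table(Type), dataTypes = 'GLIMDd')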
If multiple expression arguments are provided, each expression is evaluated in order, from left to right. Each expression can refer to variables assigned in previous expressions (examples below).

Note: within any of these expressions, the humdrumR namespace takes priority. This means that, for example, if you use lag() within an expression, the humdrumR version of lag() will be used, even if you have loaded other packages which have their own lag() function. To use another package's function, you'll have to specify package::function(), for example, dplyr::lag(). This is only an issue when a function has the exact same name as a humdrumR function.
These functions all do some pre-processing of their expression arguments before evaluating them. This pre-processing provides some convenient "syntactic sugar" for working with humdrum data. There are currently five pre-processing steps:

- Explicit variable interpolation.
- The . placeholder for selected fields.
- Automatic argument insertion.
- "Lagged"-vector shorthand.
- "Splatted" arguments.

Each of these is explained below.
The variables argument can be provided as an (optional) list of named values. If any of the names in the variables list appear as symbols (variable names) in any expression argument, their value is interpolated in place of that symbol. For example, in

within(humData, kern(Token, simple = x), variables = list(x = TRUE))

the variable x will be changed to TRUE, resulting in:

within(humData, kern(Token, simple = TRUE))

This feature is most useful for programmatic purposes, like if you'd like to run the same expression many times but with slightly different parameters.
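As a hedged sketch of such programmatic use (again assuming a humData object, as in the Examples below), the same expression can be re-evaluated with different values interpolated for x; the loop variable useSimple here is purely illustrative:

for (useSimple in c(TRUE, FALSE)) {
  humData |>
    with(count(kern(Token, simple = x)), variables = list(x = useSimple)) |>
    print()
}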
The . variable can be used as a special placeholder representing the data's first selected field. For example,

humData |> select(Token) |> with(count(.))

will run count() on the Token field. Because new fields created by within()/mutate()/reframe() become the selected fields (details below), the . placeholder makes it easy to refer to the last new field in pipes. For example, in

humData |> mutate(kern(Token, simple = TRUE)) |> with(count(.))

the count() function is run on the output of the mutate(kern(Token, simple = TRUE)) expression.
Many humdrumR functions are designed to work with certain common fields in humdrumR data. For example, many pitch functions have a Key argument which can take the content of the Key field which readHumdrum() creates when there are key interpretations, like *G:, in the data. When an expression argument uses one of these functions but doesn't explicitly set that argument, humdrumR will automatically insert the appropriate field into the call (if the field is present). So, for example, if you run

humData |> mutate(Solfa = solfa(Token))

on a data set that includes a Key field, the expression will be changed to:

humData |> mutate(Solfa = solfa(Token, Key = Key))

If you don't want this to happen, you need to explicitly give a different Key argument, like:

humData |> mutate(Solfa = solfa(Token, Key = 'F:'))

(The Key argument can also be set to NULL.)

Another common/important automatic argument insertion is for functions with a groupby argument. These functions will automatically have appropriate grouping fields inserted into them. For example, the mint() (melodic intervals) command will automatically be applied using groupby = list(Piece, Spine, Path), which makes sure that melodic intervals are only calculated within spine paths, not between pieces/spines/paths (which wouldn't make sense!).

All humdrumR functions which use automatic argument interpolation mention it in their own documentation. For example, the ?solfa documentation mentions the treatment of Key in its "Key" section.
In music analysis, we very often want to work with "lagged" vectors of data. For example, we might want to look at the relationship between a vector and the previous values of the same vector, i.e., the vector offset or "lagged" by one index. The lag() and lead() functions are useful for this, always keeping vectors the same length so vectorization is never hindered. In expression arguments, we can use a convenient shorthand to call lag() (or lead()): any vector can be indexed with an integer argument named lag or lead (case insensitive), causing it to be lagged/led by that integer amount. (A vector indexed with lag = 0 returns the unchanged vector.) For example, the following two calls are the same:

humData |> with(Token[lag = 1])
humData |> with(lag(Token, 1))

This shorthand is most useful if the lag/lead index has multiple values: if the indexed object appears within a higher function call, each lag is inserted as a separate argument to that call. Thus, these two calls are also the same:

humData |> with(count(Token[lag = 1:2]))
humData |> with(count(lag(Token, 1), lag(Token, 2)))

Note that the lagging will also be automatically grouped within the fields list(Piece, Spine, Path), which is the default "melodic" structure in most data. This assures that a vector is never "lagged" across the boundary from one piece to another, or from one spine to the next. If you'd like to turn this off or change the grouping, you need to override it by adding a groupby argument to the lagged index, like Token[lag = 1, groupby = list(...)].

Using lagged vectors, since they are vectorized, is the fastest (computationally) and easiest way of working with n-grams. For example, if you want to create character-string 5-grams of your data, you could call:

humData |> with(paste(Token[lag = 0:4], sep = '-'))

Since the lagging is grouped by list(Piece, Spine, Path), these are true "melodic" n-grams, only created within spine paths within each piece.
"Splatting" refers to feeding a function a list/vector of arguments.
Sometimes we want to divide our data into pieces (a l\'a group_by()), but
rather than applying the same expression to each piece, we want to feed
the separate pieces as separate arguments to the same function.
You can use some
syntactic sugar
to do just this.
We can index any field in our call with a splat
argument, which must be a Field %in% x
.
For example,
humData |> with(list(Token[splat = Spine %in% 1:2]))
In this call, the Token
field will be divided into two groups, one where Spine == 1
and the other where
Spine == 2
; the first group (Spine == 1
) will be used as the first argument to list
, and the second group
(Spine == 2
) as the second argument.
Thus, within
translates the previous expression to this:
humData |> within(list(Token[Spine == 1], Token[Spine == 2]))
Splatting can be little weird, because there is nothing to assure that the splatted arguments
are all the same length, which we usually want (vectorization).
For example, in the previous example, there is no guarantee that Token[Spine == 1]
and Token[Spine == 2]
are the same length.
This just means we should only use splatting if we really understand the groups we are splatting.
For example, if there are no spine paths or stops in our data, then we can know that all spines
have the same number of data records, but only including all data records (null and non-null).
So, if I know there are no stops/paths in our data, we can run something like this:
humData |> within(dataTypes = 'Dd', count(Token[splat = Spine %in% 1:2]))
In some cases you may find that there are certain expressions that you use repeatedly. You can store expressions as variables by "quoting" them: the most common way to quote an expression in R is using the ~ operator, which creates what is called a "formula", essentially a quoted expression. You can also quote expressions using quote(). Once you've quoted an expression, you can pass it to with(), within(), mutate(), summarize(), or reframe().

Imagine that you have three different datasets (humData1, humData2, and humData3), and you'd like to evaluate the expression count(kern(Token, simple = TRUE)) in all three. Use the ~ operator to quote and save that expression to a variable, then use it with with():

countKern <- ~ count(kern(Token, simple = TRUE))

humData1 |> with(countKern)
humData2 |> with(countKern)
humData3 |> with(countKern)
For data that includes spine paths (which you can check for with anyPaths()), some analyses may require that spine paths be treated as contiguous "melodies." The expandPaths() function can be used to "expand" spine paths into new spines. The expandPaths argument to with()/within() will cause expandPaths() to be run on your data before your argument expressions are evaluated. After evaluation, the expanded parts of the data are removed from the output.
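For example, a minimal sketch (assuming humData contains spine paths, which anyPaths() would confirm) that computes melodic intervals with paths expanded into contiguous melodies:

humData |> within(Mint <- mint(Token), expandPaths = TRUE)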
The only differences between the with(), within(), mutate(), summarize(), and reframe() humdrumR methods are what they do with the results of the expressions passed to them. The major difference is that within(), mutate(), and reframe() put results into new fields in the humdrumR data, while with() and summarize() just return their results in "normal" R. The other differences between the functions simply relate to how the recycle and drop arguments are used (details below).
The recycle argument controls how the results of your code are, or aren't, recycled (or padded). When you write code using your humdrumR data's fields() as input, your results are inspected to see how long they are compared to the length of the input field(s). If any of your results are longer than the input, you'll get an error message; humdrumR can't (yet) handle that case. If any of your results are shorter than the input, the recycle argument controls what happens to that result. There are seven options:

- "no": The result is not recycled or padded. This option is not allowed for calls to within(), mutate(), or reframe().
- "yes": The result is recycled, no matter how long it is.
- "pad": The result is padded with NA values.
- "ifscalar": If the result is scalar (length 1), it is recycled; otherwise you see an error.
- "ifeven": If the result length evenly divides the input length, it is recycled; otherwise you see an error.
- "never": The result is not recycled. If the result does not match the input length, you see an error.
- "summarize": The result is not recycled. If the result is not scalar, even if it matches the input length, you see an error.
The result of padding/recycling also depends on the alignLeft argument:

- If alignLeft = TRUE, results are padded on the right, like c(result, NA, NA, ...).
- If alignLeft = FALSE, results are padded on the left, like c(..., NA, NA, result).

Recycling is also affected if the result's length does not evenly divide the input length. For example, consider a result c(1, 2, 3) which needs to be recycled to length 10:

- If alignLeft = TRUE, the result is recycled c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1).
- If alignLeft = FALSE, the result is recycled c(3, 1, 2, 3, 1, 2, 3, 1, 2, 3).
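The following plain base-R sketch (not humdrumR-specific code) reproduces the two alignments described above using rep_len():

rep_len(c(1, 2, 3), 10)       # left-aligned:  1 2 3 1 2 3 1 2 3 1
rev(rep_len(c(3, 2, 1), 10))  # right-aligned: 3 1 2 3 1 2 3 1 2 3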
The humdrumR with() and summarize() methods return "normal" R data objects. The only difference between the with() and summarize() methods is their default drop and recycle arguments:

with(..., drop = TRUE, recycle = 'no')
summarize(..., drop = FALSE, recycle = 'summarize')

If drop = TRUE, these methods return whatever your code's result is, with no parsing. This can be any kind of R data, including vectors, or objects like lm fits or tables. If drop = FALSE, the results will instead be returned in a data.table. If you are working with grouped data, the drop = FALSE output (a data.table) will include all grouping columns as well as the results of your expressions. If drop = TRUE and there is only one result per group, the grouping fields will be used to generate names for the output vector.
The humdrumR within(), mutate(), and reframe() methods always return a new humdrumR data object, with new fields created from your code's results. The only differences between these methods are their default recycle argument and the recycle options they allow:

- within(..., recycle = 'pad'): accepts any recycle option except "no".
- mutate(..., recycle = 'ifscalar'): only accepts "ifscalar" or "never".
- reframe(..., recycle = 'pad'): only accepts "pad" or "yes".
When running within(), mutate(), or reframe(), new fields() are added to the output humdrumR data. These new fields become the selected fields in the output. You can explicitly name newly created fields (recommended), or allow humdrumR to automatically name them (details below). When using with(..., drop = FALSE) or summarize(..., drop = FALSE), the column names of the output data.table are determined in the same way.

Note that within(), mutate(), and reframe() will (attempt to) put any result back into your humdrumR data, even if it doesn't make much sense. Things work best with vectors. Atomic vectors (i.e., numbers, character strings, or logical values) are usually the best to work with, but lists will work well too; just remember that you'll need to treat those fields as lists (e.g., you might need to use lapply() or Map() to work with list fields). Any non-vector result will be put into a list as well, padded as needed. For example, if you use lm() to compute a linear regression in a call to within(), the result will be a new field containing a list, with the first element in the list being a single lm fit object, and the rest of the list empty (padded to the length of the field).
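As a hedged sketch (assuming a Semits field, as created in the Examples below), a regression of pitch height against note position could be stored this way:

humData |> within(Fit <- lm(Semits ~ seq_along(Semits)))

The new Fit field is a list whose first element is the lm object (the rest is NA padding), so list-aware tools like lapply() are needed to work with it.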
If you don't explicitly name the code expressions you provide, the new fields are named by capturing the expression code itself as a character string. However, it is generally a better idea to explicitly name your new fields. This can be done in two ways:

- Base-R within() style: use the <- assignment operator inside your expression. Example: within(humData, Kern <- kern(Token)).
- Tidyverse mutate() style: provide the expression as a named argument with =. Example: mutate(humData, Kern = kern(Token)).

Either style can be used with any of the humdrumR methods.

When using <-, only top-level assignments will create a new field, which means only one field can be assigned per expression. For example,

within(humData, Semits <- semits(Token), Recip <- recip(Token))

will create two fields (Semits and Recip). However,

within(humData, {
  Semits <- semits(Token)
  Recip <- recip(Token)
})

will not. The result of expressions grouped by {} is always the last expression in the brackets. Thus, the last example above will only create one new field, corresponding to the result of recip(Token). However, the resulting field won't be called Recip! This is because only top-level assignments are used to name an expression. To name a multi-expression expression (using {}), you could do something like this:

within(humData, Recip <- {
  Semits <- semits(Token)
  recip(Token)
})

Of course, only the result of recip(Token) would be saved to Recip, so the Semits <- semits(Token) expression is doing nothing useful here.
All argument expressions passed to the with()/within() methods are evaluated in order, from left to right, so any assignments in a previous expression will be visible to the next expression. This means we can, for example, do this:

within(humData, Kern <- kern(Token), Kern2 <- paste0(Kern, nchar(Kern)))

Here, the use of Kern in the second expression refers to the Kern field assigned in the previous expression.
The with(), within(), mutate(), summarize(), and reframe() functions all work with grouped data, or data with contextual windows defined. When groups or windows are defined, all argument expressions are evaluated independently within each and every group/window. Results are then processed (including recycling/padding) within each group/window. Finally, the results are pieced back together in locations corresponding to the original data locations. Since groups are necessarily exhaustive and non-overlapping, the result locations are easy to understand. On the other hand, contextual windows may overlap, which means that non-scalar results could potentially overlap as well; in these cases, which result data lands where may be hard to predict.

These functions are most useful in combination with the subset(), group_by(), and context() commands.
Examples:

# with/within style:
humData <- readHumdrum(humdrumRroot, "HumdrumData/BachChorales/chor00[1-4].krn")
humData |> with(count(kern(Token, simple = TRUE), Spine))
humData |> within(Kern <- kern(Token),
Recip <- recip(Token),
Semits <- semits(Token)) -> humData
humData |>
group_by(Spine) |>
with(mean(Semits))
humData |>
group_by(Piece, Spine) |>
with(mean(Semits), drop = FALSE)
# tidyverse (dplyr) style:
humData <- readHumdrum(humdrumRroot, "HumdrumData/BachChorales/chor00[1-4].krn")
humData |> mutate(Kern = kern(Token),
Recip = recip(Token),
Semits = semits(Token)) -> humData
humData |>
group_by(Spine, Bar) |>
summarize(mean(Semits))
# dataTypes argument
humData |>
group_by(Piece, Spine) |>
within(paste(Token, seq_along(Token)))
humData |>
group_by(Piece, Spine) |>
mutate(Enumerated = paste(Token, seq_along(Token)),
dataTypes = 'Dd')
# recycle argument
humData |>
group_by(Piece, Bar, Spine) |>
mutate(BarMean = mean(Semits), recycle = 'ifscalar')
humData |>
group_by(Piece, Bar, Spine) |>
within(BarMean = mean(Semits), recycle = 'pad')