subset.humdrumR: Filter humdrum data

subset.humdrumR    R Documentation

Filter humdrum data

Description

HumdrumR defines subset() (base R) and filter() (tidyverse) methods for humdrumR data; these two .humdrumR methods are synonymous and work exactly the same way. They are used to "filter" the contents of the underlying humdrum table. R's standard indexing operators ([] and [[]]) can also be used to filter data (these indexing options are documented separately), but subset()/filter() can express much more sophisticated filtering commands than the indexing methods.

Filtering with subset()/filter() is (by default) not destructive, allowing you to recover the filtered data using removeSubset() or unfilter() (which are also synonyms).

Usage

## S3 method for class 'humdrumR'
subset(x, ..., dataTypes = "D", .by = NULL, removeEmptyPieces = TRUE)

## S3 method for class 'humdrumR'
filter(.data, ..., dataTypes = "D", .by = NULL, removeEmptyPieces = TRUE)

removeEmptyFiles(x)

removeEmptyPieces(x)

removeEmptySpines(x)

removeEmptyPaths(x)

removeEmptyRecords(x)

removeEmptyStops(x)

removeSubset(humdrumR, fields = dataFields(humdrumR), complement = NULL)

unfilter(humdrumR, fields = dataFields(humdrumR), complement = NULL)

complement(humdrumR, fields = dataFields(humdrumR))

Arguments

x, .data, humdrumR

HumdrumR data.

Must be a humdrumR data object.

...

Arbitrary expressions passed to with(in).

The "within" expression(s) must evaluate to either scalar or full-length logical values.

dataTypes

Which types of humdrum records to include.

Defaults to "D".

Must be a single character string. Legal values are 'G', 'L', 'I', 'M', 'D', 'd' or any combination of these (e.g., "LIM"). (See the humdrum table documentation Fields section for explanation.)

.by

Optional grouping fields; an alternative to using group_by().

Defaults to NULL.

Must be NULL, or character strings which partially match one or more fields() in the data.

If not NULL, these fields are used to group the data. If grouping fields have already been set by a call to group_by(), the .by argument overrides them.

removeEmptyPieces

Should empty pieces be removed?

Defaults to TRUE.

Must be a singleton logical value: an on/off switch.

fields

Which fields to unfilter or complement?

Defaults to all data fields in the humdrumR data.

Must be character strings, partially matching data fields in the input data.

complement

Which field to use as the subset complement to restore?

By default NULL, which means each data field's original complement is used.

Must be a single character string, partially matching a field in the input data.

Details

subset() and filter() are passed one or more expressions, which are evaluated using the fields of the humdrum table via a call to within(). This evaluation can thus include all of within.humdrumR()'s functionality (and arguments), including group-apply. The only requirement is that the expressions/functions fed to subset()/filter() must return a logical (TRUE/FALSE) vector (NA values are treated as FALSE). The returned vector must either be scalar (length 1) or be the same length as the input data (the number of rows in the humdrum table). If the logical result is scalar, it will be recycled to match the input length. This is useful in combination with group_by(): for example, you can split the data into groups, then return a single TRUE or FALSE for each group, causing the whole group to be kept or filtered out.
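The scalar-recycling rule can be illustrated in base R. This is only a sketch of the idea, not humdrumR's implementation: the data frame tab and its Bar and Token columns are made up for illustration. ave() evaluates a function within each group and recycles a length-1 result across that group, which is analogous to returning one TRUE/FALSE per group.

```r
# Hypothetical stand-in for a humdrum table (made-up columns, not real humdrumR data)
tab <- data.frame(Bar   = c(1, 1, 2, 2, 3),
                  Token = c("4c", "4d", "4e", "4f", "2g"),
                  stringsAsFactors = FALSE)

# One logical value per group: is the group's (first) bar number odd?
# ave() recycles that single value to every row of the group (as 1/0),
# so we compare with 1 to get a full-length logical vector back.
keep <- ave(tab$Bar, tab$Bar, FUN = function(b) b[1] %% 2 == 1) == 1

# NA results would be treated as FALSE
keep <- keep & !is.na(keep)

tab[keep, ]   # rows belonging to the odd-numbered bars
```

Here every row of a group receives the same decision, so whole bars are kept or dropped together.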

Note that subset()/filter() are incompatible with contextual windows; if your data has contextual windows defined, they will be removed (with a warning message) before filtering.

Nullifying data

When using subset()/filter(), humdrumR doesn't actually delete the data you filter out. Instead, these functions set all filtered data fields to NA (null) values and change their data type to "d". This ensures that the humdrum syntax of the data is not broken by filtering! Thus, when you print a filtered humdrumR object, you'll see all the filtered data points turned into null data (.). Since most humdrumR functions ignore null data (d) by default, the data is effectively filtered out for most practical purposes. However, if you need to use those null ('d') data points (for example, with ditto()), they can be accessed by setting dataTypes = 'Dd' in many functions. See the ditto() documentation for examples.
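As a rough base-R analogy for this nullifying behavior (a sketch under assumed names, not humdrumR's internal code), the hypothetical table below keeps every row but blanks out the filtered tokens and flags them as type "d":

```r
# Hypothetical mini-table: Token holds the data, Type marks the record type
tab <- data.frame(Token = c("4c", "4d", "4e"),
                  Type  = c("D", "D", "D"),
                  stringsAsFactors = FALSE)
keep <- c(TRUE, FALSE, TRUE)   # result of some filtering expression

tab$Token[!keep] <- NA    # filtered tokens become null (NA)
tab$Type[!keep]  <- "d"   # and are flagged as null data

nrow(tab)                 # still 3 rows: nothing was deleted

# Code that only looks at non-null data (Type == "D") skips the filtered row
tab[tab$Type == "D", ]
```

Because no rows are dropped, the table keeps its original shape, which is the point of nullifying rather than deleting.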

Truly removing data

In many cases, filtering out large parts of your data leaves many empty null data points (".") in your printout, which may be difficult to read. If you want to actually remove these filtered data points, you can call removeEmptyFiles(), removeEmptyPieces(), removeEmptySpines(), removeEmptyPaths(), removeEmptyRecords(), or removeEmptyStops(). These functions safely remove null data without breaking the humdrum syntax; they do this by going through each file/piece/spine/path/record/stop and checking whether all the data in that region is null. If, and only if, all the data is null, that portion of data is removed.

By default, subset.humdrumR() automatically calls removeEmptyPieces() before returning. However, you can stop this by specifying removeEmptyPieces = FALSE.
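The "remove only if everything is null" rule can be sketched in base R as follows. This is an illustration with a made-up table, not the removeEmpty...() implementation: a spine is dropped only when all of its tokens are NA, so a partly null spine keeps its null rows.

```r
# Hypothetical table: spine 1 is entirely null, spine 2 is only partly null
tab <- data.frame(Spine = c(1, 1, 2, 2),
                  Token = c(NA, NA, "4e", NA),
                  stringsAsFactors = FALSE)

# For each spine, is *all* of its data null? (one value per spine, recycled per row)
allNull <- ave(is.na(tab$Token), tab$Spine, FUN = all)

# Drop only the regions that are completely null
trimmed <- tab[!allNull, ]
trimmed   # both rows of spine 2 remain, including its one null token
```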

Renumbering

If filtered pieces, files, or spines are removed from a corpus (using removeEmptyPieces() or removeEmptySpines()), the File, Piece, Record and/or Spine fields are renumbered to represent the remaining regions, starting from 1. For example, if you have a corpus of 10 pieces and remove the first piece (Piece == 1), the remaining pieces are renumbered from 2:10 to 1:9. Spine/record renumbering works the same way, except that it is done independently within each piece.
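A minimal base-R sketch of this kind of renumbering, using made-up Piece and Spine vectors (an illustration of the idea only, not humdrumR's code): match() against the sorted unique values maps the surviving numbers onto 1, 2, 3, ..., and ave() repeats the same step separately within each piece.

```r
# Hypothetical leftover numbering after some pieces and spines were removed
Piece <- c(2, 2, 3, 3, 3, 10)
Spine <- c(2, 3, 2, 4, 4, 1)

# Renumber pieces across the whole corpus, starting from 1
newPiece <- match(Piece, sort(unique(Piece)))

# Renumber spines independently within each piece
newSpine <- ave(Spine, Piece, FUN = function(s) match(s, sort(unique(s))))

cbind(Piece, newPiece, Spine, newSpine)
```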

Complements (unfiltering)

When subset() is applied, humdrumR retains the complement of the subset of each data field (unless an explicit removeEmpty...() function is called). The removeSubset() or unfilter() functions can be used to restore the original data by combining the subset with the complement. The fields argument can be used to control which data fields are unfiltered; by default, all data fields are unfiltered.

Normally, each data field is restored with its own complement data. However, the complement argument can be used to specify a different field to use as the complement. This allows you to, for instance, combine different parts of separate fields into a single field.

The complement() function will directly swap the data-field subsets with their complements.
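The relationship between a subset, its complement, and the restored data can be sketched with plain vectors. This is a conceptual illustration with made-up values and variable names; it does not show how humdrumR stores complements internally.

```r
original <- c("4c", "4d", "4e", "4f")
keep     <- c(TRUE, FALSE, TRUE, FALSE)   # result of some filtering expression

# The subset keeps the matching values; the complement keeps everything else
subsetData     <- ifelse(keep, original, NA_character_)
complementData <- ifelse(keep, NA_character_, original)

# "Unfiltering": fill the null positions of the subset from the complement
restored <- ifelse(is.na(subsetData), complementData, subsetData)

# "Complementing": swap the roles of subset and complement
swapped <- list(subset = complementData, complement = subsetData)
```

If the subset had been transformed before restoring, only the kept positions would carry the new values, while the null positions would be filled from whichever complement is chosen.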

See Also

The indexing operators [] and [[]] can be used as shortcuts for common subset calls.

Examples


humData <- readHumdrum(humdrumRroot, "HumdrumData/BachChorales/chor00[1-4].krn")

# remove spine 1 (non destructive)
humData |> subset(Spine > 1)

# remove spine 1 (destructive)
humData |> subset(Spine > 1) |> removeEmptySpines()

# keep odd-numbered bars (i.e., remove even-numbered bars)
humData |> group_by(Bar) |> subset(Bar[1] %% 2 == 1)

# unfiltering and complement

humData |> filter(Spine %in% 1:2) |> complement()

humData |> filter(Spine %in% 1:2) |> unfilter()

humData |> filter(Spine %in% 1:2) |> solfa() |> unfilter(complement = 'Token')


Computational-Cognitive-Musicology-Lab/humdrumR documentation built on Oct. 22, 2024, 9:28 a.m.