indexHumdrum | R Documentation
R's built-in indexing operators, [] (single brackets) and [[]] (double brackets), can
be used to filter humdrumR data by removing specific
pieces, spines, or records from the humdrum table.
Unlike the more flexible and powerful subset()/filter() methods,
the indexing operators are destructive by default, meaning that filtered data can no longer
be accessed after indexing.
The functions index() and index2()
are synonyms for single and double brackets, respectively,
which can be used in pipes.
humData[] # returns unchanged
humData[x:y]
humData['regex']
humData[[x:y]]
humData[[ , x:y]]
humData[['regex']]
humData[[ , 'regex']]
humData[[x:y, l:m]]
humData[[ , , regex]]
index(x, i, j, drop = TRUE)
index2(x, i, j, drop = TRUE)
x | humdrumR data to index. Must be a humdrumR data object.
i | Index for vectors or matrix/data.frame rows. A numeric vector or a character vector (see below).
j | Index for matrix/data.frame columns. A numeric vector or a character vector (see below).
drop | Should empty records/spines/pieces be removed? Defaults to TRUE. Must be a singleton logical value.
In R, the fundamental indexing operators, [] and [[]], are used to select subsets of data.
For many data types (for instance, base R lists)
the [single brackets] are used for "shallower" extraction while the
[[double brackets]] are used for "deeper" extraction.
By rough analogy with this "shallow vs. deep" dichotomy, humdrumR corpus
indexing brackets are used in two ways:
[single brackets] are used to select pieces in your data.
[[double brackets]] are used to select records or spines within the pieces in your data.
(Accidentally writing [] when you need [[]] is a very common error, so watch out!)
Whether indexing by piece or within pieces, humdrumR objects accept
two types of indexing arguments: numeric (ordinal integers) or character
strings (interpreted as regular expressions).
Indexing humdrumR corpora with [single brackets] accepts
one numeric argument; only whole numbers are accepted.
This argument is used to pick pieces within the humdrumR object ordinally.
Thus, humData[1:10] selects the first ten pieces in the data, while humData[42]
selects only the 42nd piece.
Indexing humdrumR objects with [[double brackets]] accepts
one or two numeric arguments, i and j, either of which can
be used in isolation or in combination.
(If j is used in isolation, it must be named or placed after a comma, as in humData[[ , j ]].)
i is used to index records (i.e., based on the humtable Record field).
Thus, humData[[1:20]] indexes the first twenty records from each piece
in the corpus, and humData[[42]] extracts the 42nd record from each piece.
To avoid breaking the humdrum syntax, exclusive interpretations and spine-path interpretations are not removed.
j is used to index spines (i.e., based on the Spine field).
Thus, humData[[ , 3:4]] returns the third and fourth spines from each
piece in the corpus.
Pieces, spines, and records are renumbered after indexing
(see the Renumbering section of the subset()/filter() docs for an explanation).
As a result, humdrumR indexing is entirely ordinal.
For example,
humsubset <- humData[11:20]
humsubset[2]
will return the 12th piece from the original humData object.
This is because the first call to [] returns the 11th through 20th pieces, which
are renumbered 1:10, and the second index call returns the new 2nd piece, which was the 12th
originally.
Similarly,
humsubset2 <- humData[[ , 2:4]]
humsubset2[[ , 2]]
will return the third spine from the original data.
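Renumbering makes chained humdrumR indexing behave like chained indexing of an ordinary R vector. The sketch below is a base R toy model, not humdrumR itself: strings stand in for pieces and spines, and each index call refers to the positions of its input rather than the original object.

```r
# Toy model in base R (not humdrumR): strings stand in for pieces and spines.
pieces <- sprintf("piece%02d", 1:20)

# First index keeps pieces 11-20; their positions are now 1:10.
humsubset <- pieces[11:20]

# Second index uses the new positions: position 2 is the original 12th piece.
humsubset[2]  # "piece12"

# Same idea for spines: original spines 2, 3, 4 become positions 1, 2, 3.
spines <- sprintf("spine%d", 1:5)
humsubset2 <- spines[2:4]
humsubset2[2]  # "spine3"
```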
As in normal R indexing, negative numbers can be used, causing the corresponding elements to be
removed instead of retained. Thus, humData[-3:-5]
removes the third, fourth, and fifth pieces from the data,
while humData[[ , -3:-5]]
removes the third, fourth, and fifth spines from each piece.
Positive and negative indices cannot be mixed in a single argument.
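Because these rules mirror base R, they can be checked on a plain character vector. This is a base R illustration of the analogy, not humdrumR code; note that -3:-5 parses as (-3):(-5), because unary minus binds more tightly than the colon operator.

```r
pieces <- sprintf("piece%02d", 1:6)

# -3:-5 is c(-3, -4, -5): drop those positions, keep the rest.
kept <- pieces[-3:-5]
kept  # "piece01" "piece02" "piece06"

# Mixing positive and negative indices in one vector is an error in base R too.
mixed_is_error <- tryCatch({
  pieces[c(-1, 2)]
  FALSE
}, error = function(e) TRUE)
mixed_is_error  # TRUE
```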
In all cases, indices that are out of bounds (or equal to 0) are ignored.
For example, if you have a corpus of twenty pieces and you call corpus[21],
there is no 21st piece, so 21 is "out of bounds".
If all your input indices are 0, an error will result.
If all your input indices are out of bounds, then
an empty humdrumR object is returned.
For instance, humData[[401:500, ]]
will return an empty humdrumR object if there are no pieces with more than 400
data records.
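Here base R behaves differently: indexing past the end of a vector yields NA instead of being ignored. The helper ordinal_index() below is hypothetical (it is not a humdrumR function); it is a minimal sketch of the bounds rules described above, assuming positive or zero indices only.

```r
# Hypothetical helper modeling the documented bounds rules for positive
# and zero indices; it is NOT part of humdrumR.
ordinal_index <- function(x, i) {
  stopifnot(is.numeric(i), length(i) > 0, all(i >= 0), all(i == round(i)))
  # An index made only of zeros is an error.
  if (all(i == 0)) stop("all indices are 0")
  # Drop zeros and out-of-bounds positions instead of producing NA.
  i <- i[i >= 1 & i <= length(x)]
  # If nothing survives, x[integer(0)] is an empty vector of the same type.
  x[i]
}

pieces <- sprintf("piece%02d", 1:20)
pieces[c(19, 21)]                 # base R: "piece19" NA
ordinal_index(pieces, c(19, 21))  # "piece19"
ordinal_index(pieces, 401:500)    # character(0)
```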
If you index a humdrumR object with character strings, these strings are
treated as regular expressions (regexes),
which are matched against non-null data tokens ("D") in the object's first selected field.
A match to any of the regular expressions is considered a match.
Indexing with [single brackets] accepts one
vector of character regular expressions.
Any piece that contains even a single match is retained.
If no matches occur in any pieces, an empty humdrumR
object is returned.
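The piece-level regex rule (keep a piece if any non-null token matches any of the patterns) can be sketched in base R. This is a toy model, not humdrumR's implementation: each piece is a plain character vector, "." stands for the humdrum null token, and the function index_pieces_by_regex() is hypothetical.

```r
# Toy corpus: each piece is a vector of data tokens; "." is a null token.
corpus <- list(
  song1 = c("1", "b3", ".", "5"),
  song2 = c("1", "3", "5"),
  song3 = c(".", "b7", "b3")
)

# Hypothetical helper: keep pieces with at least one non-null token
# matching at least one of the regular expressions.
index_pieces_by_regex <- function(corpus, regexes) {
  has_match <- vapply(corpus, function(tokens) {
    tokens <- tokens[tokens != "."]
    any(vapply(regexes, function(re) any(grepl(re, tokens)), logical(1)))
  }, logical(1))
  corpus[has_match]
}

names(index_pieces_by_regex(corpus, "b3"))  # "song1" "song3"
length(index_pieces_by_regex(corpus, "b6")) # 0: no piece matches
```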
Indexing humdrumR objects with [[double brackets]] accepts one or two vectors of character
strings, i and j, either of which can be used in isolation or in combination.
(If j is used in isolation, it must be placed after a comma,
as in humData[[ , j]].)
Any data record that contains at least one match to the i regex(es)
is retained.
Similarly, any spine that contains at least one match to the
j regex(es) is retained.
If i and j are used together,
matching spines (j) are indexed first, so that
tokens matching the regular expression(s) in i
must be found in the matching spines.
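The order of operations matters when i and j are combined. The base R sketch below is a toy model, not humdrumR code: a character matrix has records as rows and spines as columns, and the helper index_within() is hypothetical. It filters spines by j first, then keeps only records whose tokens in the surviving spines match i.

```r
# Toy piece: rows are data records, columns are spines; "." is a null token.
tokens <- cbind(spine1 = c("4c", "4e", "4g", "2c"),
                spine2 = c("1",  "3",  "5",  "1"),
                spine3 = c("1",  "b3", ".",  "5"))

# Hypothetical helper: j selects spines first, then i selects records.
index_within <- function(tokens, i = NULL, j = NULL) {
  matches_any <- function(x, regexes) {
    x <- x[x != "."]
    any(vapply(regexes, function(re) any(grepl(re, x)), logical(1)))
  }
  if (!is.null(j)) {
    keep_spines <- apply(tokens, 2, matches_any, regexes = j)
    tokens <- tokens[, keep_spines, drop = FALSE]
  }
  if (!is.null(i)) {
    keep_records <- apply(tokens, 1, matches_any, regexes = i)
    tokens <- tokens[keep_records, , drop = FALSE]
  }
  tokens
}

index_within(tokens, i = "5")            # records 3 and 4 (any spine)
index_within(tokens, i = "5", j = "b3")  # only record 4: "5" must be in spine3
```

Record 3 drops out of the combined call because its "5" is in spine2, which the j pattern has already excluded.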
Spines can also be indexed ordinally by exclusive interpretation.
To do this, provide a double-bracket index with a named numeric (whole-number) argument,
with name(s) corresponding to exclusive interpretations in the data.
For example, if you want to index the 3rd **kern spine in each piece,
use humData[[kern = 3]].
Note that other exclusive interpretations in each piece are unaffected: in
this example, only the kern spines (if there are any) are indexed!
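Which spines survive such an index can be modeled with a running count per exclusive interpretation. This is a base R toy model, not humdrumR's implementation; keep_spines_by_exclusive() is a hypothetical helper that keeps every spine of other interpretations plus the n-th spine of the named one.

```r
# Toy piece: the exclusive interpretation of each spine, left to right.
exclusives <- c("**kern", "**deg", "**kern", "**kern", "**deg")

# Hypothetical helper: positions of spines kept by an index like [[kern = 3]].
keep_spines_by_exclusive <- function(exclusives, interp, n) {
  is_target <- exclusives == paste0("**", interp)
  ordinal <- cumsum(is_target)  # running count of target spines
  # Non-target spines are untouched; target spines are kept only if selected.
  keep <- !is_target | (is_target & ordinal %in% n)
  which(keep)
}

keep_spines_by_exclusive(exclusives, "kern", 3)  # 2 4 5: both **deg spines + 3rd **kern
keep_spines_by_exclusive(exclusives, "deg", 1)   # 1 2 3 4: all **kern + 1st **deg
```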
The removeEmpty argument to any humdrumR indexing call controls whether
filtered data is completely removed from the data or simply set to null.
Data that is set to null can be recovered using unfilter()
(see the subset()/filter() docs for an explanation).
By default, piece-indexing and spine-indexing have removeEmpty = TRUE,
but record-indexing defaults to removeEmpty = FALSE.
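The difference between removing data and nulling it can be pictured with a plain vector. This base R sketch is only an analogy, assuming that "set to null" means displaying the humdrum null token "." while the original tokens are kept; unfilter_toy() is a hypothetical stand-in, not humdrumR's unfilter().

```r
# Toy model (not humdrumR): tokens from one spine; "." is the humdrum null token.
tokens <- c("1", "b3", "5", "b7")
keep <- grepl("^b", tokens)  # keep only flat scale degrees

# removeEmpty = TRUE analogue: filtered tokens are gone for good.
removed <- tokens[keep]
removed  # "b3" "b7"

# removeEmpty = FALSE analogue: filtered tokens are shown as null,
# but the original vector is stored alongside so it can be restored.
filtered <- list(shown = ifelse(keep, tokens, "."), original = tokens)
filtered$shown  # "." "b3" "." "b7"

# Hypothetical stand-in for unfilter(): return the stored original tokens.
unfilter_toy <- function(x) x$original
unfilter_toy(filtered)  # "1" "b3" "5" "b7"
```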
For more powerful/flexible indexing options, use subset()/filter().
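library(humdrumR)
humData <- readHumdrum(humdrumRroot, "HumdrumData/RollingStoneCorpus/*.hum")
# select the first two pieces
humData[1:2]
# remove the first piece
humData[-1]
# select the third and fourth spines of each piece
humData[[ , 3:4]]
# select the first forty records of each piece
humData[[1:40 , ]]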
# find all pieces which use a flat 3
humData['b3']
# find all records that use a flat 3
humData[['b3', ]]
humData[['b3', removeEmpty = TRUE]]
# Exclusive interpretation indexing
humData[[deg = 1]]
# pipe indexing
humData |> index(1:3) |> index2(3:4)