humTable | R Documentation |
In the humdrumR package, the fundamental data structure is called a humdrum table.
A humdrum table encodes all the information in a collection of one or more humdrum-syntax files as a single data.table (a data.table is an "enhanced" version of R's standard data.frame).
Humdrum tables are stored "inside" every humdrumR class object that you will work with, and various humdrumR functions allow you to study or manipulate them.
If you want to directly access the humdrum table within a humdrumR class object, use the getHumtab() function.
The getHumtab() function extracts the humdrum table from a humdrumR object.
Use the fields() function to list the current fields in a humdrumR class object.
getHumtab(humdrumR, dataTypes = "GLIMDd")
fields(
humdrumR,
fieldTypes = c("Data", "Structure", "Interpretation", "Formal", "Reference",
"Grouping", "selected")
)
## S3 method for class 'humdrumR'
names(humdrumR)
humdrumR |
HumdrumR data. Must be a humdrumR data object. |
dataTypes |
Which types of humdrum record(s) to include in the output. Defaults to "GLIMDd". Must be a |
fieldTypes |
Which types of fields to list. Shows all fields by default. Must be a |
In a humdrum table, by default, humdrum data is organized in a maximally "long" (or "tall") format, with each and every single "token" in the original data represented by a single row in the table. Even multiple-stops—tokens separated by spaces—are broken onto their own rows. Meanwhile, each column in the humdrum table represents a single piece of information associated with each token, which we call a field. Throughout this documentation, you should keep in mind that a "token" refers to a row in the humdrum table while a "field" refers to a column:
Token = row
Field = column
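The "one token per row" layout described above can be sketched in base R. This is only an illustration, not how humdrumR builds its tables: the two toy records are made up, a plain data.frame stands in for the data.table, and the Record column here is just the index of the toy record.

```r
# Two toy humdrum data records with tab-separated spines;
# the second spine of the first record holds a multi-stop ("4e 4g").
records <- c("4c\t4e 4g", "4d\t4f")

rows <- list()
for (r in seq_along(records)) {
  spines <- strsplit(records[r], "\t", fixed = TRUE)[[1]]
  for (s in seq_along(spines)) {
    # Multi-stops are separated by spaces; each stop gets its own row.
    stops <- strsplit(spines[s], " ", fixed = TRUE)[[1]]
    rows[[length(rows) + 1]] <- data.frame(
      Token = stops, Record = r, Spine = s, Stop = seq_along(stops),
      stringsAsFactors = FALSE
    )
  }
}

# One row per token, one column per field.
longTable <- do.call(rbind, rows)
print(longTable)
```

Each row of longTable is a token and each column is a field, which is the correspondence used throughout this documentation.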
There are six types of fields in a humdrum table:
Data fields
Structure fields
Interpretation fields
Formal fields
Reference fields
Grouping fields
When first created by a call to readHumdrum(), every humdrum table has at least nineteen fields: one data field (Token), two interpretation fields (Tandem and Exclusive), three formal fields, and thirteen structure fields. Additional formal, interpretation, or reference fields may be present depending on the content of the humdrum file(s), and you can create additional data fields by using within.humdrumR(), mutate.humdrumR(), or other functions.
Data fields are used to describe individual data points in humdrum data (as opposed to groups of points).
Every humdrum table starts with a data field called Token, which contains character strings representing the original strings read from the humdrum files.
Users can create as many additional data fields as they like. Every call to withinHumdrum() generates new data fields.
Every humdrum table has thirteen Structure fields, which describe where each data token was "located" in the original humdrum data: which file, which spine, which record, etc. See the vignette on humdrum syntax to fully understand the terms here.
File info:
Filename :: character
The unique name of the humdrum file. This may include an appended path if more than one file with the same name was read from different directories (see the readHumdrum() docs).
Filepath :: character
The full file name (always includes its full path).
Label :: character
A label specified during the call to readHumdrum(), associated with a particular readHumdrum "REpath-pattern." If no label was specified, patterns are just labeled "_n", where "n" is the number of the pattern.
File :: integer
A unique number associated with each file (ordered alphabetically, starting from 1).
Piece :: integer
A number specifying the number of the piece in the corpus. This is identical to the File field except when more than one piece was read from the same file.
Location info:
Spine :: integer
The spine, numbered (from left-to-right) starting from 1. This field is NA wherever Global == TRUE.
Path :: integer
The "spine path." Any time a *^ spine path split occurs in the humdrum data, the right side of the split becomes a new "path." The original path is numbered 0, with additional paths numbered with integers to the right. (If there are no spine path splits, the Path field is all 0s.) This field is always NA when Global == TRUE.
ParentPath :: integer
For spine paths (i.e., where Path > 0), which path was the parent from which this path split? Where Path == 0, parent path is also 0.
Record :: integer
The record (i.e., line) number in the original file.
DataRecord :: integer
The data record enumeration in the file, starting from 1.
Stop :: integer
Which token in a multi-stop token, numbered starting from 1. In files with no multi-stops, the Stop field is all 1s. This field is always NA when Global == TRUE.
Global :: logical
Did the token come from a global record (as opposed to a local record)? When Global == TRUE, the Spine, Path, and Stop fields are always NA.
Token info:
Type :: character
What type of record is it?
"G" = global comment
"L" = local comment
"I" = interpretation
"M" = measure/barline
"D" = non-null data
"d" = null data
"E" = exclusive interpretation
"S" = spine-control tokens (*^, *v, *-)
Interpretation fields describe interpretation metadata in the humdrum file(s).
Humdrum interpretations are tokens that "carry forward" to data points after them, unless cancelled out by a subsequent interpretation. (See the humdrum syntax vignette for a detailed explanation.)
All humdrum data must have an exclusive interpretation, so humdrum tables always have an Exclusive (:: character) field indicating the exclusive interpretation associated with each token/row of the Token field.
Humdrum data may, or may not, include additional tandem interpretations. A universal rule for parsing tandem interpretations is impossible, because A) tandem interpretations can "overwrite" each other and B) users can create their own tandem interpretations. The best we can do in all cases is identify all tandem interpretations that have appeared previously in the spine (counting most recent first). All these previous interpretations are encoded in a single character string in the Tandem field (see the tandem() docs for details).
If working with non-standard interpretations, users can parse the Tandem field using the tandem() function.
If no tandem interpretations occur in a file, the Tandem field is full of empty strings ("").
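The idea of accumulating previous tandem interpretations, most recent first, can be sketched in base R as follows. This is an illustration under assumptions, not humdrumR's implementation: the spine is a toy example, and the comma separator, the retained leading "*", and the choice to include an interpretation token in its own row's string are all assumptions made here (see the tandem() docs for the real encoding).

```r
# One spine's tokens, read top to bottom (toy example).
spine <- c("**kern", "*clefG4", "4c", "*k[b-]", "4d", "*clefF4", "4e")

tandemSoFar <- character(0)
Tandem <- character(length(spine))

for (i in seq_along(spine)) {
  tok <- spine[i]
  # Tandem interpretations start with a single "*"; "**" marks an exclusive one.
  isTandem <- startsWith(tok, "*") && !startsWith(tok, "**")
  if (isTandem) tandemSoFar <- c(tok, tandemSoFar)  # most recent first
  # ASSUMPTION: interpretations are joined with commas in a single string.
  Tandem[i] <- paste(tandemSoFar, collapse = ",")
}

print(data.frame(Token = spine, Tandem = Tandem))
```

Note that the later *clefF4 does not erase *clefG4 from the string; deciding which interpretation "wins" is left to whoever parses the string afterwards.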
Fortunately, many tandem interpretations are widely used and standardized, and these interpretations are known by humdrumR. Recognized interpretations (such as *clefG4 and *k[b-]) are automatically parsed into their own fields by a call to readHumdrum().
See the readHumdrum() documentation for more details.
Formal fields indicate musical sections, or time windows within a piece, including formal designations ("verse", "chorus", etc.) and measures/bars.
Humdrum data may or may not include formal metadata fields, indicated by the token "*>".
Classified formal marks are put into fields matching their name.
Unclassified formal marks are placed in a field called Formal as a default.
Nested formal categories are appended with an underscore and a number for each level of descent: Formal_1, Formal_2, ..., Formal_N.
If part of a section is not given a name in a lower hierarchical level, the field is simply empty ("") at that point.
Humdrum data may, or may not, also include barlines (tokens beginning "=").
However, humdrum tables always include three formal fields related to barlines:
Bar :: integer
How many barline records (single or double) have passed before this token? If no "=" tokens occur in a file, Bar is all zeros. Note that this field is independent of whether the barlines are labeled with numbers in the humdrum file!
DoubleBar :: integer
How many double-barline records have passed before this token? If no "==" tokens occur in a file, DoubleBar is all zeros.
BarLabel :: character
Any characters that occur in a barline-token after an initial "=" or "==". These include the "-" in the common "implied barline" token "=-", repeat tokens (like "=:||"), and also any explicit bar numbers. Note that the Bar field always enumerates every bar record, while measure-number labels in humdrum data (which appear in the BarLabel field) may do weird things like skipping numbers, repeating numbers, or having suffixes (e.g., "19a"). If no barline tokens appear in the file, BarLabel is all empty strings ("").
If no barline tokens are present in a file, Bar and DoubleBar will be nothing but 0s, and BarLabel will be all NA.
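A rough base-R sketch of how barline counts and labels relate to the tokens in a single spine is given here. It is an illustration only, not humdrumR's code: the tokens are invented, and counting a barline token toward its own Bar value is an assumption of this sketch.

```r
# Toy spine containing an implied barline, a suffixed bar number, and a double barline.
tokens <- c("=-", "4c", "=19a", "4d", "==", "4e")

isBar       <- grepl("^=", tokens)    # single OR double barlines
isDoubleBar <- grepl("^==", tokens)   # double barlines only

# Running counts of barline records, independent of any printed bar numbers.
# ASSUMPTION: a barline token counts toward its own value.
Bar       <- cumsum(isBar)
DoubleBar <- cumsum(isDoubleBar)

# Whatever follows the initial "=" or "==" in each barline token.
barLabels <- sub("^==?", "", tokens[isBar])

print(data.frame(Token = tokens, Bar = Bar, DoubleBar = DoubleBar))
print(barLabels)
```

The counts advance by one at every barline even though the labels ("-", "19a", and an empty label) do not form a regular numeric sequence.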
Reference fields describe any Reference Records in the humdrum data. Every reference record (records beginning "!!!") in any humdrum file in a corpus read by readHumdrum() is parsed into a field named by the reference code: "XXX" in "!!!XXX".
Reference tokens are all identical throughout any humdrum piece. If a reference code appears in one file but not another, the field is NA in the file which does not have the code. If no reference records appear in any files read by readHumdrum(), no reference fields are created.
Examples of common reference records are "!!!COM:" (composer) and "!!!OTL:" (original title).
Any humdrum data with these records will end up having COM and OTL fields in its humdrum table.
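How a reference record maps onto a field name and a value can be sketched with base-R regular expressions. The records and their values are placeholders invented for this illustration, and the parsing shown is a simplification, not the parser readHumdrum() uses.

```r
# Toy reference records (values are placeholders for illustration).
refRecords <- c("!!!COM: Example Composer", "!!!OTL: Example Title")

# The reference code is the text between "!!!" and the first colon.
code  <- sub("^!!!([^:]+):.*$", "\\1", refRecords)
# The value is everything after that colon, with surrounding whitespace removed.
value <- trimws(sub("^!!![^:]+:", "", refRecords))

# One named entry per reference code, analogous to one field per code.
refFields <- setNames(as.list(value), code)
print(refFields)
```

In a humdrum table, each such value would be repeated on every row of the piece, since reference tokens are identical throughout a piece.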
Grouping fields are special fields which may be created by calls to group_by(). These fields are deleted by calls to ungroup(). These fields are generally hidden/inaccessible to users.
In humdrum syntax, there is no requirement that every spine-path contains data in every record. Rather, spines are often padded with null tokens. In some cases, entire records may be padded with null tokens. Each type of humdrum record uses a different null token:
Interpretation: *
Comment: !
Barline: =
Data: .
Many humdrumR functions automatically ignore null data, unless you specifically tell them not to (usually via the dataTypes argument).
Whenever different fields() are created or selected, humdrumR reevaluates what data locations it considers null.
Note that humdrumR considers data locations to be "null" when
the selected fields are all character data and the token is one of c(".", "!", "!!", "=", "*", "**"); or
the selected fields are all NA (including NA_character_).
When humdrumR reevaluates null data, the Type field is updated, setting data records to Type == "d" for null data and Type == "D" for non-null data.
This is the main mechanism humdrumR functions use to ignore null data: most functions only look at data where Type == "D".
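The null rule can be sketched for a single selected field in base R. The helper name isNullToken is hypothetical and this is not humdrumR's internal code; with several selected fields, a location would count as null only when every selected field is null there.

```r
# Null tokens listed in the documentation above.
nullTokens <- c(".", "!", "!!", "=", "*", "**")

# Hypothetical helper: is each value of ONE field null?
isNullToken <- function(x) {
  if (is.character(x)) {
    is.na(x) | x %in% nullTokens   # NA_character_ or a null token
  } else {
    is.na(x)                       # non-character fields: only NA is null
  }
}

charField <- c("4c", ".", NA, "4e")
numField  <- c(1.5, NA, 3)

# Data records are typed "d" where null and "D" otherwise.
Type <- ifelse(isNullToken(charField), "d", "D")

print(Type)
print(isNullToken(numField))
```

Filtering on Type == "D" then drops both the "." token and the NA value in one step.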
Whenever you print or export a humdrumR object, null data in the selected fields prints as ".".
Thus, if you are working with numeric data with NA values, these NA values will print as ".".
Breaking the complex syntax of humdrum data into the "flat" structure of a humdrum table, with every single token on one line of a data.table, makes humdrum data easier to analyze.
Of course, thanks to the structure fields, we can easily regroup and reform the original humdrum data or use the structure of the data (like spines) in our analyses.
However, in some cases, you might want to work with humdrum data in a different structure or "shape."
humdrumR has several options for "collapsing" tokens within humdrum tables, "cleaving" different parts of the data into new fields, or otherwise reshaping humdrum data into basic R data structures you might prefer.
The fields() function takes a humdrumR object and returns a data.table(), with each row describing an available field in the humdrum table.
The output table has five columns:
Name
The field name.
Class
The class() of the data in the field.
Type
The type of field (described above). Can be "Data", "Structure", "Interpretation", "Formal", "Reference", or "Grouping".
Selected
A logical indicating which fields are selected.
GroupedBy
A logical indicating which, if any, fields are currently grouping the data.
Using the names() function on a humdrumR object will get just the field names, the same as fields(humData)$Name.
To actually extract fields from humdrumR data, see the pull() family of functions.
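library(humdrumR)
# Read four of the Bach chorales bundled with humdrumR
humData <- readHumdrum(humdrumRroot, "HumdrumData/BachChorales/chor00[1-4].krn")
fields(humData)                      # list the fields in the humdrum table
getHumtab(humData)                   # extract the table (default dataTypes = "GLIMDd")
getHumtab(humData, dataTypes = 'D')  # keep only non-null data tokens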