extract_samples: Extract parts of media files based on a segment list

View source: R/reindeeR_segmentlist.R

extract_samples    R Documentation

Extract parts of media files based on a segment list

Description

This function extracts parts of speech recordings based on a segment list (as returned by a call to query). Each extracted file is named using the segment label, the first and last samples extracted, and the sample rate of the original media file. The values of metadata fields can also be added to the file name, following the segment label. The extracted signal files are placed in sub-folders by session and/or bundle if the user requests it. If both create.session.subdir = FALSE and create.bundle.subdir = FALSE, all extracted signal portions are placed together in the output directory.
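The naming scheme described above can be sketched in base R. This is only an illustration: build_segment_filename is a hypothetical helper that is not part of reindeer, and the field order and the .wav extension are assumptions rather than the package's documented behaviour.

```r
# Hypothetical helper (not part of reindeer): join the name fields with a separator.
# Assumed field order: label, metadata values, first sample, last sample, sample rate.
build_segment_filename <- function(label, first.sample, last.sample, sample.rate,
                                   metadata = character(0),
                                   field.separator = "_",
                                   extension = "wav") {
  # Format the numbers as plain integers so that large values never print as 1e+05
  numbers <- sprintf("%d", as.integer(c(first.sample, last.sample, sample.rate)))
  fields <- c(label, metadata, numbers)
  paste0(paste(fields, collapse = field.separator), ".", extension)
}

build_segment_filename("p", 12000, 13500, 20000)
build_segment_filename("p", 12000, 13500, 20000, metadata = c("F", "32"),
                       field.separator = "-")
```

With these made-up values, the first call yields p_12000_13500_20000.wav and the second yields p-F-32-12000-13500-20000.wav.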

Usage

extract_samples(
  emuDBhandle,
  seglist,
  output.directory,
  include.labels = FALSE,
  include.metadata.fields = FALSE,
  create.session.subdir = TRUE,
  create.bundle.subdir = TRUE,
  encode.name = FALSE,
  field.separator = "_"
)

Arguments

emuDBhandle

The database handle.

seglist

A segment list (as returned by a call to query) which should be used as a cut list.

output.directory

The directory where all extracted parts of the signal, and sub-folders if required, will be placed.

include.labels

Boolean; Include the label of a segment in the output file name?

include.metadata.fields

Names of metadata fields whose values should be included in the file name. If set to TRUE, the values of all metadata fields will be encoded in the name of the output file.

create.session.subdir

Boolean; Should bundles belonging to different sessions be kept separate?

create.bundle.subdir

Boolean; Should signal files belonging to different bundles be kept separate?

encode.name

Boolean; Should the bundle name be obfuscated in the output using MD5 hashing?

field.separator

The field separator string to use when constructing the output file name.
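How the two sub-directory switches might combine can be sketched as follows. segment_output_dir is a hypothetical helper, not a reindeer function, and nesting the bundle folder inside the session folder is an assumption; the session and bundle names are illustrative values only.

```r
# Hypothetical helper (not part of reindeer): choose the folder for one extracted file.
# Assumption: the bundle folder is nested inside the session folder.
segment_output_dir <- function(output.directory, session, bundle,
                               create.session.subdir = TRUE,
                               create.bundle.subdir = TRUE) {
  # An `if` without `else` returns NULL when the condition is FALSE; c() drops NULLs
  parts <- c(output.directory,
             if (create.session.subdir) session,
             if (create.bundle.subdir) bundle)
  do.call(file.path, as.list(parts))
}

segment_output_dir("out", "0000", "msajc003")
segment_output_dir("out", "0000", "msajc003",
                   create.session.subdir = FALSE, create.bundle.subdir = FALSE)
```

The first call gives out/0000/msajc003; with both switches set to FALSE the result is just out, which corresponds to the case where all extracted files share the output directory.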

Details

Optionally, the user may obfuscate the bundle name by MD5 hashing. This makes it impossible to deduce the origin of the recording from the file name alone.
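As a rough illustration of what MD5 obfuscation of a name looks like, the sketch below hashes a string using only base R. md5_of_string is a hypothetical helper, not the routine used by reindeer, and the exact input that extract_samples hashes is not specified here.

```r
# Hypothetical helper (not part of reindeer): MD5 digest of a character string.
# tools::md5sum() only hashes files, so the bytes are written to a temporary file first.
md5_of_string <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(charToRaw(x), tmp)   # write the raw bytes, without a trailing newline
  unname(tools::md5sum(tmp))    # 32-character lowercase hexadecimal digest
}

md5_of_string("abc")        # "900150983cd24fb0d6963f7d28e17f72"
md5_of_string("msajc003")   # an illustrative bundle name
```

The digest is deterministic, so files from the same bundle still share a common name component. A short or guessable bundle name can be recovered by hashing candidate names, so this is obfuscation rather than strong anonymisation.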

Examples

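library(reindeer)
# Start from a fresh copy of the "ae" demo database (internal helper functions)
reindeer:::unlink_emuRDemoDir()
emuDBhandle <- reindeer:::create_ae_db()
# Build a segment list of the items labelled "p" on the Phonetic level
psl <- query(emuDBhandle, "Phonetic = p")
# Write the extracted parts to a clean temporary directory
output.directory <- file.path(tempdir(), "reindeeR_extract")
unlink(output.directory, recursive = TRUE)
extract_samples(emuDBhandle, psl, output.directory = output.directory,
                create.session.subdir = TRUE, create.bundle.subdir = TRUE)
print(list.files(path = output.directory))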


humlab-speech/reindeer documentation built on May 21, 2023, 4:43 p.m.