- `verify_no_overlapping_fixations`, which is internally used by `fixations_to_timeseries`.
- `read_message_report` function.
- `split_fixation_report`, `split_message_report`, and `merge_split_reports` functions that together transform the data from a fixation and a message report into a list of these tables: `experiment`, `recordings`, `trials`, `fixations`, `messages`. Use them as the first step in eyetracking processing.
- Functions now expect the `recording_id` and `trial_index` columns and do not allow choosing a different column via a parameter.
  Instead, use `dplyr::rename` if necessary or, better yet, keep these two column names and don't rename them.
  Functions `split_fixation_report` and `split_message_report` will create these columns for you by renaming `RECORDING_SESSION_LABEL` and `TRIAL_INDEX` from the original report tables.
- `binifyFixations` is now `fixations_to_timeseries`. What's changed:
  - `keepCols` parameter was dropped, use `dplyr::select` instead.
  - `gaze`, `binSize`, and `maxTime` parameters are now `fixations`, `t_step`, and `t_max`.
    Otherwise, they didn't change.
  - The input must have `t_start` and `t_end` columns.
    If you used `split_fixation_report`, they will be named like that already, otherwise use `dplyr::rename`.
  - No `timeBin` column (it was redundant and easily confused with `time`), `Nonset` is `t_onset` - time since the target onset rounded up to the nearest multiple of `t_step`.
- `fixations_report` is now `read_fixation_report`, parameter `val_guess_max` is just `guess_max` now.
- `get_windows` is now `assign_time_windows`.
  - Parameters: `fixation_timeseries`, `t_step` (previously `bin_size`), `t_start`, `short_window_time`, `med_window_time`, `long_window_time`.
  - `target_onset` column.
  - Parameter `nb_1` that used to take the number of time steps (bins) since the target onset until the common window start was replaced with `t_start` which takes the time in ms between those two events.
- `tag_low_data_trials` (previously `FindLowData`).
- `FindLowData` is now `tag_low_data_trials`.
  - Parameters: `fixation_timeseries`, `window_column`, `t_start`, `t_end`, `t_step`, `min_fraction` (defaults to 1/3).
  - `min_fraction` makes the minimum amount of data more explicit.
  - `nb_2` parameter was replaced with `t_start` which has the same meaning as in `assign_time_windows`.
    If you are converting from older code, `nb_2` used to be in ms, unlike `nb_1` in `get_windows` which was in bins, so there is nothing to convert.
  - `t_start` and `t_end` correspond to those used in `assign_time_windows`, so be careful to make sure they match.
  - The output column is `is_low_data_trial` and is always TRUE or FALSE (it used to be NA for some trials).
- `get_vihi_annotations` uses a data version that doesn't lead to errors.
- `whichwinmed` and `whichwinshort` columns in the output of `assign_time_windows` (previously, `get_windows`) are now calculated correctly.
  They used to be identical to `whichwinlong` because of a bug.
- See `help(blabr::defunct)` for the full list.
  You can also run any of them to get a replacement suggestion.
- `RemoveLowData` and `RemoveFrozenTrials`. Use `tag_low_data_trials` and `FindFrozenTrials` followed by `dplyr::filter()` instead.
- `get_vihi_annotations` now `subset = 'VI+TD-VI'`, see `?get_vihi_annotations` for more details.
- `transcription` and `transcription_id`.
- `get_seedlings_nouns_extra` and `get_seedlings_nouns_codebook` functions.
- `get_seedlings_nouns` only loads the main table now.
- `get_seedlings_nouns` and friends now produce messages informing the user about the existence of codebooks, relation to other tables, etc.
- `get_blab_share_path`, remove `get_pn_opus_path`.
- `get_seedlings_nouns` has been updated.
- `get_*` functions (`get_seedlings_nouns`, `get_vihi_annotations`, etc.) used to lead to loading of the version that was currently in `BLAB_DATA`.
- `get_*` functions: throw an error if it doesn't match the data.
  Previously, if I messed up and didn't add new columns to the code at all or didn't use them for specific dataset versions, there was no indication of that.
- `get_vihi_annotations`. I don't yet know why this wasn't flagged by `devtools::check()` or didn't cause a test error.
- `get_vihi_annotations` similar to other `get_*` functions.

Fix:
- No more installation errors due to missing the `tidyverse` meta-package.
- Now, neither the packages from `tidyverse` nor any other packages are attached after running `library(blabr)`.
  This forces the user to insert explicit `library(<pkg>)` calls into their own code, leading to fewer unintended consequences, e.g., `filter` referring to `dplyr::filter` instead of the standard `stats::filter` even when the user didn't run `library(dplyr)`.
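
The masking problem mentioned in the last item can be illustrated with a minimal sketch. It assumes the dplyr package is installed; the data frame `df` is made up for the illustration and is not part of blabr.

```r
# A made-up data frame used only for illustration.
df <- data.frame(x = 1:5)

# With no package attached, a bare `filter` resolves to stats::filter,
# which applies a linear filter (here a 3-point moving average) to a series.
smoothed <- stats::filter(df$x, rep(1 / 3, 3))

# Qualifying the call with the package name makes the intent explicit and
# works whether or not dplyr has been attached with library(dplyr).
kept <- dplyr::filter(df, x > 2)
```

Writing `dplyr::filter` (or calling `library(dplyr)` explicitly at the top of a script) keeps the behavior independent of which packages happen to be attached.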
Features:
- Switched to the public version in `get_seedlings_nouns`.
  The development versions can still be requested.
- Now, `get_seedlings_nouns` can get other tables and codebooks from the SEEDLingS - Nouns dataset with the `table` and `get_codebook` parameters.
Fixes:
- `CONTRIBUTING.md`: `devtools::test()` should be run before `devtools::check()`.
- Multiple tests don't fail anymore, except for `test-seedlings.R`, which is skipped for now.
Fixes:
- Take into account that `global_bl` already exists when adding an updated `global_bl` column to "all_basiclevel_NA.csv".
- `col_factor` was called without qualifying with `readr::`.
Fix: correctly add `global_bl` column specification when reading "all_basiclevel_NA.csv".
Account for adding `global_bl` column directly to `all_basiclevel`.
Account for csvs in the `seedlings-nouns_private` having moved to the "public/" subfolder.
`get_all_basiclevel` uses "all_basiclevel_na.csv" only as it will be the only file in the `all_basiclevel` repo from now on.
If you need to stick to an older version of blabr and `get_all_basiclevel` stopped working for you, add `type = 'csv'` to the call.
Account for global basic level dictionaries having moved to the all_basiclevel repo.
Fix: make `get_*` functions work on Windows.
Add function `get_seedlings_nouns` that loads the `seedlings-nouns` dataset from the lab-private repo.
Switch to Markdown for docs.
Bugfix: update `big_aggregate` to reflect the switch from "TVS" to "TVN" as "speaker" value.
- `add_lena_stats` and `make_five_min_approximation`: each such segment contributes utterance counts to these intervals in proportion to the overlap.

  With this change, `make_five_min_approximation` produces `awc` and `cvc` on the test file that differ from the corresponding lena5min csv file by at most 1.
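
The idea of contributing counts in proportion to the overlap can be sketched in base R. This is only an illustration under stated assumptions: the helper `split_count_by_overlap`, its arguments, and the numbers are hypothetical and are not blabr's actual implementation.

```r
# Hypothetical helper: split one segment's count across intervals in
# proportion to how much of the segment falls into each interval.
# Assumes seg_end > seg_start and times in milliseconds.
split_count_by_overlap <- function(seg_start, seg_end, seg_count,
                                   interval_starts, interval_ends) {
  # Overlap of the segment with each interval, clipped at zero.
  overlap <- pmax(0, pmin(seg_end, interval_ends) - pmax(seg_start, interval_starts))
  seg_count * overlap / (seg_end - seg_start)
}

# Made-up example: a 4-second segment with a count of 8 that starts
# 1 second before the border between two 5-minute intervals.
interval_starts <- c(0, 300000)
interval_ends <- c(300000, 600000)
shares <- split_count_by_overlap(299000, 303000, 8, interval_starts, interval_ends)
```

Here one quarter of the segment lies in the first interval and three quarters in the second, so the count is divided in the same ratio.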
- `prepare_intervals` and `add_lena_stats` that previously used to be a single function `make_five_min_approximation`. The latter still exists but calls the former two now.

There are some changes to the behavior of `make_five_min_approximation`:
- No more zero-duration intervals.
- Segments overlapping with two intervals now count fully towards both
(previously they would count only towards the first one).
- Intervals returned for any time point the recording was on (previously,
only intervals with segments starting in them were returned).
- `object_dict` can't have rows with `NA` in `disambiguate`, `object_dict` when some annotations need disambiguating even if the dictionary itself does not need to be updated.
- `get_pn_opus_path` function.
- `blabr` no longer works, use `blabr:::<function_name>` and tell the lab technician.
- `get_vtc_speaker_stats`.
- `add_vtc_stats` function that calculates VTC-based ctc. Ported from childproject.
- `add_lena_stats` function that calculates ctc, cvc, and awc for a set of intervals.
- `make_new_global_basic_level` function that loads `all_basiclevel` and adds a `global_bl` column to it which contains global basic levels.
  Clone `global_basic_level` to `~/BLAB_DATA` before using.
- `get_seedlings_speaker_stats` now uses the `speaker` field from the sparse code csvs, instead of the LENA-identified `tier` field.
- `get_seedlings_speaker_stats` called multiple functions without specifying the `library::` part.
- `read_rttm`/`write_rttm` functions to read/write `.rttm` files that Voice Type Classifier (VTC) creates.
- `get_seedlings_speaker_stats`, `get_vtc_speaker_stats` that add stats to a set of time intervals based on Seedlings annotations and VTC outputs respectively.
- Renamed `get_speaker_stats` to `get_lena_speaker_stats` to make the stats source explicit.

LENA functions:
- `calculate_lena_like_stats` now outputs an additional column `interval_start_wav` that contains the interval start as the number of milliseconds from the start of the wav file,
- `sample_intervals_*` functions keep only the necessary columns from the input `intervals_tibble`: `interval_start`, `interval_end`, and, in the case of `sample_intervals_with_highest`, the column whose values were maximized.
- `fuzzyjoin` and BioConductor package `IRanges` - fewer problems installing blabr.

LENA: calculate stats, sample intervals in several ways
- get LENA-like AWC, CTC, CVC stats for given time intervals,
- `lag` is now prefixed with `dplyr::` in `make_five_min_approximation`, so that `stats::lag` is not used.
- `make_five_min_approximation` function that processes an .its file and outputs a tibble with columns `duration`, `AWC.Actual`, `CTC.Actual`, `CWC.Actual` that are similar to the ones in LENA's 5min.csv files, except for a different handling of speech segments that cross a 5-min interval border: LENA splits the values between the two intervals, while we consider them to belong to the first one.
- `get_*` functions produce similar results for ".csv" and ".feather" now. The attributes are not exactly the same and the orders of factor levels are different but now the outputs are both tibbles and have the same column types.
  The similarity is checked with `all.equal(..., check.attributes = FALSE)`.
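
What `check.attributes = FALSE` changes in such a comparison can be seen in a small base R sketch. The two data frames below are made up and only stand in for tables read from different file formats; the `"source"` attribute is a hypothetical piece of metadata.

```r
# Two made-up data frames with identical values. The second carries an
# extra attribute, standing in for metadata that differs between formats.
from_csv <- data.frame(id = 1:3, word = c("ball", "dog", "cup"))
from_feather <- from_csv
attr(from_feather, "source") <- "feather"

# The default comparison also compares attributes and reports the mismatch.
strict <- all.equal(from_csv, from_feather)

# Ignoring attributes compares only the stored values.
relaxed <- all.equal(from_csv, from_feather, check.attributes = FALSE)
```

Because `all.equal` returns either `TRUE` or a character vector describing the differences, wrap the result in `isTRUE()` when using it in a condition.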
- `get_*` functions do not have `branch` and `commit` parameters anymore; instead, they have a new `version` parameter that currently refers to a tag label in the corresponding dataset repository. Using `get_all_*` functions without supplying the `version` argument is discouraged, and an appropriate warning is in place.

Motivation for the change:
- explicitly setting dataset version gives one a chance at reproducible
analysis,
- using versions instead of commit hashes lets us later choose a different
non-git storage option. Or, even if we do go with git, the old and new
hashes will not clash and/or confuse the users.
* Added a `NEWS.md` file to track changes to the package.