Changelog
blabr 0.25.2
blabr 0.25.0
Changed
- assign_time_windows:
  - The window boundaries are now specified with two arguments, t_starts and t_ends, instead of t_start, short_window_time, med_window_time, and long_window_time. DEFAULT_WINDOWS_UPPER_BOUNDS can be used as t_ends.
  - The window columns are now called “window_<t_start>_<t_end>ms” and “which_window_<t_start>_<t_end>ms” instead of, e.g., “shortwin” and “whichwin_short”.
  - The window boundaries have to be specified explicitly; no default values will be used.
  - The which_window_<t_start>_<t_end>ms columns now have the following values: “pre”, “neither”, and “<t_start>_<t_end>ms” (the latter used to be “short”/“med”/“long”).
- tag_low_data_trials:
  - min_fraction must be specified explicitly. The default value of 1/3 was removed, but it will be suggested in the error message.
  - You can now specify either the window_column to be used or t_start and t_end. This removes the confusing option where an existing window column was used but its end and t_start were used in the calculation of the maximum number of bins in the window.
  - When you specify a window column, the boundaries will be inferred from the column name (see assign_time_windows).
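The boundary inference and low-data check described above can be sketched as follows. blabr is an R package; this Python sketch only illustrates the logic and is not its implementation. The helper names are hypothetical, and the formula for the maximum number of bins (treating the window as the half-open range [t_start, t_end)) is an assumption.

```python
import re

# Hypothetical helpers, not part of blabr: they mimic the naming scheme
# "window_<t_start>_<t_end>ms" described in this entry.
WINDOW_COLUMN = re.compile(r"^window_(\d+)_(\d+)ms$")


def window_column_name(t_start: int, t_end: int) -> str:
    """Build a window column name from its boundaries in ms."""
    return f"window_{t_start}_{t_end}ms"


def parse_window_column(name: str) -> tuple[int, int]:
    """Infer (t_start, t_end) in ms from a window column name."""
    match = WINDOW_COLUMN.match(name)
    if match is None:
        raise ValueError(f"not a window column name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def is_low_data(n_bins_with_data: int, t_start: int, t_end: int,
                t_step: int, min_fraction: float) -> bool:
    """Return True if the fraction of window bins with data is below min_fraction.

    Assumption: the window is [t_start, t_end), so it holds at most
    (t_end - t_start) // t_step bins.
    """
    max_bins = (t_end - t_start) // t_step
    return n_bins_with_data / max_bins < min_fraction


t_start, t_end = parse_window_column(window_column_name(360, 2000))
print(t_start, t_end)  # 360 2000
# The window holds 82 bins of 20 ms; 20 bins with data is less than a third.
print(is_low_data(20, t_start, t_end, t_step=20, min_fraction=1/3))  # True
```

Deriving both boundaries from one column name means the same window definition feeds both the labeling step and the low-data threshold, which is the mismatch this change aims to prevent.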
blabr 0.24.0
Changed
- Streamlined update_global_basic_levels. It now loads all_basiclevel_NA.csv from a local file, not from BLAB_DATA. This way, there is no need to create a tagged commit in the all_basiclevel repo with the updated all_basiclevel_NA.csv that doesn’t yet have global basic levels. You still need to commit and tag the global basic level mapping dictionaries in the same repo, though.
blabr 0.23.0
Changed
- Update get_seedlings_nouns and friends to work with the v2.0.0-dev.3 version of the dataset:
  - Update column names and types.
  - Teach them to treat v2.0.0-dev.3 as a development version.
- get_* functions that read data from ~/BLAB_DATA will no longer leave the corresponding repos in a detached HEAD state when the version commit is already checked out as a branch.
blabr 0.22.0
Added
- read_message_report function.
- split_fixation_report, split_message_report, and merge_split_reports functions that together transform the data from a fixation report and a message report into a list of these tables: experiment, recordings, trials, fixations, messages. Use them as the first step in eyetracking processing.
Changed
- Most of the eyetracking functions have been renamed and updated. One notable difference is that they expect the input dataframe to contain the columns recording_id and trial_index and do not let you choose other columns via parameters. Instead, use dplyr::rename if necessary or, better yet, keep these two column names as they are. The functions split_fixation_report and split_message_report will create these columns for you by renaming RECORDING_SESSION_LABEL and TRIAL_INDEX from the original report tables.
- binifyFixations is now fixations_to_timeseries. What’s changed:
  - It is now a lot faster. As an example, the processing time went from 1+ min to 1-3 s on the “ht_seedlings” dataset.
  - The keepCols parameter was dropped; use dplyr::select instead.
  - The gaze, binSize, and maxTime parameters are now fixations, t_step, and t_max. Otherwise, they didn’t change.
  - It expects the fixation boundaries to be in the t_start and t_end columns. If you used split_fixation_report, they will already be named like that; otherwise, use dplyr::rename.
  - The function will throw an error if the fixations overlap, which has happened in some datasets in the past.
  - The output no longer has the timeBin column (it was redundant and easily confused with time).
  - Nonset is now t_onset: the time since the target onset, rounded up to the nearest multiple of t_step.
- fixations_report is now read_fixation_report, and its val_guess_max parameter is now just guess_max.
- get_windows is now assign_time_windows.
  - It now takes the following parameters: fixation_timeseries, t_step (previously bin_size), t_start, short_window_time, med_window_time, long_window_time.
  - The input dataframe must contain the target_onset column.
  - nb_1, which used to take the number of time steps (bins) from the target onset to the start of the common window, was replaced with t_start, which takes the time in ms between those two events.
  - There will be more changes to this function in the future, mainly aimed at making it harder to use one set of window boundaries in this function and a different one in tag_low_data_trials (previously FindLowData).
- FindLowData is now tag_low_data_trials.
  - It now takes the following parameters: fixation_timeseries, window_column, t_start, t_end, t_step, min_fraction (defaults to 1/3).
  - min_fraction makes the minimum required amount of data explicit.
  - The nb_2 parameter was replaced with t_start, which has the same meaning as in assign_time_windows. If you are converting older code, note that nb_2 was already in ms (unlike nb_1 in get_windows, which was in bins), so there is nothing to convert.
  - There is no check that t_start and t_end correspond to those used in assign_time_windows, so make sure they match.
  - The output column is now called is_low_data_trial and is always TRUE or FALSE (it used to be NA for some trials).
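Two of the fixations_to_timeseries rules listed above, rejecting overlapping fixations and rounding the time since target onset up to a multiple of t_step, are sketched below in Python. This is not blabr’s R code: the function names, the input representation (a list of (t_start, t_end) pairs in ms), and the choice to treat fixations that merely touch as non-overlapping are assumptions.

```python
import math

# Hypothetical helpers illustrating two rules of fixations_to_timeseries;
# they are not blabr code. Fixations are (t_start, t_end) pairs in ms.


def first_overlap(fixations):
    """Return the first pair of overlapping fixations, or None.

    After sorting by start time, it is enough to compare neighbours.
    Assumption: fixations that only touch (end == next start) do not overlap.
    """
    ordered = sorted(fixations)
    for previous, current in zip(ordered, ordered[1:]):
        if current[0] < previous[1]:
            return previous, current
    return None


def check_no_overlap(fixations):
    """Raise an error instead of silently binning overlapping fixations."""
    pair = first_overlap(fixations)
    if pair is not None:
        raise ValueError(f"overlapping fixations: {pair[0]} and {pair[1]}")


def t_onset(time, target_onset, t_step):
    """Time since target onset, rounded up to the nearest multiple of t_step."""
    return math.ceil((time - target_onset) / t_step) * t_step


check_no_overlap([(0, 200), (200, 450), (500, 800)])  # passes
print(t_onset(1033, 1000, 20))  # 40
print(t_onset(1040, 1000, 20))  # 40
```

Comparing only neighbours after sorting is sufficient: if any two fixations overlap, the one immediately following the earlier of them also starts before that earlier fixation ends.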
Fixed
- The example for get_vihi_annotations now uses a data version that doesn’t lead to errors.
- The whichwin_med and whichwin_short columns in the output of assign_time_windows (previously get_windows) are now calculated correctly. Because of a bug, they used to be identical to whichwin_long.
Removed
- Many eyetracking functions were hard-deprecated in favor of their renamed and updated versions. Some of them are described above under “Changed”. See help(blabr::defunct) for the full list. You can also run any of them to get a replacement suggestion.
- Removed RemoveLowData and RemoveFrozenTrials. Use tag_low_data_trials and FindFrozenTrials followed by dplyr::filter() instead.
blabr 0.21.0
Changed
- get_vihi_annotations now:
  - filters out PI by default,
  - returns only randomly sampled intervals by default,
  - can return annotations from random and top-5 high-volubility intervals for VI and their TD matches with subset = 'VI+TD-VI',
  - checks ACLEW tiers for consistency,
  - see ?get_vihi_annotations for more details.
blabr 0.20.0
Added
- Introduced get_seedlings_nouns_extra and get_seedlings_nouns_codebook functions. get_seedlings_nouns only loads the main table now.
- get_seedlings_nouns and friends now produce messages informing the user about the existence of codebooks, relation to other tables, etc.
- Dataset versions removed from GitHub but persisting locally will no longer be loaded.
- Add get_blab_share_path, remove get_pn_opus_path.
Fixed
- The help page of get_seedlings_nouns has been updated.
- Check that the requested dataset versions are present in the GitHub repo. Supplying non-existent versions to get_* functions (get_seedlings_nouns, get_vihi_annotations, etc.) used to load whatever version was currently in BLAB_DATA.
- Enforce column specification in get_* functions: throw an error if it doesn’t match the data. Previously, if new columns weren’t added to the code at all, or weren’t used for specific dataset versions, there was no indication of that.
- Fix column misspecification of seedlings-nouns tables revealed by enforcing column specifications.
- Add missing column specs to get_vihi_annotations.
- Avoid unnecessarily repeating dataset version handling, which doubled the reminder to supply a version to the function call or update the requested version.
- Throw an error if a git command throws one.
blabr 0.18.0
Fix:
- No more installation errors due to a missing tidyverse meta-package.
- Neither the tidyverse packages nor any other packages are attached after running library(blabr) anymore. This forces users to add explicit library(<pkg>) calls to their own code, which leads to fewer unintended consequences, e.g., filter referring to dplyr::filter instead of the standard stats::filter even when the user didn’t run library(dplyr).
blabr 0.17.0
Features:
- Switched to the public version in get_seedlings_nouns. The development versions can still be requested.
- get_seedlings_nouns can now get other tables and codebooks from the SEEDLingS - Nouns dataset with the table and get_codebook parameters.
Fixes:
- CONTRIBUTING.md: devtools::test() should be run before devtools::check().
- Multiple failing tests were fixed. The exception is test-seedlings.R, which is skipped for now.
blabr 0.16.3
Fixes:
- Take into account that global_bl already exists when adding an updated global_bl column to “all_basiclevel_NA.csv”.
- col_factor was called without qualifying it with readr::.
blabr 0.16.2
Fix: correctly add global_bl column specification when reading “all_basiclevel_NA.csv”.
blabr 0.16.0
Account for the csvs in seedlings-nouns_private having moved to the “public/” subfolder.
blabr 0.15.0
get_all_basiclevel now uses only “all_basiclevel_na.csv”, as it will be the only file in the all_basiclevel repo from now on. If you need to stick to an older version of blabr and get_all_basiclevel stopped working for you, add type = 'csv' to the call.
blabr 0.14.0
Add function get_seedlings_nouns that loads the seedlings-nouns dataset from the lab-private repo.
blabr 0.13.1
Bugfix: update big_aggregate to reflect the switch from “TVS” to “TVN” as “speaker” value.
blabr 0.13.0
- Improved handling of segments spanning two intervals in add_lena_stats and make_five_min_approximation: each such segment contributes utterance counts to these intervals in proportion to the overlap. With this change, make_five_min_approximation produces awc and cvc on the test file that differ from the corresponding lena5min csv file by at most
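The proportional split described in the 0.13.0 entry can be illustrated with a short Python sketch (blabr itself is written in R, and this is not its code). The function name split_count and the representation of intervals as (start, end) pairs in ms are hypothetical; the sketch assumes the intervals do not overlap and that a zero-length segment contributes nothing.

```python
# Hypothetical helper, not blabr code: distributes one segment's count
# across intervals in proportion to how much of the segment each one covers.


def split_count(seg_start, seg_end, count, intervals):
    """Return one share of `count` per interval, proportional to the overlap.

    intervals: list of (start, end) pairs in the same units as the segment.
    """
    duration = seg_end - seg_start
    shares = []
    for start, end in intervals:
        overlap = max(0, min(seg_end, end) - max(seg_start, start))
        shares.append(count * overlap / duration if duration > 0 else 0.0)
    return shares


# A 10 s segment with 20 words, 7 s of which fall in the first 5-min interval.
print(split_count(293_000, 303_000, 20, [(0, 300_000), (300_000, 600_000)]))
# [14.0, 6.0]
```

Under the 0.12.0 behavior the same segment would have added all 20 words to both intervals, and under 0.4.0 only to the first one.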
blabr 0.12.0
- New functions: prepare_intervals and add_lena_stats, which previously were a single function, make_five_min_approximation. The latter still exists but now calls the former two. There are some changes to the behavior of make_five_min_approximation:
  - No more zero-duration intervals.
  - Segments overlapping with two intervals now count fully towards both (previously, they counted only towards the first one).
  - Intervals are returned for every time point when the recording was on (previously, only intervals with segments starting in them were returned).
blabr 0.11.0
- Multiple fixes to the global basic level logic:
  - ambiguous words in object_dict can’t have rows with NA in disambiguate,
  - more specific instructions for manual updates,
  - copy object_dict when some annotations need disambiguating even if the dictionary itself does not need to be updated.
blabr 0.9.0
- Only objects marked for export in the roxygen comments will be exported. If a function from blabr no longer works, use blabr:::<function_name> and tell the lab technician.
blabr 0.6.0
- Added the make_new_global_basic_level function, which loads all_basiclevel and adds a global_bl column containing global basic levels. Clone global_basic_level to ~/BLAB_DATA before using it.
blabr 0.5.2
- bugfix: get_seedlings_speaker_stats now uses the speaker field from the sparse code csvs instead of the LENA-identified tier field.
blabr 0.5.1
- bugfix: get_seedlings_speaker_stats called multiple functions without the package:: prefix.
blabr 0.5.0
- Added read_rttm/write_rttm functions to read/write the .rttm files that Voice Type Classifier (VTC) creates.
- Added functions get_seedlings_speaker_stats and get_vtc_speaker_stats that add stats to a set of time intervals based on Seedlings annotations and VTC outputs, respectively.
- Renamed get_speaker_stats to get_lena_speaker_stats to make the stats source explicit.
blabr 0.4.6
LENA functions:
- calculate_lena_like_stats now outputs an additional column, interval_start_wav, that contains the interval start as the number of milliseconds from the start of the wav file.
- sample_intervals_* functions keep only the necessary columns from the input intervals_tibble: interval_start, interval_end, and, in the case of sample_intervals_with_highest, the column whose values were maximized.
blabr 0.4.5
- Removed the dependency on fuzzyjoin and the BioConductor package IRanges, which means fewer problems installing blabr.
- VIHI LENA intervals for annotation: prevent utterances from counting towards two neighbouring intervals.
blabr 0.4.2
- LENA: calculate stats, sample intervals in several ways:
  - get LENA-like AWC, CTC, CVC stats for given time intervals,
  - get speaker-level stats: adult word count, total segment duration, child utterance count,
  - sample intervals: randomly, periodically, and optimizing for a given metric.
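The three sampling strategies listed in the 0.4.2 entry can be pictured with the Python sketch below. It is not blabr’s sample_intervals_* code: the helper names are hypothetical, intervals are assumed to be (start, end, awc) tuples already sorted by time, and sample_highest is a simplification that simply takes the top n values of the metric.

```python
import random

# Hypothetical helpers, not blabr's sample_intervals_* functions.
# Each interval is a (start, end, awc) tuple; awc stands in for any metric.


def sample_random(intervals, n, seed=None):
    """Pick n intervals uniformly at random, returned in time order."""
    rng = random.Random(seed)
    return sorted(rng.sample(intervals, n))


def sample_periodic(intervals, period, offset=0):
    """Pick every period-th interval, starting at index offset."""
    return intervals[offset::period]


def sample_highest(intervals, metric, n):
    """Pick the n intervals with the largest metric value, in time order."""
    return sorted(sorted(intervals, key=metric, reverse=True)[:n])


intervals = [(0, 300, 120), (300, 600, 400), (600, 900, 300), (900, 1200, 80)]
print(sample_periodic(intervals, period=2))  # [(0, 300, 120), (600, 900, 300)]
print(sample_highest(intervals, metric=lambda iv: iv[2], n=2))
# [(300, 600, 400), (600, 900, 300)]
print(sample_random(intervals, n=2, seed=1))
```

All three helpers return intervals in time order, so downstream code can treat their outputs the same way. The sketch does not try to keep the highest-scoring intervals apart from each other.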
blabr 0.4.1
- bugfix: lag is now prefixed with dplyr:: in make_five_min_approximation, so that stats::lag is not used.
blabr 0.4.0
- add the make_five_min_approximation function that processes an .its file and outputs a tibble with the columns duration, AWC.Actual, CTC.Actual, CWC.Actual, which are similar to the ones in LENA’s 5min.csv files, except for a different handling of speech segments that cross a 5-min interval border: LENA splits the values between the two intervals, while we consider them to belong to the first one.
blabr 0.3.0
- get_* functions now produce similar results for “.csv” and “.feather”. The attributes are not exactly the same and the factor levels are ordered differently, but both outputs are now tibbles with the same column types. The similarity is checked with all.equal(..., check.attributes = FALSE).
blabr 0.2.0
- get_* functions no longer have branch and commit parameters. Instead, they have a new version parameter that currently refers to a tag label in the corresponding dataset repository. Using get_all_* functions without supplying the version argument is discouraged, and an appropriate warning is in place. Motivation for the change:
  - explicitly setting the dataset version gives one a chance at reproducible analysis,
  - using versions instead of commit hashes lets us later choose a different, non-git storage option. Or, even if we do stick with git, the old and new hashes will not clash with each other or confuse users.
Added a NEWS.md file to track changes to the package.