Load VIHI annotation data
get_vihi_annotations.Rd
Clone BLAB-private vihi_annotations repo to ~/BLAB_DATA once before using this function.
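A quick way to confirm that the clone is in place before calling the function is sketched below. The helper `vihi_repo_present()` is hypothetical and not part of the package; it only assumes that the repository lives in a `vihi_annotations` folder under `~/BLAB_DATA`, as the description above and the error message in the Examples suggest.

```r
# Hypothetical helper (not part of the package): checks whether a
# vihi_annotations folder exists under the given base directory.
vihi_repo_present <- function(base_dir = "~/BLAB_DATA") {
  repo_dir <- file.path(path.expand(base_dir), "vihi_annotations")
  dir.exists(repo_dir)
}

if (!vihi_repo_present()) {
  message("vihi_annotations not found under ~/BLAB_DATA; clone it first.")
}
```

This only checks that the folder exists; it does not verify that it is a valid git clone or that the requested version tag is available.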
Arguments
- version
Version tag to check out.
- subset
Which pre-defined subset of the data should be loaded?
'random' (the default) loads the annotations from the 15 randomly sampled intervals from all recordings in the corpus.
'VI+TD-VI' loads the annotations from the random and the top-5 high-volubility intervals from VI recordings and their TD matches.
'everything' loads all annotations from all tiers. Exercise caution with this option: the data will include incomplete and unchecked annotations.
- table
Which table to return: 'annotations' (the default) or 'intervals'. If 'merged', returns the annotations table with the interval information merged in. Intervals without annotations won't be included. If 'all', returns a named list of both tables.
- include_all_tier_types
Should all tier types be included in the output? If FALSE (the default), only tiers that are relevant to the subset are returned. For the 'random' and 'VI+TD-VI' subsets, the relevant tier types are: transcription, vcm, lex, mwu, xds. For the 'everything' subset, this parameter is ignored as all tier types are returned.
- allow_annotation_errors
In case errors are found in the annotations, should the function throw an error (FALSE, the default) or add error_n columns to the annotations table? Use only as a way to inspect the errors, not as a way to ignore them.
- include_pi
Should annotations marked as PI be included in the output? If FALSE (the default), they are filtered out.
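To picture what the 'merged' and 'all' values of `table` return, here is a minimal sketch on toy data. The column names (`interval_id`, `onset`, `offset`) and the use of `merge()` are assumptions made for illustration; the real tables and the function's internal join may differ.

```r
# Toy stand-ins for the two tables; all column names are hypothetical.
annotations <- data.frame(
  interval_id = c(1, 1, 2),
  participant = c("CHI", "FA1", "CHI"),
  transcription = c("ba", "hi baby", "mama"),
  stringsAsFactors = FALSE
)
intervals <- data.frame(
  interval_id = c(1, 2, 3),
  onset = c(0, 120000, 240000),
  offset = c(120000, 240000, 360000)
)

# 'merged': annotations with interval information attached. merge() performs
# an inner join by default, so interval 3 (no annotations) is dropped.
merged <- merge(annotations, intervals, by = "interval_id")

# 'all': a named list holding both tables.
both <- list(annotations = annotations, intervals = intervals)
```

If the intervals without annotations are needed as well, the 'all' option keeps the intervals table intact alongside the annotations.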
Details
The speaker TIER is identified by the participant column. Other tiers are in columns.
Notes:
Annotations are checked for errors for the standard ACLEW tiers only. Interval-level checks aren't currently performed at all.
Annotations marked as PI are filtered out by default. Set include_pi to TRUE if you want them included.
The transcribed utterance can be empty (""). Normally, that means that a code interval has been segmented but not annotated. But there might be other stray utterance segments like that.
(relevant for non-speaker TIERs only) Currently, there is no way to tell whether an annotation is missing because it was not segmented or because it was segmented but not yet annotated: both are represented as NA. This will change in the future: a missing segment will still be NA, but a missing annotation will be "".
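The distinction between empty utterances and NA tier values can be checked with base R, as in the sketch below. The toy data frame and its column names (`transcription`, `xds`) are assumptions for illustration and only mimic the layout described in the Details.

```r
# Toy annotations table; column names and values are hypothetical.
annotations <- data.frame(
  participant = c("CHI", "FA1", "CHI", "MA1"),
  transcription = c("ba ba", "", "hello", "look at that"),
  xds = c(NA, NA, NA, "C"),
  stringsAsFactors = FALSE
)

# Utterances that were segmented but have an empty transcription ("").
empty_utterance <- !is.na(annotations$transcription) &
  annotations$transcription == ""

# Non-speaker tier values that are NA: currently either not segmented or
# segmented but not yet annotated (the two cases cannot be told apart).
missing_xds <- is.na(annotations$xds)

n_empty <- sum(empty_utterance)
n_missing_xds <- sum(missing_xds)
```

Counting both cases before analysis gives a rough sense of how much of the loaded data is still unannotated.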
Examples
vitd_annotations <- get_vihi_annotations(version='0.0.0.9006-dev.5',
                                         subset='VI+TD-VI')
#> Error in run_git_command(repo, "fetch --tags --prune --prune-tags"): Expected to find the "vihi_annotations" repository at the following location: /home/runner/BLAB_DATA/vihi_annotations. Please clone it.
vitd <- get_vihi_annotations(version='0.0.0.9006-dev.5', subset='VI+TD-VI',
                             table='all')
#> Error in run_git_command(repo, "fetch --tags --prune --prune-tags"): Expected to find the "vihi_annotations" repository at the following location: /home/runner/BLAB_DATA/vihi_annotations. Please clone it.
vitd$annotations %>% head()
#> Error: object 'vitd' not found
vitd$intervals %>% head()
#> Error: object 'vitd' not found