Dog, doggy, dogs: Characterizing variability within and across families during infancy

Moore & Bergelson (2018)

Presented at ICIS 2018 in Philadelphia, PA


Speech directed to infants is rife with linguistic variability. Usually, that variability is systematic, providing cues to changes in the environment, as in plural marking (e.g. dogs = multiple instances of 'dog'). However, infants also hear variability that does not correspond to meaningful or observable differences, as in diminutivization, e.g. 'doggy' for dog (which in child-directed speech connotes familiarity rather than smallness; Savickienė & Dressler, 2007). In such instances, the two wordforms are interchangeable; we dub this wordplay. This type of variability creates a unique problem for infants: they must determine which word variants are functionally equivalent, and which differ meaningfully. To further complicate learning accounts, the amount of wordplay children hear may vary from family to family. To characterize the scope of wordplay variability, we used data from the SEEDLingS corpus, a longitudinal collection of daylong audio recordings and hour-long video recordings capturing the nouns directed to or said by 44 infants from 6-17 months. For each concrete object word spoken during these recordings, research assistants annotated who said it, how it was said (e.g. tooferoo), and its dictionary word form (e.g. the lemma tooth). From these annotations, we characterized the differences between dictionary forms and spoken forms, classifying words into three categories: wordplay (doggy/dog), morpheme-adding (i.e. words that only underwent linguistically meaningful changes like pluralization or noun-noun modification, e.g. coat/raincoat), and frozen words (i.e. words that only occurred in a single form in our >304,000-word corpus, e.g. dishwasher). We found that the average infant heard ~25 instances of wordplay per day, but that only some words licensed wordplay. These tended to be high-frequency words (e.g. tooth, baby): the correlation between the number of forms a word occurred in and its overall frequency was 0.82 (p < .05).
Wordplay nouns also occurred at high rates on the MCDI (33%), relative to morpheme-adding (4%) and frozen words (0.4%). Across families, we found that while all families used wordplay, the rate at which they did so was independent of that family's overall talkativeness. In other words, wordplay did not scale in families where infants heard 181 more nouns. Instead, it appears to be an independent characteristic, idiosyncratic to families (see Figure 1). We next linked wordplay to infants' own productions. While only 0.5-12.9% of nouns in the input underwent wordplay, this variability across families correlated with infants' volubility. That is, collapsing across age, children who heard less wordplay produced more noun tokens overall (Spearman's rho = -0.35, p = 0.02). However, given that wordplay was heavily weighted towards high-frequency words that tended to be learned early overall (approximated using Wordbank norms; Frank et al., 2017), wordplay did not appear to inhibit learning on a word-by-word basis. These results are consistent with the possibility that hearing words in more stable surface-level manifestations (i.e. with less wordplay) may create a clearer lexical target for infants' early productions. In short, we suggest that considering the surface-level appearance of words across and within families may provide a fruitful entry point for understanding meaningful linguistic variability, and what infants learn from it.
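
As an illustration only, the three-way classification described above (wordplay, morpheme-adding, frozen) could be sketched roughly as follows. This is not the procedure used for the corpus: the function names, the toy annotations, and especially the string heuristic in `is_meaningful_change` (plural suffix or compound ending in the lemma) are assumptions standing in for what was presumably a human judgment about each spoken form.

```python
from collections import defaultdict


def is_meaningful_change(lemma, form):
    """Hypothetical heuristic: treat a spoken form as a linguistically
    meaningful variant if it is the lemma itself, a simple plural, or a
    compound that ends in the lemma (e.g. raincoat for coat)."""
    if form == lemma:
        return True
    if form in (lemma + "s", lemma + "es"):
        return True
    return len(form) > len(lemma) and form.endswith(lemma)


def classify_lemmas(annotations):
    """Map each lemma to 'frozen', 'morpheme-adding', or 'wordplay'.

    annotations: iterable of (lemma, spoken_form) pairs.
    """
    # Collect the set of distinct spoken forms observed for each lemma.
    forms = defaultdict(set)
    for lemma, spoken in annotations:
        forms[lemma].add(spoken)

    categories = {}
    for lemma, spoken_forms in forms.items():
        if len(spoken_forms) == 1:
            # Only ever heard in a single form.
            categories[lemma] = "frozen"
        elif all(is_meaningful_change(lemma, f) for f in spoken_forms):
            # Every variant is a plural or compound of the lemma.
            categories[lemma] = "morpheme-adding"
        else:
            # At least one variant is not a meaningful change.
            categories[lemma] = "wordplay"
    return categories


# Toy annotations built from the examples in the abstract (not corpus data).
toy_annotations = [
    ("dog", "dog"), ("dog", "doggy"), ("dog", "dogs"),
    ("coat", "coat"), ("coat", "raincoat"),
    ("dishwasher", "dishwasher"),
    ("tooth", "tooth"), ("tooth", "tooferoo"),
]
result = classify_lemmas(toy_annotations)
```

On these toy pairs, dog and tooth come out as wordplay, coat as morpheme-adding, and dishwasher as frozen. A real pipeline would need hand-coded or morphologically informed labels, since a suffix check cannot reliably separate diminutives from plurals and compounds.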