Throwback (2014): Can toddlers learn language from participating in video chats?

Screen time, visual media, tablets, computers, video games, Skype, FaceTime…

Young kids are exposed to a variety of technology, directly or indirectly, on a daily basis. The American Academy of Pediatrics has provided recommendations for the amount of exposure and screen time for young children based on their age. But these recommendations don’t answer the question of whether toddlers and preschoolers are actually getting anything out of the language they’re being exposed to through screen media.

Previous research (e.g., Kuhl, Tsao & Liu, 2003) has demonstrated what is referred to as a “video deficit” (Anderson & Pempek, 2005), which suggests that mode of input (live person versus a comparable media source) matters when it comes to language acquisition. While this is good information, the real question is: what’s happening during the live interaction, compared to the video-based interaction, that makes the difference for language learning? And does that mean that kids can never benefit from linguistic input provided in a video?

Evidence outside of the field of speech–language pathology has shown that socially contingent interactions (Troseth, Saylor & Archer, 2006; Zimmerman et al., 2007) make all the difference when it comes to early language development. Socially contingent conversational partners provide immediate, reliable, and accurate responses, use the child’s name, make eye contact, ask questions, and take conversational turns (e.g., Csibra, 2010). 

Given this information, paired with the rising popularity of video chat platforms like Skype, FaceTime, and Google Chat, the authors of this study wanted to determine the role of social contingency in word learning by comparing live interaction, video chat, and a prerecorded video chat. Thirty-six two-year-olds were randomly assigned to one of these three training conditions and were taught one of four novel verbs:

blicking (= bouncing)

twilling (= swinging)

frapping (= shaking)

meeping (= turning)

In addition to measuring the children’s ability to learn one of these verbs, the researchers also collected eye-tracking data to determine whether eye contact had a relationship to word learning.


Results from the study suggested that the toddlers learned novel words only from video chat and live interaction (the two conditions that included socially contingent interactions), but not from the prerecorded video. And the children who learned the verbs in these conditions were able to generalize their understanding to different contexts. Eye contact also played a role in word learning: the children who attended to the experimenter’s eyes learned the novel words better than those who made less consistent eye contact.

Put together, these results suggest that social contingency is, in fact, the key ingredient when we’re thinking about teaching children under the age of three new words. Regardless of whether we’re interacting with the child in person or via video chat (think, telepractice!), including all of the elements of social contingency (eye contact, immediate responses, using the child’s name, asking questions, and taking and encouraging conversational turns) can make the difference in terms of the child actually benefiting linguistically.*

(*In fact, previous research (e.g., O’Doherty et al., 2011) has shown that toddlers learn better from watching a social interaction between two characters than from watching shows that follow a format similar to the prerecorded video chat condition. So, shows like Blue’s Clues that attempt to interact with the audience indirectly by posing questions, pausing, and then providing a response may not be as beneficial as one might think.)


Roseberry, S., Hirsh-Pasek, K., & Golinkoff, R. (2014). Skype me! Socially Contingent Interactions Help Toddlers Learn Language. Child Development, 85(3), 956–970.