LENA data and other languages

LENA was developed using data collected from American English speakers. It can be used with other spoken languages, with some considerations, which are detailed below.

Language independent

  • The language used in the child’s environment does not affect segmentation, speaker identification, or identification of adult speech vs. adult non-speech (laugh, sneeze, etc.). Audio Environment is not affected by the language being spoken.
  • Our detection of child vocalizations is close to language-independent, as a vocalization is not a word. The younger the child, the less likely the vocalization detection is to be affected by use of a different language, but even for children on the older end of the LENA range we do not expect the language used to matter.
  • Conversational turns are based on the timing of detected child vocalizations and adult words, and so they are not affected by different languages.
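
To illustrate why a timing-based measure does not depend on the language spoken, here is a minimal sketch of counting turns from speaker-labeled segments. It is a simplified illustration, not LENA's actual algorithm: the Segment type, the count_turns function, and the 5-second maximum gap are all assumptions made for this example. Note that nothing in the inputs describes which language was spoken.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # "adult" or "child" (hypothetical labels)
    start: float  # seconds from the start of the recording
    end: float


# Assumed threshold for this sketch only: a response must begin within
# 5 seconds of the end of the previous speaker's segment.
MAX_GAP_SECONDS = 5.0


def count_turns(segments, max_gap=MAX_GAP_SECONDS):
    """Count adult/child alternations that occur within max_gap seconds."""
    ordered = sorted(segments, key=lambda s: s.start)
    turns = 0
    for prev, cur in zip(ordered, ordered[1:]):
        different_speaker = prev.speaker != cur.speaker
        close_in_time = (cur.start - prev.end) <= max_gap
        if different_speaker and close_in_time:
            turns += 1
    return turns


example = [
    Segment("adult", 0.0, 2.0),
    Segment("child", 3.0, 4.0),    # responds 1.0 s after the adult
    Segment("adult", 4.5, 6.0),    # responds 0.5 s after the child
    Segment("child", 20.0, 21.0),  # 14 s later, too late to count
]
turn_count = count_turns(example)
print(turn_count)
```

Because the function only looks at who spoke and when, the same code applies whether the adult words were in English, Mandarin, or any other language.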

Somewhat language dependent

  • The absolute values for adult word count (AWC) in a non-English language may not be as accurate as for English due to differences in the phone set, syllables per word, and other factors. However, the counts should be “off” by the same amount for a family from one recording to the next, which allows you to track change over time. You might not want to include non-English AWC data in an otherwise English dataset for research, but it is still useful for working with parents or doing case studies.
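
The point about tracking change over time can be shown with a small worked example. This sketch assumes the inaccuracy behaves like a constant proportional bias (here the system is imagined to report 80% of the true count); both that bias and the word counts are made-up numbers for illustration, not measured LENA values. If the offset were instead a fixed number of words, the absolute difference between recordings would be preserved rather than the percentage change.

```python
import math


def percent_change(before: float, after: float) -> float:
    """Relative change between two counts, as a percentage."""
    return (after - before) / before * 100.0


# Hypothetical true adult word counts for one family on two recording days.
true_counts = [10_000, 12_500]

# Assumed constant bias for this family's language (illustrative only).
bias = 0.8
reported_counts = [count * bias for count in true_counts]

true_change = percent_change(true_counts[0], true_counts[1])
reported_change = percent_change(reported_counts[0], reported_counts[1])

print(f"true change:     {true_change:.1f}%")
print(f"reported change: {reported_change:.1f}%")
print("same trend:", math.isclose(true_change, reported_change))
```

Under this assumption the reported counts are lower than the true counts on both days, yet the percentage increase between recordings is the same, which is why the data remains useful for following one family's progress.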

Language dependent - use with caution

  • The AVA score is not considered valid in other languages because AVA compares the phonemic complexity of the child’s output specifically against an adult American English model. A 3-year-old speaker of Mandarin, for example, will be using phonemes that are not part of the English set. That said, we have heard anecdotally from users that consistently low AVA scores in non-English-speaking children do tend to flag other issues.