Language is a universal human ability, acquired readily by young children who otherwise struggle with many basics of survival¹,². And yet, language varies across individuals. Behavioral and experimental observations suggest that children’s linguistic skills vary with factors like socioeconomic status³, child gender⁴, and multilingualism⁵. But which factors really influence children’s day-to-day language use? Here we leverage speech technology in a big-data approach to report on a unique, cross-cultural, and diverse data set: >2,500 day-long, child-centered audio recordings of 1,001 children aged 2 to 48 months from 12 countries spanning 6 continents across urban, farmer-forager, and subsistence-farming contexts. As expected, age and language-relevant clinical risks and diagnoses⁶ strongly correlated with how much speech (and speech-like vocalization) children produced. Critically, so too did adult talk in children’s environments: Children who heard less talk from adults produced less speech. In contrast to previous conclusions based on more limited sampling methods and a different set of language proxies, socioeconomic status, child gender, and multilingualism were not associated with children’s productions over the first four years of life. These findings from large-scale naturalistic data advance our understanding of which factors robustly predict variability in the language behaviors of young learners across a wide range of everyday contexts.