2016
27 January
Interdisciplinary seminar
16:00
at Seminario I
In written language, the choice of specific words is constrained both by the semantic context of the message to be transmitted and by grammatical requirements. To a significant degree, the semantic context is itself shaped by a larger cultural and historical environment, which also influences matters of style and fashion. Over time, these environmental influences leave an imprint on the statistics of language use, with some words becoming more common while others are used less frequently. I will present a data-driven study of the statistics of language use over time, based on the analysis of word frequencies extracted from more than 4.5 million books written over a period of 300 years (Google Ngram database). I will show evidence of systematic oscillatory patterns in word use that are highly consistent across different words. Moreover, while the periods of the oscillations are independent of the particular word, complex network analysis reveals that semantically related words show strong phase coherence. Ultimately, these previously unknown patterns in the statistics of language may originate in the broader underlying cultural dynamics.
Back to the seminars page of the Dipartimento di Matematica di Bologna