Interdisciplinary seminar
Time: 16:00
Venue: Seminario I
In written language, the choice of specific words is constrained both by the
semantic context of the message being conveyed and by grammatical requirements.
To a significant degree, the semantic context is itself shaped by the broader
cultural and historical environment, which also influences matters of style and
fashion. Over time, these environmental influences leave an imprint on the
statistics of language use, with some words becoming more common while others
become less frequent. I will
present a data-driven study of the statistics of language use over time based
on the analysis of word frequencies extracted from more than 4.5 million
books written over a period of 300 years (Google Ngram database). I will
show evidence of systematic oscillatory patterns in word use that are highly
consistent across different words. Moreover, while the periods of the
oscillations are independent of the particular word, complex network analysis
reveals that semantically related words show strong phase coherence.
Ultimately, these previously unknown patterns in the statistics of language may
originate in the broader underlying cultural dynamics.
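As a rough illustration of the two quantities mentioned above (a common oscillation period and phase coherence between words), the sketch below estimates the dominant period of a frequency time series from its Fourier spectrum and measures phase coherence between two series with the phase-locking value (PLV), using Hilbert-transform phases. It is a minimal sketch on synthetic data, not the speaker's method. The following are all assumptions made for illustration: the use of the PLV as the coherence measure, mean removal standing in for proper detrending, and the function names `dominant_period`, `analytic_signal` and `phase_locking_value`.

```python
import numpy as np


def dominant_period(series, dt=1.0):
    """Period (in units of dt) of the strongest non-zero Fourier component.

    Assumption: removing the mean stands in for the detrending a real
    word-frequency analysis would need.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=dt)
    k = 1 + int(np.argmax(spectrum[1:]))  # skip the zero-frequency bin
    return 1.0 / freqs[k]


def analytic_signal(x):
    """Analytic signal via the FFT (same construction as scipy.signal.hilbert)."""
    n = x.size
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def phase_locking_value(a, b):
    """Phase coherence in [0, 1]; 1 means a constant phase lag between a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    phase_a = np.angle(analytic_signal(a - a.mean()))
    phase_b = np.angle(analytic_signal(b - b.mean()))
    return float(np.abs(np.mean(np.exp(1j * (phase_a - phase_b)))))


# Synthetic yearly series over 300 years (made-up data, not Ngram counts).
years = np.arange(300)
word_a = 1.0 + 0.2 * np.cos(2 * np.pi * years / 25)
word_b = 1.0 + 0.2 * np.cos(2 * np.pi * years / 25 + 1.0)  # same period, shifted phase
word_c = 1.0 + 0.2 * np.cos(2 * np.pi * years * 7 / 300)  # different period

print(dominant_period(word_a))
print(phase_locking_value(word_a, word_b))
print(phase_locking_value(word_a, word_c))
```

For these synthetic series the estimated period is about 25 years, the two series sharing that period are almost perfectly phase-locked (PLV close to 1), and the series with a different period gives a PLV close to 0. Real word-frequency series would likely need detrending and band-pass filtering before their phases become meaningful.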