Mathematical Physics Seminar
Time: 15:30
Venue: room to be determined
We study the performance of stochastic gradient descent
in high-dimensional inference tasks. Our focus is on the initial ``search'' phase,
where the algorithm is far from a solution and the loss landscape is highly non-convex.
We develop a classification of the difficulty of this problem, namely
whether weak recovery of the parameter requires linearly, quasilinearly, or polynomially many samples in the dimension.
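For illustration, the ``search'' phase can be simulated with online SGD on the sphere for one of the tasks below, phase retrieval, tracking the overlap with the true parameter (a minimal sketch, not the speaker's implementation; the dimension, step size, and sample budget are arbitrary choices):

```python
import numpy as np

def online_sgd_phase_retrieval(d=50, n_samples=40000, lr=0.005, seed=0):
    """Online SGD for phase retrieval y = (theta* . x)^2, one fresh
    Gaussian sample per step. Returns the final iterate and the overlap
    |<theta_t, theta*>| over time; weak recovery means the overlap
    becomes bounded away from zero."""
    rng = np.random.default_rng(seed)
    theta_star = rng.standard_normal(d)
    theta_star /= np.linalg.norm(theta_star)
    theta = rng.standard_normal(d)          # random start: overlap ~ 1/sqrt(d)
    theta /= np.linalg.norm(theta)
    overlaps = []
    for _ in range(n_samples):
        x = rng.standard_normal(d)
        y = (theta_star @ x) ** 2
        pred = theta @ x
        grad = -4.0 * (y - pred ** 2) * pred * x  # grad of (y - (theta.x)^2)^2
        theta = theta - lr * grad
        theta /= np.linalg.norm(theta)            # retract to the unit sphere
        overlaps.append(abs(theta @ theta_star))
    return theta, np.array(overlaps)
```

Since both vectors are kept on the unit sphere, the overlap always lies in [0, 1]; plotting it against the number of samples exhibits the long search plateau followed by a sharp climb.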
This classification depends on an intrinsic property of the population loss which we call the ``information exponent''. We illustrate our approach by applying it to a wide variety of estimation tasks
such as parameter estimation for generalized linear models, two-component Gaussian mixture models, phase retrieval, and spiked matrix and tensor models, as well as supervised learning for single-layer networks with general activation functions. In this latter case, our results translate to the difficulty of this task for teacher-student networks in terms of the Hermite decomposition of the activation function.
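As a numerical illustration of the last point, one can read an exponent off an activation's Hermite decomposition as the index of its first nonvanishing Hermite coefficient with k >= 1 (a minimal sketch under that assumption, using Gauss-Hermite quadrature; the tolerance and truncation order are arbitrary choices):

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss
from numpy.polynomial.hermite_e import hermeval

def hermite_coeffs(sigma, kmax=6, n_quad=80):
    """Coefficients c_k = E[sigma(Z) He_k(Z)] / k! for Z ~ N(0,1),
    where He_k are probabilists' Hermite polynomials."""
    x, w = hermgauss(n_quad)       # nodes/weights for weight exp(-x^2)
    z = np.sqrt(2.0) * x           # rescale to the standard Gaussian
    vals = sigma(z)
    coeffs = []
    for k in range(kmax + 1):
        e_k = np.zeros(k + 1)
        e_k[k] = 1.0               # coefficient vector selecting He_k
        He_k = hermeval(z, e_k)
        c_k = np.sum(w * vals * He_k) / np.sqrt(np.pi) / math.factorial(k)
        coeffs.append(c_k)
    return np.array(coeffs)

def first_nonzero_exponent(sigma, tol=1e-8, kmax=6):
    """Index of the first Hermite coefficient with k >= 1 exceeding tol."""
    c = hermite_coeffs(sigma, kmax)
    for k in range(1, kmax + 1):
        if abs(c[k]) > tol:
            return k
    return None
```

For example, a ReLU activation has a nonzero first Hermite coefficient, a centered quadratic has its first nonzero coefficient at k = 2, and He_3 itself at k = 3, corresponding to increasingly hard search phases.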