Due to the rapidly increasing amount of available information, computer scientists and statisticians face new challenges in dealing with big data problems. Among the most popular and frequently applied approaches to these problems are distributed methods, in which the data is split over multiple local servers or cores and the computations are carried out locally, in parallel. The local machines then transmit the outcomes of their computations to a global server, which aggregates the local results into a global one.
In the (Bayesian) literature various distributed computational methods have been proposed, with seemingly good practical performance but limited theoretical underpinning. In our work we investigate the existing distributed methods in a standard nonparametric setting (the Gaussian signal-in-white-noise model) and compare their theoretical performance, i.e. their posterior contraction rates and the coverage of their credible sets. Next we ask what is fundamentally possible in the distributed setting. To make this precise we impose certain communication restrictions and prove minimax lower bounds for distributed procedures under such restrictions. Moreover, we exhibit distributed procedures that attain these bounds. Finally, we address the issue of adaptive distributed estimation.
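As a rough illustration of the split, compute locally, then aggregate scheme in the sequence form of the signal-in-white-noise model, consider the following Python sketch. It is a toy example under stated assumptions and not one of the procedures analysed in this work: the per-machine noise level sqrt(m/n), the truncation estimator, the plain averaging step, and all function names are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate_local_data(theta, n, m, rng):
    """Return an (m, d) array; row j is the data seen by machine j.

    Assumption: each machine observes theta_i + sqrt(m/n) * Z_i with
    independent standard normal Z_i, i.e. it holds a 1/m share of a
    total "sample size" n.
    """
    noise = rng.standard_normal((m, theta.size))
    return theta + np.sqrt(m / n) * noise


def local_estimate(y, k):
    """Local step: keep the first k coefficients, set the rest to zero."""
    est = np.zeros_like(y)
    est[:k] = y[:k]
    return est


def aggregate(local_estimates):
    """Global step: average the local estimates coordinate-wise."""
    return np.mean(local_estimates, axis=0)


def distributed_estimate(theta, n, m, k, rng):
    """Run the full split / local computation / aggregation pipeline."""
    data = simulate_local_data(theta, n, m, rng)
    local = np.array([local_estimate(y, k) for y in data])
    return aggregate(local)


# Example: a polynomially decaying signal, 10 machines, n = 10_000.
rng = np.random.default_rng(0)
theta = np.arange(1, 201, dtype=float) ** -1.5
theta_hat = distributed_estimate(theta, n=10_000, m=10, k=20, rng=rng)
squared_error = float(np.sum((theta_hat - theta) ** 2))
```

In this toy version the truncation level k is fixed in advance and the machines send their full local estimates to the global server, so it reflects neither the communication restrictions nor the adaptation question raised above.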
Based on joint work with Botond Szabo.