Many statistical procedures are based on aggregating information gleaned from summaries of a data set, such as subsamples, bootstrap samples or random projections. Although the bootstrap itself is probably the best-known such method, there are many other examples, including bagging [1,2,3] for regression or classification (with random forests as a special case), Stability Selection for variable selection [5,6] and random projection ensemble classification. Intuitively, such procedures allow the statistician to understand the stability of observed effects under perturbations of the original data, and they appear to be particularly valuable for complex, high-dimensional data. Even though these methods are typically embarrassingly parallelisable, they may nevertheless be computationally intensive.
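To make the aggregation idea concrete, the following is a minimal sketch of bagging for regression: fit a base learner to each bootstrap resample of the data and average the resulting predictions. The function name and the choice of a simple least-squares line as the base learner are illustrative only; in practice the base learner would typically be a more flexible, unstable procedure such as a regression tree.

```python
import numpy as np

def bagged_predict(x, y, x_new, n_boot=100, seed=0):
    """Bootstrap-aggregated (bagged) prediction.

    Fits a simple base learner (here a least-squares line via
    np.polyfit) to each of n_boot bootstrap resamples of (x, y),
    then averages the predictions at the points x_new.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    preds = np.empty((n_boot, len(x_new)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample with replacement
        coef = np.polyfit(x[idx], y[idx], deg=1)
        preds[b] = np.polyval(coef, x_new)
    return preds.mean(axis=0)              # aggregate by averaging

# Toy data: noisy linear signal y = 2x + noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = 2 * x + rng.normal(scale=0.1, size=200)
pred = bagged_predict(x, y, np.array([0.5]))
```

Note that the loop over bootstrap resamples is embarrassingly parallel, as each fit depends only on its own resample; this is the sense in which such methods parallelise naturally while remaining computationally intensive overall.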
This project will explore when and why such methods can be expected to succeed. The analysis will combine statistical perspectives with the inherent computational trade-offs. It is hoped that this analysis will suggest other statistical challenges where the aggregation of data summaries can prove an effective tool.