The problem of variable clustering is that of grouping similar components of a p-dimensional vector X = (X_1, …, X_p) into a partition G, and estimating this partition from n independent copies of X. Although K-means is a natural strategy for this problem, I will explain why it cannot lead to perfect cluster recovery. Then, I will introduce a correction that can be viewed as a penalized convex relaxation of K-means. The clusters estimated by this method are shown to recover the partition G at a minimax-optimal cluster separation rate.
- Speaker: Nicolas Verzelen (INRA)
- Friday 09 June 2017, 15:00–16:00
- Venue: MR12, Centre for Mathematical Sciences, Wilberforce Road, Cambridge.
- Series: Statistics; organiser: Quentin Berthet.