In this talk I will address theoretical questions related to the task of data clustering (unsupervised learning), and in particular a concrete and popular methodology known as spectral clustering. The idea of spectral clustering is to first use the spectrum of a graph Laplacian associated with a point cloud to construct an embedding of the cloud into a Euclidean space; after this embedding step, an algorithm such as k-means is used to obtain the desired clusters. Despite the popularity of the method and the intuitive understanding that practitioners have of it, very few rigorous mathematical results justifying its use are available. During my talk I intend to answer the following theoretical questions: What is the geometry of these graph Laplacian embeddings as the number of data points goes to infinity, and what is special about them that makes spectral clustering a successful methodology? I will also discuss some of the computational consequences of the theoretical results that I present.
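The pipeline just described (Laplacian embedding followed by k-means) can be sketched as follows. This is a minimal illustrative implementation, not the construction analyzed in the talk: the Gaussian similarity kernel, its bandwidth, the unnormalized Laplacian, and the synthetic two-cluster data are all assumptions made for the example.

```python
# Minimal sketch of spectral clustering: graph Laplacian embedding + k-means.
# Kernel choice, bandwidth, and data are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic point cloud: two well-separated Gaussian blobs in the plane.
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(3.0, 0.3, (50, 2))])
n, k = len(X), 2

# Weighted similarity graph on the cloud via a Gaussian kernel.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / (2 * 0.5 ** 2))
np.fill_diagonal(W, 0.0)

# Unnormalized graph Laplacian L = D - W.
L = np.diag(W.sum(1)) - W

# Embed each point using the k eigenvectors of smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(L)
emb = eigvecs[:, :k]

# Simple k-means (Lloyd's algorithm) on the embedded points,
# deterministically initialized at the first and last embedded points.
centers = emb[[0, n - 1]]
for _ in range(20):
    labels = np.argmin(((emb[:, None] - centers[None]) ** 2).sum(-1), 1)
    centers = np.array([emb[labels == j].mean(0) for j in range(k)])
```

Because the two blobs are nearly disconnected in the similarity graph, the low-lying Laplacian eigenvectors are close to cluster indicator functions, so the embedded points form two tight groups that k-means separates easily; this localization behavior is exactly the kind of structure the limiting analysis makes precise.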
A variety of mathematical tools from optimal transport, spectral geometry, metastability, and probability makes the analysis possible. The talk is based on joint work with Bamdad Hosseini (Caltech) and Franca Hoffman (Caltech).