Sparse Canonical Correlation Analysis

Canonical correlation analysis (CCA) is a technique for exploring ‘multi-view’ data, i.e. data in which two or more types of measurement are taken on the same individuals. A biological example is a multi-omics study, where one might have abundances of different proteins and of different metabolites for the same samples. CCA aims to find directions of maximal correlation between the two sets of data. This task is straightforward in low dimensions but breaks down when there are more dimensions than data points, because a perfect sample correlation can then always be achieved by overfitting. Many methods have been proposed to deal with this problem by assuming that the solutions are ‘simple’ in some sense; we are considering a new notion of ‘simple’ that is different from, and complementary to, the existing notions. However, it is not clear how to compare the different methods, so we are also seeking robust ways to choose between them in practice.
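To illustrate why classical CCA fails in high dimensions, here is a minimal NumPy sketch. It is not the project's software: the function name `canonical_correlations`, the sample size and the dimensions are all chosen here purely for illustration. It computes sample canonical correlations as the singular values of the product of orthonormal bases for the two centred data matrices (a standard QR-plus-SVD formulation), and applies it to pairs of views that are independent noise, so the true canonical correlations are all zero.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Sample canonical correlations between two views with matched rows.

    Hypothetical helper for illustration: orthonormalise each centred view
    with a QR decomposition, then take the singular values of Qx^T Qy.
    """
    Xc = X - X.mean(axis=0)  # centre each variable
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)  # orthonormal basis from the first view
    Qy, _ = np.linalg.qr(Yc)  # orthonormal basis from the second view
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against floating-point overshoot


rng = np.random.default_rng(0)
n = 50  # number of individuals (assumed for the demo)

# Low-dimensional case: 3 variables per view, independent noise.
rho_low = canonical_correlations(
    rng.standard_normal((n, 3)), rng.standard_normal((n, 3))
)

# High-dimensional case: 80 variables per view, still independent noise.
# With more variables than individuals, each view can reproduce any
# centred vector exactly, so the top sample correlation is forced to 1.
rho_high = canonical_correlations(
    rng.standard_normal((n, 80)), rng.standard_normal((n, 80))
)

print("largest sample canonical correlation, low-dim: ", rho_low.max())
print("largest sample canonical correlation, high-dim:", rho_high.max())
```

In the low-dimensional case the largest sample correlation should be well below one, whereas in the high-dimensional case it is numerically equal to one even though the two views are unrelated. Constraining the directions to be ‘simple’, for example by requiring sparsity, is one way to rule out such spurious solutions.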

Who's involved

Software