Abstract: Classical multidimensional scaling is an important tool for data reduction in many applications. It takes a distance matrix as input and outputs low-dimensional embedded samples whose pairwise distances preserve those between the original data points, which are treated as deterministic. In practice, however, data are often noisy. In such cases, the quality of the embedded samples produced by classical multidimensional scaling begins to break down as either the ambient dimensionality or the noise variance grows. This motivates us to propose a modified multidimensional scaling procedure that applies a nonlinear shrinkage to the sample eigenvalues. The nonlinear transformation is determined by the sample size, the ambient dimensionality, and the moments of the noise. As an application, we consider the problem of clustering high-dimensional noisy data. We show that modified multidimensional scaling followed by various clustering algorithms can achieve exact recovery, i.e., all cluster labels are recovered correctly with probability tending to one. Numerical simulations and two real data applications lend strong support to the proposed methodology.
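For readers unfamiliar with the baseline, the classical procedure that the abstract starts from can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the speaker's implementation: the function name `classical_mds` and the optional `shrink` hook are hypothetical, and the specific nonlinear shrinkage proposed in the talk is not given in the abstract, so none is supplied here.

```python
import numpy as np

def classical_mds(D, k, shrink=None):
    """Embed n points in k dimensions from an n x n matrix of pairwise distances.

    `shrink` is a hypothetical hook: an optional callable applied to the
    top-k eigenvalues before the embedding is formed. The abstract does not
    specify the shrinkage function, so the default is plain classical MDS.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]         # indices of the k largest
    evals, evecs = evals[top], evecs[:, top]
    if shrink is not None:
        evals = shrink(evals)                 # user-supplied transformation
    evals = np.clip(evals, 0.0, None)         # guard against negative values
    return evecs * np.sqrt(evals)             # n x k embedded samples

# Noiseless example: six points in the plane, embedded back into 2 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, k=2)
D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
```

On noiseless low-dimensional points such as these, `D_hat` should agree with `D` up to numerical error, since the embedding is determined only up to rotation and reflection. With noisy high-dimensional data the sample eigenvalues are distorted, which is the situation the eigenvalue shrinkage described in the abstract is meant to address.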
Qiang Sun is currently an Assistant Professor of Statistics at the University of Toronto, in the Department of Statistical Sciences and the Department of Computer and Mathematical Sciences. Previously, he worked at Princeton University as an associate research scholar. He earned his Ph.D. from the University of North Carolina at Chapel Hill in 2014 and his B.S. from the University of Science and Technology of China in 2010. His research interests span a broad spectrum, including hypothesis-driven imaging genetics, clustering, manifold learning, nonconvex optimization, and robust statistics. He publishes in both statistics and applied journals, such as the Journal of the American Statistical Association, Biometrika, The Annals of Statistics, and Environmental Science and Technology. He has also published in top machine learning venues, such as ICML and AISTATS.