Speaker: Grigory Yaroslavtsev of Indiana University, Bloomington
Title: Advances in Hierarchical Clustering of Vector Data
Abstract: Despite its important role and many applications in data analysis, hierarchical clustering, unlike the highly successful flat clustering (e.g. k-means), lacked rigorous algorithmic study until recently because it had no rigorous objective. Since 2016, a sequence of works has given novel algorithms for this problem in the general metric setting. This was enabled by a breakthrough by Dasgupta, who introduced a formal objective into the study of hierarchical clustering.
In this talk I will give an overview of our recent progress on models and scalable algorithms for hierarchical clustering applicable specifically to high-dimensional vector data, including embedding vectors arising from deep learning. I will first discuss linkage-based algorithms (single-linkage, average-linkage) and their formal properties with respect to various objectives. I will then introduce a new projection-based approximation algorithm for vector data. The talk will be self-contained and will not assume prior knowledge of clustering methods.
Based on joint work with Vadapalli (ICML’18) and with Charikar, Chatziafratis and Niazadeh (AISTATS’19).
Bio: Grigory Yaroslavtsev (http://grigory.us) is an assistant professor of Computer Science at Indiana University and an adjunct assistant professor of Statistics (by courtesy). He is the founding director of the Center for Algorithms and Machine Learning at IU (http://caml.indiana.edu/). Previously, Grigory held postdoctoral fellowships at the Warren Center for Network and Data Sciences at the University of Pennsylvania and at ICERM, Brown University. Grigory received his Ph.D. in theoretical computer science from Penn State in 2014. He works on foundational questions in scalable algorithms for machine learning, data science and private data release. His work is supported by an NSF CRII Award and a Facebook Faculty Research Award.