Abstract: Modern compilers are still built using technology that existed decades ago. These include basic algorithms and techniques for lexing, parsing, data-flow analysis, data dependence analysis, vectorization, register allocation, instruction selection, and instruction scheduling. It is high time that we modernize our compiler toolchain. In this talk, I will show the path to the modernization of one important compiler technique -- vectorization. Vectorization was first introduced in the era of Cray vector processors during the 1980's. In modernizing vectorization, I will first show how to use new techniques that better target modern hardware. While vector supercomputers need large vectors, which are only available by parallelizing loops, modern SIMD instructions efficiently work on short vectors. Thus, in 2000, we introduced Superword Level Parallelism (SLP) based vectorization. SLP finds short vector instructions within basic blocks, and by loop unrolling we can convert vector parallelism to SLP. Next, I will show how we can take advantage of the power of modern computers for compilation, by using more accurate but expensive techniques to improve SLP vectorization. Due to the hardware resource constraints of the era, like many other compiler optimizations, SLP implementation was a greedy algorithm. In 2018, we introduced goSLP, which uses integer linear programming to find an optimal instruction packing strategy and achieves 7.58% geomean performance improvement over the LLVM’s SLP implementation on SPEC2017fp C/C++ programs. Finally, I will show how to truly modernize a compiler by automatically learning the necessary components of the compiler with Ithemal and Vemal. The optimality of goSLP is under LLVM’s simple per instruction additive cost model that fits within the Integer programming framework. However, the actual cost of execution in a modern out-of-order, pipelined, superscalar processor is much more complex. Manually building such cost models as well as manually developing compiler optimizations is costly, tedious, error-prone and is hard to keep up with the architectural changes. Ithemal is the first learnt cost model for predicting the throughput of x86 basic blocks. It not only significantly outperforms (more than halves the error) state-of-the-art analytical hand-written tools like llvm-mca, but also is learnt from data requiring minimal human effort. Vemal is a learnt policy for end-to-end vectorization as opposed to tuning heuristics, which outperforms LLVM’s SLP vectorizer. These data-driven techniques can help achieve state-of-the-art results while also reducing the development and maintenance burden of the compiler developer.
Saman P. Amarasinghe is a Professor and Associate Department Head in the Department of Electrical Engineering and Computer Science at Massachusetts Institute of Technology and a member of its Computer Science and Artificial Intelligence Laboratory (CSAIL) where he leads the Commit compiler group. Under Saman's guidance, the Commit group developed the StreamIt, StreamJIT, PetaBricks, Halide, Simit, MILK, Cimple, TACO, GraphIt, Tiramisu, BioStream and Seq programming languages and compilers, DynamoRIO dynamic instrumentation system, Superword level parallelism for SIMD vectorization, Program Shepherding to protect programs against external attacks, the OpenTuner extendable autotuner, and the Kendo deterministic execution system. He was the co-leader of the Raw architecture project. His research interests are in discovering novel approaches to improve the performance of modern computer systems without unduly increasing the complexity faced by the end users, application developers, compiler writers, or computer architects. Saman was the founder of Determina Corporation, which was acquired by VMware, and a co-founder of Lanka Internet Services Ltd., and Venti Technologies Corporation. Saman received his BS in Electrical Engineering and Computer Science from Cornell University in 1988, and his MSEE and Ph.D. from Stanford University in 1990 and 1997, respectively.