Computer Science Speakers Calendar


CS Compiler Seminar: Pavlo Pastaryev, "Copy-and-patch compilation: a fast compilation algorithm for high-level languages and bytecode" and Srinjoy Das, "Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning"

Event Type
Seminar/Symposium
Sponsor
Computer Science
Location
2124 Siebel Center
Virtual
Date
Feb 20, 2023, 4:30 pm

We look forward to seeing you in person or virtually on Monday, February 20, at 4:30 pm. Join in person in 2124 Siebel Center for Computer Science, 201 N. Goodwin Ave., or via Zoom: https://illinois.zoom.us/j/83675834345?pwd=T1l6aXdzK3lOdnNmVUtjZjFzdHZsdz09

Speaker(s): Pavlo Pastaryev (Student Speaker)

Title: Copy-and-patch compilation: a fast compilation algorithm for high-level languages and bytecode

Conference: OOPSLA ’21 (Proceedings of the ACM on Programming Languages, Volume 5, Issue OOPSLA)

Author(s): Haoran Xu, Fredrik Kjolstad 

Note: The following talk is a student presentation and is not given by the authors of the paper being presented.

Abstract: Fast compilation is important when compilation occurs at runtime, such as query compilers in modern database systems and WebAssembly virtual machines in modern browsers. We present copy-and-patch, an extremely fast compilation technique that also produces good quality code. It is capable of lowering both high-level languages and low-level bytecode programs to binary code, by stitching together code from a large library of binary implementation variants. We call these binary implementations stencils because they have holes where missing values must be inserted during code generation. We show how to construct a stencil library and describe the copy-and-patch algorithm that generates optimized binary code.

We demonstrate two use cases of copy-and-patch: a compiler for a high-level C-like language intended for metaprogramming and a compiler for WebAssembly. Our high-level language compiler has negligible compilation cost: it produces code from an AST in less time than it takes to construct the AST. We have implemented an SQL database query compiler on top of this metaprogramming system and show that on TPC-H database benchmarks, copy-and-patch generates code two orders of magnitude faster than LLVM -O0 and three orders of magnitude faster than higher optimization levels. The generated code runs an order of magnitude faster than interpretation and 14% faster than LLVM -O0. Our WebAssembly compiler generates code 4.9X-6.5X faster than Liftoff, the WebAssembly baseline compiler in Google Chrome. The generated code also outperforms Liftoff's by 39%-63% on the Coremark and PolyBenchC WebAssembly benchmarks.
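The stencil idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation, only a toy analogy: a stencil is a prebuilt machine-code template with "holes" at known byte offsets, and code generation is just copying the template and patching concrete values into the holes, with no per-program instruction selection.

```python
# Toy sketch of copy-and-patch (illustrative, not the paper's code).
# A stencil = (template bytes, hole offsets). The template below is the
# x86-64 encoding of `add rax, imm32`, with a 4-byte hole for imm32.
import struct

ADD_CONST = (bytearray(b"\x48\x81\xc0\x00\x00\x00\x00"),  # add rax, imm32
             [3])                                          # hole at offset 3

def copy_and_patch(stencil, value):
    """Copy the stencil's template, then patch each 32-bit hole with value."""
    template, holes = stencil
    code = bytearray(template)              # copy
    for off in holes:                       # patch
        code[off:off + 4] = struct.pack("<i", value)
    return bytes(code)

code = copy_and_patch(ADD_CONST, 42)
# `code` now encodes `add rax, 42`; a real compiler would concatenate
# many such patched stencils into an executable buffer.
```

Because the templates are built ahead of time, runtime code generation reduces to memcpy-and-patch, which is why the technique beats LLVM's compile times by orders of magnitude.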


Speaker(s): Srinjoy Das (Student Speaker) 

Title: Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning 

Conference: OSDI ‘22 

Author(s): Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica 

Note: The following talk is a student presentation and is not given by the authors of the paper being presented.

Abstract: Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations. They do not suffice to scale out complex DL models on distributed compute devices. Alpa distributes the training of large DL models by viewing parallelisms as two hierarchical levels: inter-operator and intra-operator parallelisms. Based on it, Alpa constructs a new hierarchical space for massive model-parallel execution plans. Alpa designs a number of compilation passes to automatically derive efficient parallel execution plans at each parallelism level. Alpa implements an efficient runtime to orchestrate the two-level parallel execution on distributed compute devices. Our evaluation shows Alpa generates parallelization plans that match or outperform hand-tuned model-parallel training systems even on models they are designed for. Unlike specialized systems, Alpa also generalizes to models with heterogeneous architectures and models without manually-designed plans.
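Alpa's two-level view can be sketched in a few lines. This is a hypothetical illustration, not Alpa's API: inter-operator parallelism splits the model's layers into contiguous pipeline stages, and intra-operator parallelism shards each operator's tensors across the devices assigned to that stage.

```python
# Hypothetical sketch of Alpa's hierarchy (not the real system).

def inter_op_partition(layers, num_stages):
    """Inter-operator level: split layers into contiguous pipeline stages."""
    per_stage = -(-len(layers) // num_stages)  # ceiling division
    return [layers[i:i + per_stage] for i in range(0, len(layers), per_stage)]

def intra_op_shard(weight_cols, devices):
    """Intra-operator level: shard one operator's weight columns evenly
    across the device mesh assigned to its stage."""
    per_dev = weight_cols // len(devices)
    return {d: (i * per_dev, (i + 1) * per_dev)
            for i, d in enumerate(devices)}

layers = [f"layer{i}" for i in range(8)]
stages = inter_op_partition(layers, num_stages=2)   # two pipeline stages
shards = intra_op_shard(1024, ["gpu0", "gpu1"])     # column ranges per GPU
```

Alpa's contribution is searching this hierarchical plan space automatically with compilation passes, rather than requiring users to hand-pick the stage boundaries and sharding specs as in prior systems.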
