Computer Science Speakers Calendar


XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts & Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics

The Department of Computer Science, University of Illinois and FM/SE Seminar
0216 Siebel Center and Zoom
Mar 22, 2024, 2:00 pm
Dylan Zhang, UIUC & Aakriti, UIUC
Isha Chaudhary

Talk 1 

Abstract: We introduce XFT, a simple yet powerful training scheme, by simply merging upcycled Mixture-of-Experts (MoE) to unleash the performance limit of instruction-tuned code Large Language Models (LLMs). While vanilla sparse upcycling fails to improve instruction tuning, XFT introduces a shared expert mechanism with a novel routing weight normalization strategy into sparse upcycling, which significantly boosts instruction tuning. After fine-tuning the upcycled MoE model, XFT introduces a learnable model merging mechanism to compile the upcycled MoE back to a dense model, achieving upcycled MoE-level performance with only dense-model compute. By applying XFT to a 1.3B model, we create a new state-of-the-art small LLM (<3B) for code with 67.1 and 64.6 pass@1 on HumanEval and HumanEval+ respectively. Furthermore, with the same data and model architecture, XFT improves supervised fine-tuning (SFT) by 13% on HumanEval+, along with consistent improvements from 2% to 13% on MBPP+, MultiPL-E, and DS-1000, demonstrating its generalizable effectiveness. XFT is fully orthogonal to existing techniques such as Evol-Instruct and OSS-INSTRUCT, opening a new dimension for improving code instruction tuning.


Talk 2

Abstract: The paper "Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics" addresses the optimization of preprocessing and inference in Deep Neural Network (DNN)-based visual analytics systems. It shows that preprocessing, such as image decoding and resizing, can be a significant bottleneck in these systems, especially given the advances in modern hardware accelerators. The authors introduce two main optimizations: using low-resolution visual data to obtain better accuracy-throughput trade-offs, and an efficient runtime engine that pipelines preprocessing and DNN execution. They implement these optimizations in a system called SMOL, which achieves up to 5.9 times higher end-to-end throughput than recent visual analytics work without compromising accuracy. The paper emphasizes that preprocessing and DNN execution must be optimized jointly to improve the performance of visual analytics systems.

