NCSA staff who would like to submit an item for the calendar can email newsdesk@ncsa.illinois.edu.
Talk 1
Abstract: We introduce XFT, a simple yet powerful training scheme that merges upcycled Mixture-of-Experts (MoE) models to unleash the performance limit of instruction-tuned code Large Language Models (LLMs). While vanilla sparse upcycling fails to improve instruction tuning, XFT introduces a shared-expert mechanism with a novel routing-weight normalization strategy into sparse upcycling, which significantly boosts instruction tuning. After finetuning the upcycled MoE model, XFT introduces a learnable model-merging mechanism to compile the upcycled MoE back into a dense model, achieving upcycled-MoE-level performance with only dense-model compute. By applying XFT to a 1.3B model, we create a new state-of-the-art small LLM (<3B) for code, with 67.1 and 64.6 pass@1 on HumanEval and HumanEval+ respectively. Furthermore, with the same data and model architecture, XFT improves supervised fine-tuning (SFT) by 13% on HumanEval+, along with consistent improvements from 2% to 13% on MBPP+, MultiPL-E, and DS-1000, demonstrating its generalizable effectiveness. XFT is fully orthogonal to existing techniques such as Evol-Instruct and OSS-Instruct, opening a new dimension for improving code instruction tuning.
References: https://yifeng-ding.com/files/XFT_preprint_1.pdf
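To make the two stages concrete, below is a minimal PyTorch-style sketch of the ideas named in the abstract: upcycling a dense FFN into an MoE with an always-active shared expert and normalized routing weights, then merging the finetuned experts back into a dense layer. All class names, layer sizes, and the uniform merging coefficients are illustrative assumptions rather than the paper's exact recipe (XFT learns its merging weights).

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

class UpcycledMoEFFN(nn.Module):
    """Sparse-upcycled FFN with a shared expert (expert 0) and normalized
    routing weights. Names and sizes here are illustrative assumptions."""

    def __init__(self, dense_ffn: nn.Sequential, n_experts: int = 4):
        super().__init__()
        # Upcycling: every expert starts as a copy of the pretrained dense FFN.
        self.experts = nn.ModuleList(
            copy.deepcopy(dense_ffn) for _ in range(n_experts))
        d_model = dense_ffn[0].in_features        # assumes a Linear first layer
        self.router = nn.Linear(d_model, n_experts - 1)  # gates experts 1..n-1

    def forward(self, x):                          # x: [batch, seq, d_model]
        gate = F.softmax(self.router(x), dim=-1)   # weights for routed experts
        # Routing-weight normalization: prepend a fixed weight for the
        # always-active shared expert, then renormalize to sum to 1.
        w = torch.cat([torch.ones_like(gate[..., :1]), gate], dim=-1)
        w = w / w.sum(dim=-1, keepdim=True)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)
        return (outs * w.unsqueeze(-2)).sum(dim=-1)

def merge_to_dense(moe: UpcycledMoEFFN) -> nn.Sequential:
    """Collapse the finetuned experts back into one dense FFN. XFT learns
    the mixing coefficients; a uniform average stands in for them here."""
    dense = copy.deepcopy(moe.experts[0])
    with torch.no_grad():
        for p, expert_ps in zip(dense.parameters(),
                                zip(*(e.parameters() for e in moe.experts))):
            p.copy_(torch.stack(expert_ps).mean(dim=0))
    return dense

ffn = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
moe = UpcycledMoEFFN(ffn)       # instruction-tune this MoE...
merged = merge_to_dense(moe)    # ...then serve with dense-model compute
```

Because every expert is initialized from the same dense FFN and the normalized weights sum to 1, the upcycled layer starts out functionally equivalent to the original dense layer, which is what lets merging recover dense-model inference cost afterward.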
Talk 2
Abstract: The paper "Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics" discusses the optimization of preprocessing and inference processes in Deep Neural Network (DNN)-based visual analytics systems. It highlights that preprocessing, such as image decoding and resizing, can be a significant bottleneck in these systems, especially with the advancements in modern hardware accelerators. The authors introduce two main optimizations: leveraging low-resolution visual data for better accuracy and throughput trade-offs and an efficient runtime engine that pipelines preprocessing and DNN execution. They implement these optimizations in a system called SMOL, which shows up to 5.9 times end-to-end throughput improvements over recent visual analytics work without compromising accuracy. The paper emphasizes the importance of joint optimization of preprocessing and DNN execution to enhance the performance of visual analytics systems.
References: https://arxiv.org/pdf/2007.13005.pdf
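For intuition about the pipelining optimization, here is a minimal producer/consumer sketch in Python, assuming placeholder `preprocess` and `model` callables; it illustrates the general pattern of overlapping CPU decoding with model execution, not SMOL's actual runtime engine.

```python
import queue
import threading

def pipelined_inference(paths, preprocess, model, n_workers=4, depth=64):
    """Overlap CPU preprocessing with model execution via a bounded queue.
    `paths`, `preprocess`, and `model` are placeholders for the real
    decode/resize stage and DNN; this is a sketch, not SMOL's engine."""
    q = queue.Queue(maxsize=depth)      # bounded buffer gives backpressure
    DONE = object()                     # per-worker end-of-stream marker

    def producer(chunk):
        for p in chunk:
            # SMOL-style trick: decode at reduced resolution when the
            # accuracy/throughput trade-off allows it.
            q.put(preprocess(p))
        q.put(DONE)

    workers = [threading.Thread(target=producer, args=(paths[i::n_workers],))
               for i in range(n_workers)]
    for t in workers:
        t.start()

    results, finished = [], 0
    while finished < n_workers:
        item = q.get()
        if item is DONE:
            finished += 1
        else:
            results.append(model(item))  # runs while workers keep decoding
    for t in workers:
        t.join()
    return results
```

The bounded queue keeps the decode workers from racing ahead of the model, so neither stage sits idle; that overlap is the source of the end-to-end throughput gains the abstract describes.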