Speaker: Chamika Sudusinghe
Title: COGNATE: Acceleration of Sparse Tensor Programs on Emerging Hardware using Transfer Learning
Abstract: Sparse tensor programs are essential in deep learning and graph analytics, driving the need for optimized processing. To meet this demand, specialized hardware accelerators are being developed. Optimizing these programs for accelerators is challenging for two reasons: program performance is highly sensitive to variations in sparse inputs, and early-stage accelerators rely on expensive simulators. Consequently, ML-based cost models used to optimize such programs on general-purpose hardware are often ineffective for early-stage accelerators, as they require large training datasets that are expensive to collect through simulation. To address this, we introduce COGNATE, a novel framework that leverages inexpensive data samples from general-purpose hardware (e.g., CPUs) to train cost models, followed by few-shot fine-tuning on emerging hardware. COGNATE exploits the homogeneity of input features across hardware platforms while effectively mitigating heterogeneity, enabling cost model training with just 5% of the data samples needed by accelerator-specific models to achieve comparable performance. We conduct extensive experiments to demonstrate that COGNATE outperforms existing techniques, achieving average speedups of 1.47× (up to 5.46×) for SpMM and 1.39× (up to 4.22×) for SDDMM.
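Note (editor's addition, standard definitions not taken from the abstract): SpMM is sparse-times-dense matrix multiplication, C = A B with A sparse and B dense; SDDMM is sampled dense-dense matrix multiplication, C = S ⊙ (A B), where the dense product A B is evaluated only at the nonzero positions of the sparse sampling matrix S, so the output keeps S's sparsity pattern.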
Speaker: Bhavya Prakash Hirani
Title: VFlatten: Selective Value-Object Flattening using Hybrid Static and Dynamic Analysis
Abstract: Object flattening is a non-trivial optimization that inlines the fields of an object inside its containers. Owing to its direct applicability to immutable objects, Java will soon allow programmers to mark compatible classes as "value types" and Java Virtual Machines (JVMs) to transparently flatten their instances (value objects). The expected benefits include a reduced memory footprint, faster field access, and an overall improvement in performance. This paper describes the surprises and challenges we faced while experimenting with value types and object flattening on a real-world JVM, and presents the design of an efficient strategy that selectively flattens profitable value objects using a novel combination of static and dynamic analyses. Our value-object flattening strategy is based on insights that span the source program, the just-in-time (JIT) compiler employed by the JVM, and the underlying hardware. The first insight identifies source-level usage patterns that favour and oppose value-object flattening. The second insight finds an interesting dependence of object flattening on object scalarization, and estimates the capability of the JIT compiler to avoid overheads using escape analysis. Finally, the third insight correlates container objects with the cache-line size, based on the load semantics of object fields. We capture these insights in a tool called VFlatten, which flattens potentially profitable value objects selectively in a production Java runtime.
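For context, a minimal sketch of the feature the talk builds on: a value class and a container whose fields a JVM may flatten. The syntax follows the in-progress Project Valhalla "value class" proposal (JEP 401, preview) and may change; the Point and Line classes are invented for this illustration and are not taken from the paper.

```java
// Illustrative only: "value class" is preview syntax from Project Valhalla
// (JEP 401) and may differ in the final Java release.

// A value class is identity-free and immutable, so the JVM is free to store
// its fields directly ("flattened") inside a container instead of behind a
// pointer to a separate heap object.
value class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    int x() { return x; }
    int y() { return y; }
}

class Line {
    // With flattening, the four ints of the two end points can be laid out
    // inline in each Line instance (better locality, no extra dereference);
    // without it, each Point is a separate heap object reached via a pointer.
    private final Point start;
    private final Point end;

    Line(Point start, Point end) { this.start = start; this.end = end; }

    // Hot code like this is where the talk's insights apply: if the JIT can
    // already scalarize the Points via escape analysis, flattening may add
    // little; and if a flattened container grows past a cache line, field
    // loads can become more expensive rather than cheaper.
    long lengthSquared() {
        long dx = end.x() - start.x();
        long dy = end.y() - start.y();
        return dx * dx + dy * dy;
    }
}
```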