National Center for Supercomputing Applications WordPress Master Calendar



Generating Natural and Effective Unit Test Cases with Large Language Models

Event Type
Seminar/Symposium
Sponsor
PL/FM/SE
Virtual
Date
Nov 8, 2024, 2:00-3:00 pm
Speaker
Rangeet Pan, IBM T.J. Watson Research Center, Yorktown Heights
Contact
Kristin Irle
E-Mail
kirle@illinois.edu
Originating Calendar
Siebel School Speakers Calendar

Abstract: Implementing automated unit tests is an important but time-consuming activity in software development. To support developers in this task, software engineering research over the past few decades has developed many techniques for automating unit test generation. However, despite this effort, usable tools exist for very few programming languages. Moreover, studies have found that automatically generated tests suffer from poor readability and often do not resemble developer-written tests. In this work, we present a rigorous investigation of how large language models (LLMs) can help bridge the gap. We describe a generic pipeline that incorporates static analysis to guide LLMs in generating compilable and high-coverage test cases. We illustrate how the pipeline can be applied to different programming languages, specifically Java and Python, and to complex software requiring environment mocking. We conducted an empirical study to assess the quality of the generated tests in terms of code coverage and test naturalness, evaluating them on standard as well as enterprise Java applications and a large Python benchmark. Our results demonstrate that LLM-based test generation, when guided by static analysis, can be competitive with, and even outperform, state-of-the-art test-generation techniques in coverage achieved, while also producing considerably more natural test cases that developers find easy to read and understand. We also present the results of a user study, conducted with 161 professional developers, that highlights the naturalness characteristics of the tests generated by our approach.
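The abstract does not spell out the pipeline's internals; the following is a minimal Python sketch of the general idea under stated assumptions, not the speaker's actual implementation. It uses Python's ast module as the static-analysis step to extract a focal method's signature and docstring, then packs that context into a prompt intended for an LLM; in a real pipeline the model's output would be compiled, executed, and filtered by coverage. Names such as extract_focal_info and build_prompt are illustrative only.

```python
# Illustrative sketch only -- not the speaker's actual pipeline.
# Static analysis (Python's ast module) extracts the focal method's
# signature and docstring; that context is assembled into a prompt
# asking an LLM for a compilable, pytest-style unit test.
import ast
import textwrap

SOURCE = textwrap.dedent('''
    def clamp(value: float, lo: float, hi: float) -> float:
        """Restrict value to the closed interval [lo, hi]."""
        return max(lo, min(hi, value))
''')

def extract_focal_info(source: str, name: str) -> dict:
    """Static-analysis step: pull signature and docstring for the focal method."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            args = [a.arg for a in node.args.args]
            return {"name": node.name, "args": args, "doc": ast.get_docstring(node) or ""}
    raise ValueError(f"function {name!r} not found")

def build_prompt(info: dict, source: str) -> str:
    """Assemble an LLM prompt constrained by the statically extracted context."""
    return (
        f"Write a compilable pytest test for `{info['name']}({', '.join(info['args'])})`.\n"
        f"Behavior: {info['doc']}\n"
        f"Cover boundary and typical inputs. Source under test:\n{source}"
    )

if __name__ == "__main__":
    focal = extract_focal_info(SOURCE, "clamp")
    # In a full pipeline this prompt would be sent to an LLM, and the
    # generated tests would be compiled, run, and kept only if they
    # pass and improve coverage. Here we just print the prompt.
    print(build_prompt(focal, SOURCE))
```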

 

Bio: Rangeet Pan is a research staff member at the IBM T.J. Watson Research Center, Yorktown Heights. His research interests are in software engineering, focusing on large language models and other machine learning techniques.
