Recording available to view at: https://mediaspace.illinois.edu/media/t/1_vtn1quld
We are at an inflection point where software engineering meets the data-centric world of big data, machine learning, and artificial intelligence. As software development gradually shifts toward the development of data analytics with AI and ML technologies, existing software engineering techniques must be re-imagined to provide the productivity gains that developers desire. We conducted a large-scale study of almost 800 professional data scientists in the software industry to investigate what a data scientist is, what data scientists do, and what challenges they face. The study found that ensuring correctness is a huge problem in data analytics. We argue for re-targeting software engineering research to address new challenges in the era of data-centric software development. We showcase a few examples of our group's research on debugging and testing of data-intensive applications, e.g., data provenance, symbolic-execution-based test generation, and fuzz testing in Apache Spark. We then conclude with open problems in software engineering to meet the needs of the AI and ML workforce.
Miryung Kim is a Full Professor in the Department of Computer Science at the University of California, Los Angeles. She is known for her research on code clones: code duplication detection, management, and removal solutions. Recently, she has taken a leadership role in defining the emerging area of software engineering for data science. She received her B.S. in Computer Science from Korea Advanced Institute of Science and Technology and her M.S. and Ph.D. in Computer Science and Engineering from the University of Washington. She has received various awards, including an NSF CAREER Award, a Google Faculty Research Award, an Okawa Foundation Research Award, and an Alexander von Humboldt Foundation Fellowship. She was previously an assistant professor at the University of Texas at Austin and also spent time as a visiting researcher at Microsoft Research. She is the lead organizer of a Dagstuhl Seminar on SE4ML (Software Engineering for AI-ML based Systems). She is a keynote speaker at ASE 2019, a Program Co-Chair of ESEC/FSE 2022, and an Associate Editor of IEEE Transactions on Software Engineering.
Part of the Illinois Computer Science Speakers Series. Faculty Host: Reyhan Jabbarvand.