Abstract: Doing data science – extracting insight by analyzing data – is not easy. Data science is used to answer interesting questions that typically involve multiple diverse data sources, many different types of analysis, and often, large and messy data volumes. To answer one of these questions, several types of expertise may be needed to understand the context and domain being served, to import and transform individual data sets, to implement effective machine learning and/or statistical methods, to design and program applications and interfaces to extract and share data and insights, and to manage the data and systems used for analysis and storage.
The IBM Research Accelerated Discovery Lab studies how data scientists work, and uses the results to help them gain insights faster. In this talk, I will look at what has been learned to date, through user studies and experience with tens of analytics projects, and the environment that was built as a result. In particular, I will describe how our system captures information to enable contextual search, provenance queries, and other functionality that helps teams make faster progress in data-intensive investigations. I will also touch on efforts to leverage data and people to explain what happens during an investigation, with an ultimate goal of moving from descriptive to prescriptive analytics in order to accelerate data science and the analytic process. I will illustrate these various efforts using an ambitious current project on applying metagenomics to food safety, and will conclude with a discussion of where more work is needed to accelerate – and perhaps eventually, automate – data science.
Bio: Prior to joining UMass, where she is the Dean of Computer & Information Science, Dr. Haas spent 36 years at IBM, where she rose to the level of IBM Fellow. Within IBM, she most recently served as Director of the Accelerated Discovery Lab (2011-2017); she was Director of Computer Science at IBM's Almaden research center from 2005 to 2011, and had worldwide responsibility for IBM Research's exploratory science program from 2009 through 2013. From 2001-2005, she led the Information Integration Solutions architecture and development teams in IBM's Software Group. Before that, Dr. Haas was a research staff member and manager at IBM Research - Almaden. She spent a sabbatical year at University of Wisconsin, Madison in 1992-3, and a shorter sabbatical at ETH Zurich in 2009. A.B., Harvard, 1978; PhD, University of Texas Austin, 1981.
At IBM, Dr. Haas received several IBM awards for Outstanding Innovation and Technical Achievement and an IBM Corporate Award for information integration technology, and was named an IBM Fellow. She is a recipient of the Anita Borg Institute Technical Leadership Award, and the ACM SIGMOD Codd Innovation Award. Dr. Haas was Vice President of the VLDB Endowment Board of Trustees from 2004-2009, and served on the board of the Computing Research Association from 2007-2016 (vice chair 2009-2015); she currently serves on the National Academies Computer Science and Telecommunications Board (2013-2019). She is an ACM Fellow, a member of the National Academy of Engineering, and a Fellow of the American Academy of Arts and Sciences.