Abstract: Sequencing across species and across individuals is proceeding at an extremely rapid pace, and the resulting explosion of genomic data is becoming difficult to manage. As the field moves from a data-poor to a data-rich science, many statistical and algorithmic approaches in bioinformatics need to be redesigned in order to scale with the data. At the same time, breakthroughs in computer system design can be achieved by considering how features of big biological data, and especially genomic sequence data, can inspire new computational platforms.
At Illinois, we have begun to explore this relationship by bringing together nearly 50 faculty, including genomic biologists, statisticians, bioinformatics specialists, computer systems designers, leaders in data mining and information retrieval, visualization experts, and algorithms researchers. From this larger group, smaller interdisciplinary groups have begun to form. At the center of this work is a four-year, NSF-funded project to develop an instrument (CompGen) that adopts a hardware-software co-design approach, driven by the research of the group as a whole. We believe that such an interdisciplinary approach, coupled with machine building, will lead to major advances in scientific understanding through a combination of improved statistical models, algorithmic advances, efficient parallel computation, innovative hardware extensions, and genomic data management.
The CompGen instrument will provide a playground in which researchers can manage and process genomic information while pursuing algorithm development and integration with system design. The instrument will incorporate emerging computer technologies such as die-stacked and non-volatile memory, as well as accelerators (GPUs, FPGAs, APUs). Instrument development will focus on reduction of data volume, optimization of the storage hierarchy, identification and implementation of computational primitives, data visualization, mathematical toolkit optimization, and performance and reliability assessment. These developments will lead to new computational structures and hardware/software architectures that can be incorporated into hierarchical databases as well as heterogeneous processors for data analysis, compression, and optimization.
In this talk, I will discuss some of the goals and directions that we are pursuing in the broader group at Illinois, give some preliminary results from projects we have started around this effort, and discuss some possible connections with interests at ADSC.
Biography: Steve Lumetta is an Associate Professor of Electrical and Computer Engineering, an Affiliate Associate Professor of Computer Science, and a Research Associate Professor in the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign. He is leading UIUC's effort to develop a next-generation computational genomics platform, and is one of the leaders in the broader effort to develop high-impact interdisciplinary research as well as industrial collaborations (see compgen.illinois.edu). Prof. Lumetta's earlier research includes high-performance networking and computing, computer system architecture, digital system testing, and optical network architecture. Prof. Lumetta holds an AB in Physics, an MS in Computer Science, and a PhD in Computer Science, all from the University of California at Berkeley.