Join us from 2-3 p.m. in Beckman 3269 or on Zoom for the QCB Seminar featuring Bernhard Palsson, University of California, San Diego. Refreshments will be served beforehand outside Beckman 3151.
Title: What are iModulons?
Abstract:
The first microbial genome sequences appeared in the mid-to-late 1990s. In the 2000s, computational biology at the genome scale arose through the reconstruction of metabolic networks based on functional gene annotation. In the late 2000s, the cost of DNA sequencing dropped massively, leading to rapidly expanding databases of microbial genome sequences and microbial transcriptomes. These data sets could be knowledge-enriched and decomposed into coherently functioning sets of genes using machine learning methods. A growing number of data types can be processed in a similar fashion. Multiple data types can now be made interoperable based on known mechanisms and molecular functions. The 2020s are likely to see an accelerating fine-grained understanding of microbial physiology.
Analysis of large biological data sets can take place at four levels. At level 1, we perform multivariate statistics; at level 2, knowledge enrichment of large data sets; at level 3, systems biology and computational modeling; and at level 4, detailed biophysical modeling. Levels 1 and 4 are well developed in the literature. The history of genome-scale models, level 3, is about 20 years old, with much progress made. Level 2 is the least developed and is focused on knowledge mapping and the use of machine learning and explanatory AI.
This talk will focus on progress at level 2 with transcriptomes. Large compendia of high-quality RNA-seq profiles can now be decomposed using Independent Component Analysis (ICA). ICA identifies independently modulated sets of genes, called iModulons. This talk will show the uses of iModulons for metabolic engineering and bioprocess development, including cross-species transfer of iModulons, media composition, expression of heterologous genes, and y-gene discovery.
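
For readers new to the idea, the decomposition mentioned in the abstract can be sketched on synthetic data. This is only an illustration under stated assumptions, not the speaker's pipeline: the expression matrix X (genes x conditions) is simulated, the variable names M (gene weights) and A (activities across conditions) are chosen here for illustration, scikit-learn's FastICA stands in for whatever ICA implementation is actually used, and the "top 10 genes by absolute weight" rule is a hypothetical stand-in for a proper membership threshold.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_genes, n_conditions, n_components = 200, 20, 3

# Synthetic stand-in for an expression compendium (assumption, not real data):
# heavy-tailed gene weights times condition-dependent activities, plus noise.
M_true = rng.laplace(size=(n_genes, n_components))
A_true = rng.normal(size=(n_components, n_conditions))
X = M_true @ A_true + 0.01 * rng.normal(size=(n_genes, n_conditions))

# Treat genes as observations so the independent components are gene-weight vectors.
ica = FastICA(n_components=n_components, random_state=0, max_iter=1000)
M = ica.fit_transform(X)   # genes x components: weight of each gene in each component
A = ica.mixing_.T          # components x conditions: activity of each component

# The decomposition approximately reconstructs the data: X ~ M @ A + column means.
X_hat = M @ A + ica.mean_
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Hypothetical membership rule: the 10 genes with the largest absolute weight
# in the first component form that component's gene set.
top_genes = np.argsort(-np.abs(M[:, 0]))[:10]
print(M.shape, A.shape, top_genes)
```

In this sketch each column of M plays the role of one independently modulated gene set, and the matching row of A describes how strongly that set is active in each condition. Real analyses typically involve further steps, such as choosing the number of components and checking that components are robust across repeated runs.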