Microbiome research is the study of microbes in their theater of activity. It has many applications in medicine, biotechnology, crop sciences, and other areas. The analysis of microbiome sequencing data poses a wide range of computational challenges. While much early work has focused on taxonomic profiling based on 16S rRNA amplicon sequencing data, many projects now apply shotgun sequencing to collect more detailed information from samples.
A typical microbiome sequencing project may involve a few billion short reads obtained, say, on an Illumina HiSeq3000. How can such a large dataset be analyzed in detail, in a reasonable amount of time, on a modest server? A key question is which genes are present in a sample. A comprehensive answer can be obtained by aligning all reads against the NCBI-nr protein reference database, which is feasible using the high-throughput alignment tool DIAMOND (Buchfink, Xie and Huson, 2015). A program called Meganizer can then be applied to DIAMOND's output files so as to perform taxonomic and functional binning of all reads and to index all results. The resulting meganized DIAMOND files can be opened and explored in MEGAN, an interactive microbiome sequence analysis program (Huson et al., 2007, 2011, 2016).
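To make the idea of taxonomic binning concrete, the following is a minimal sketch of a naive lowest common ancestor (LCA) assignment, the general approach on which MEGAN's taxonomic binning is based. It is an illustration under stated assumptions, not MEGAN's or Meganizer's actual implementation: the toy taxonomy `TOY_PARENT` and the function names `path_to_root` and `naive_lca` are hypothetical, and a real analysis would use the NCBI taxonomy and filter alignments by score before assignment.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent); the root has parent None.
# A real analysis would use the full NCBI taxonomy instead.
TOY_PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus": "Firmicutes",
}


def path_to_root(taxon: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the list of taxa from the root down to `taxon`."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return path


def naive_lca(taxa: List[str],
              parent: Dict[str, Optional[str]] = TOY_PARENT) -> Optional[str]:
    """Assign a read to the deepest taxon shared by all of its hits.

    `taxa` holds the taxa of the reference sequences that one read aligns to.
    Returns None if the read has no hits.
    """
    paths = [path_to_root(t, parent) for t in taxa]
    if not paths:
        return None
    lca: Optional[str] = None
    # Walk down from the root while all paths agree on the current taxon.
    for level in zip(*paths):
        if all(node == level[0] for node in level):
            lca = level[0]
        else:
            break
    return lca


if __name__ == "__main__":
    # A read hitting two genera of the same phylum is placed on that phylum.
    print(naive_lca(["Escherichia", "Salmonella"]))
    # A read hitting genera of different phyla is placed higher up.
    print(naive_lca(["Escherichia", "Bacillus"]))
```

In this sketch, a read whose alignments all point to a single taxon is assigned to that taxon, while a read with hits across several taxa is placed on a higher-level node, so ambiguous reads receive less specific assignments.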
There is increasing interest in sequencing microbiome samples using long-read technologies such as those provided by Oxford Nanopore or PacBio. Microbiome analysis tools designed for short reads need to be adapted to long reads. We have developed a number of new algorithms for long-read (and contig) analysis that are implemented in MEGAN, and we refer to these extensions as MEGAN-LR (CAMDA 2017, Huson et al., 2018). Similarly, the DIAMOND alignment tool has recently been extended to support the alignment of long and error-prone reads. We will discuss some of the new algorithms in detail and provide examples of their application.