The number of regulatory genes in bacterial genomes is proportional to the square of the total number of genes. As a consequence of this trend, the fraction of regulators among all genes (the so-called “regulatory overhead”) is less than 0.5% in small genomes (< 500 genes), while in large genomes (~10,000 genes) it can be as high as 10%. The situation is reminiscent of the humorous Parkinson’s Law, which describes how government bureaucracies expand disproportionately over time. We recently proposed a general explanation of this quadratic scaling and illustrated it using a toolbox (or “Home Depot”) model in which bacterial genomes evolve by acquiring entire co-regulated pathways from a shared gene pool. Horizontal gene transfer assisted by bacteriophages plays the role of BitTorrent in this “gene sharing” network.
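The arithmetic behind the quoted overhead figures can be checked in a few lines. The sketch below assumes the quadratic law takes the simple form N_R = c N^2 and calibrates the prefactor c from the roughly 10% overhead quoted for genomes of about 10,000 genes. The constant and function names are illustrative and are not part of the toolbox model itself.

```python
# Illustrative check of quadratic scaling: N_R = c * N**2, hence N_R / N = c * N.
# ASSUMPTION: c is calibrated from the ~10% overhead quoted for ~10,000 genes.
C = 0.10 / 10_000  # hypothetical prefactor (regulators per gene squared)


def regulator_count(n_genes: int) -> float:
    """Number of regulators under the assumed quadratic law."""
    return C * n_genes ** 2


def regulatory_overhead(n_genes: int) -> float:
    """Fraction of regulators among all genes; grows linearly with genome size."""
    return regulator_count(n_genes) / n_genes


for n in (500, 2_000, 10_000):
    print(f"{n:>6} genes: {regulator_count(n):7.1f} regulators, "
          f"overhead {regulatory_overhead(n):.2%}")
```

Under this assumed calibration, a 500-gene genome comes out at an overhead of about 0.5%, in line with the figure quoted for small genomes: a twentyfold increase in genome size produces a twentyfold increase in the regulator fraction.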
To continue the comparison between biological and technological systems, we studied the frequencies of gene occurrence in the genomes of ∼500 bacterial species and compared them with the frequencies of installation of software packages on over 2 million Linux computers. We found that in both cases the frequency distributions are described by a similar U-shaped functional form, with power-law scaling at small frequencies and an additional peak at the tail of the distribution corresponding to nearly universal components. I will derive a general mathematical expression for this distribution, valid for any modular complex system of the open-source type (such as Linux or bacteria), that is, one characterized by reuse and common sharing of previously developed components. In addition to genomes and large software projects, we found similar properties in networks of citations between scientific publications, dependencies between mathematical theorems, and food webs in ecosystems.
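As a rough illustration of what a component frequency distribution is, the sketch below counts in how many systems each component occurs and bins the result. The data are a hypothetical toy set (six "systems" with made-up component names), not the bacterial or Linux datasets, and the binning scheme is an assumption chosen for simplicity.

```python
from collections import Counter

# Hypothetical toy data: each system (a genome or a Linux install) is a set of components.
systems = [
    {"core1", "core2", "mid", "r1"},
    {"core1", "core2", "mid", "r2"},
    {"core1", "core2", "mid"},
    {"core1", "core2", "r3"},
    {"core1", "core2", "r4"},
    {"core1", "core2"},
]


def occurrence_counts(systems):
    """Number of systems in which each component occurs."""
    return Counter(component for s in systems for component in s)


def frequency_histogram(counts, n_systems, n_bins=3):
    """Count components per equal-width frequency bin; frequency 1 goes in the last bin."""
    hist = [0] * n_bins
    for k in counts.values():
        # Integer arithmetic avoids floating-point edge cases at bin boundaries.
        idx = min(k * n_bins // n_systems, n_bins - 1)
        hist[idx] += 1
    return hist


counts = occurrence_counts(systems)
freqs = {c: k / len(systems) for c, k in counts.items()}
hist = frequency_histogram(counts, len(systems))
print(freqs)
print(hist)
```

In this toy set, the four rare components fall in the lowest bin and the two universal ones in the highest, with a single component in between. That reproduces the qualitative U shape only by construction; it says nothing about the power-law exponent or the derived functional form.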