Microbial Bioactives

Microbial Bioactives | Online ISSN 2209-2161
REVIEWS (Open Access)

Illuminating Biological Dark Matter: Integrating Metagenomics, Synthetic Biology, and AI to Unlock Microbial and Genomic Potential for Therapeutics and Biotechnology


Yue Li 1, Shunqi Liu 2 *


Microbial Bioactives 9 (1) 1-8 https://doi.org/10.25163/microbbioacts.9110627

Submitted: 13 February 2026 Revised: 01 April 2026 Accepted: 08 April 2026 Published: 10 April 2026


Abstract

Biological “dark matter,” encompassing uncultured microorganisms and poorly characterized regions of microbial and human genomes, represents a vast and largely untapped resource for therapeutic discovery and sustainable biotechnology. Traditional cultivation-based approaches have accessed only a small fraction of this diversity, limiting innovation in drug development, biomanufacturing, and precision medicine. Recent advances in metagenomics, synthetic biology, proteogenomics, and artificial intelligence (AI) offer powerful tools to overcome these constraints. This study presents a systematic review and quantitative meta-analysis conducted in accordance with PRISMA guidelines. Peer-reviewed literature published between 2000 and 2025 was retrieved from PubMed, Web of Science, Scopus, and Google Scholar. Eligible studies employed function-based, sequencing-based, or single-cell metagenomics; synthetic biology chassis systems; AI- or machine learning–driven bioprocess optimization; proteogenomic integration; HIV-1 genomic surveillance; or gut microbiota interventions. Effect sizes were extracted and synthesized using random-effects models, with heterogeneity, publication bias, and sensitivity analyses performed to ensure robustness. Meta-analytic synthesis revealed that metagenomic approaches significantly enhance the discovery of structurally novel and bioactive natural products compared with conventional methods. Engineered microbial chassis, particularly yeast and cyanobacteria, demonstrated consistent improvements in biomass accumulation and metabolite yield. AI-driven models achieved high predictive accuracy across bioprocessing applications, while proteogenomic integration revealed reproducible genotype–phenotype associations in cancer. Microbiome-based interventions showed moderate but consistent improvements in microbial diversity and metabolic function. 
Collectively, the findings demonstrate that integrating metagenomics, synthetic biology, proteogenomics, and AI enables scalable, reproducible, and functionally relevant exploration of biological dark matter. This convergence provides a robust framework for advancing therapeutic innovation, sustainable biotechnology, and precision medicine.

Keywords: Metagenomics; Synthetic Biology; Artificial Intelligence; Microbial Dark Matter; Proteogenomics; HIV-1; Gut Microbiota; Natural Products
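
The abstract states that effect sizes were synthesized with random-effects models and checked for heterogeneity, but it does not name the estimator. As a minimal sketch, assuming the common DerSimonian-Laird method, the pooling step could look like the following. The function name `random_effects_pool` and the input values are hypothetical and are not taken from the review.

```python
import math


def random_effects_pool(effects, variances):
    """Pool study effect sizes with a DerSimonian-Laird random-effects model.

    Assumption: the review does not state which estimator it used;
    DerSimonian-Laird is shown only because it is widely used.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures between-study heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)

    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical effect sizes and variances, for illustration only.
    result = random_effects_pool([0.42, 0.31, 0.58], [0.02, 0.05, 0.03])
    print(result)
```

When the studies agree closely, the estimated between-study variance falls to zero and the result matches a fixed-effect inverse-variance pool. Larger disagreement widens the confidence interval, which is why random-effects pooling is the more cautious choice for methodologically diverse studies.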


 

