Abstract:
Accurate identification of transcription factor binding sites (TFBSs) is fundamental to understanding gene regulation. Traditional motif-based methods that rely on position weight matrices (PWMs) scan genomes to predict potential TFBSs based on sequence similarity. However, these approaches struggle to detect degenerate or low-affinity sites, which are common in bacterial biosynthetic gene clusters (BGCs). BGCs encode the proteins and enzymes responsible for the production, export, resistance and regulation of specialized bioactive compounds, and they are typically regulated by multiple transcription factors acting on weakly conserved binding sites. This complexity and regulatory specificity limit the effectiveness of standard motif-scanning tools, impeding efforts to activate silent cryptic clusters and discover novel natural products. To overcome these limitations, we developed COnditions for Microbial Metabolite Activated Transcription (COMMBAT), a scoring method designed to improve TFBS prediction in BGCs. COMMBAT integrates two complementary components: an interaction score, derived from PWM-based motif matching, and a target score, which incorporates both the genomic context (region score) and the gene function (function score) within the transcriptional unit associated with the TFBS. These components are normalized and combined into a final COMMBAT score that better reflects biological relevance by prioritizing TFBSs that neighbour promoter regions and regulate functionally important BGC genes (such as regulatory and core biosynthetic genes). Evaluations show that COMMBAT substantially outperforms sequence-only methods in identifying experimentally validated TFBSs, offering a powerful tool to accelerate the discovery of transcriptional elicitors of microbial natural product biosynthesis. The COMMBAT website is available at https://www.commbat.uliege.be.
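The two-part scoring idea described above can be illustrated with a minimal sketch. The abstract does not specify how the region score, function score, normalization or combination are computed, so everything beyond "PWM-based interaction score" is an assumption here: log2-odds PWM scoring over both strands, an exponential decay with distance to the transcriptional unit start as a hypothetical region score, a hypothetical weight table for gene function, min-max normalization, and simple averaging. None of the names below (`region_score`, `FUNCTION_WEIGHT`, `combined_scores`) come from COMMBAT itself.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert a count matrix (list of {base: count} columns) into log2-odds scores."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background) for b in BASES})
    return pwm

def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))

def best_pwm_hit(pwm, seq):
    """Best log-odds score over every window on both strands (interaction score)."""
    w = len(pwm)
    best = -math.inf
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - w + 1):
            best = max(best, sum(pwm[j][strand[i + j]] for j in range(w)))
    return best

def min_max(values):
    """Rescale values to [0, 1]; identical values all map to 1.0 (assumed convention)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical function weights: regulatory and core biosynthetic genes ranked highest.
FUNCTION_WEIGHT = {"regulatory": 1.0, "core_biosynthetic": 1.0,
                   "transport": 0.6, "resistance": 0.6, "other": 0.3}

def region_score(distance_bp, scale=150.0):
    """Hypothetical region score: decays with distance (bp) from the transcriptional unit start."""
    return math.exp(-abs(distance_bp) / scale)

def combined_scores(pwm, sites):
    """Average of normalized interaction and target scores for each candidate site.

    Each site is a dict with 'seq', 'distance' (bp) and 'function' (a FUNCTION_WEIGHT key).
    """
    interaction = min_max([best_pwm_hit(pwm, s["seq"]) for s in sites])
    target = min_max([(region_score(s["distance"]) + FUNCTION_WEIGHT[s["function"]]) / 2
                      for s in sites])
    return [(i + t) / 2 for i, t in zip(interaction, target)]

# Toy motif with consensus TTGAC (illustrative counts, not a real TF matrix).
counts = [{b: (10 if b == c else 0) for b in BASES} for c in "TTGAC"]
pwm = log_odds_pwm(counts)
sites = [
    {"seq": "AATTGACAA", "distance": 20, "function": "regulatory"},
    {"seq": "GGGGGGGG", "distance": 2000, "function": "other"},
]
print(combined_scores(pwm, sites))
```

Averaging the two normalized components is only one plausible way to let genomic context and gene function lift a weaker motif match; a weighted sum or product would be equally easy to substitute in `combined_scores`.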